commit d3f949337e12bb0a7ed65fbb4f20e7a967eb3eab
Author: kuailexs <952415538@qq.com>
Date:   Sun Jul 7 09:37:36 2024 +0800

    orphan

    Change-Id: I7175f9e7e1655ec3d5fc37a1601f4749884db975

diff --git a/COPYING b/COPYING
new file mode 100644
index 0000000..623b625
--- /dev/null
+++ b/COPYING
@@ -0,0 +1,340 @@
+                    GNU GENERAL PUBLIC LICENSE
+                       Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                            Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Library General Public License instead.) You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+  To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+  We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+  Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+  Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                    GNU GENERAL PUBLIC LICENSE
+   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. 
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+    a) Accompany it with the complete corresponding machine-readable
+    source code, which must be distributed under the terms of Sections
+    1 and 2 above on a medium customarily used for software interchange; or,
+
+    b) Accompany it with a written offer, valid for at least three
+    years, to give any third party, for a charge no more than your
+    cost of physically performing source distribution, a complete
+    machine-readable copy of the corresponding source code, to be
+    distributed under the terms of Sections 1 and 2 above on a medium
+    customarily used for software interchange; or,
+
+    c) Accompany it with the information you received as to the offer
+    to distribute corresponding source code. (This alternative is
+    allowed only for noncommercial distribution and only if you
+    received the program in object code or executable form with such
+    an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+  5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+  6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+  7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+  9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+  10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+                            NO WARRANTY
+
+  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program; if not, write to the Free Software
+    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+    Gnomovision version 69, Copyright (C) year name of author
+    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Ty Coon>, 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Library General
+Public License instead of this License.
diff --git a/COPYING.LIB b/COPYING.LIB
new file mode 100644
index 0000000..2d2d780
--- /dev/null
+++ b/COPYING.LIB
@@ -0,0 +1,510 @@
+
+                  GNU LESSER GENERAL PUBLIC LICENSE
+                       Version 2.1, February 1999
+
+ Copyright (C) 1991, 1999 Free Software Foundation, Inc.
+ 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+[This is the first released version of the Lesser GPL. It also counts
+ as the successor of the GNU Library Public License, version 2, hence
+ the version number 2.1.]
+
+                            Preamble
+
+  The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+Licenses are intended to guarantee your freedom to share and change
+free software--to make sure the software is free for all its users.
+
+  This license, the Lesser General Public License, applies to some
+specially designated software packages--typically libraries--of the
+Free Software Foundation and other authors who decide to use it. You
+can use it too, but we suggest you first think carefully about whether
+this license or the ordinary General Public License is the better
+strategy to use in any particular case, based on the explanations
+below.
+
+  When we speak of free software, we are referring to freedom of use,
+not price. Our General Public Licenses are designed to make sure that
+you have the freedom to distribute copies of free software (and charge
+for this service if you wish); that you receive source code or can get
+it if you want it; that you can change the software and use pieces of
+it in new free programs; and that you are informed that you can do
+these things.
+
+  To protect your rights, we need to make restrictions that forbid
+distributors to deny you these rights or to ask you to surrender these
+rights. These restrictions translate to certain responsibilities for
+you if you distribute copies of the library or if you modify it.
+
+  For example, if you distribute copies of the library, whether gratis
+or for a fee, you must give the recipients all the rights that we gave
+you. You must make sure that they, too, receive or can get the source
+code. If you link other code with the library, you must provide
+complete object files to the recipients, so that they can relink them
+with the library after making changes to the library and recompiling
+it. And you must show them these terms so they know their rights.
+
+  We protect your rights with a two-step method: (1) we copyright the
+library, and (2) we offer you this license, which gives you legal
+permission to copy, distribute and/or modify the library.
+
+  To protect each distributor, we want to make it very clear that
+there is no warranty for the free library. Also, if the library is
+modified by someone else and passed on, the recipients should know
+that what they have is not the original version, so that the original
+author's reputation will not be affected by problems that might be
+introduced by others.
+
+  Finally, software patents pose a constant threat to the existence of
+any free program. We wish to make sure that a company cannot
+effectively restrict the users of a free program by obtaining a
+restrictive license from a patent holder. Therefore, we insist that
+any patent license obtained for a version of the library must be
+consistent with the full freedom of use specified in this license.
+
+  Most GNU software, including some libraries, is covered by the
+ordinary GNU General Public License. This license, the GNU Lesser
+General Public License, applies to certain designated libraries, and
+is quite different from the ordinary General Public License. We use
+this license for certain libraries in order to permit linking those
+libraries into non-free programs.
+
+  When a program is linked with a library, whether statically or using
+a shared library, the combination of the two is legally speaking a
+combined work, a derivative of the original library. The ordinary
+General Public License therefore permits such linking only if the
+entire combination fits its criteria of freedom. The Lesser General
+Public License permits more lax criteria for linking other code with
+the library.
+
+  We call this license the "Lesser" General Public License because it
+does Less to protect the user's freedom than the ordinary General
+Public License. It also provides other free software developers Less
+of an advantage over competing non-free programs. These disadvantages
+are the reason we use the ordinary General Public License for many
+libraries. However, the Lesser license provides advantages in certain
+special circumstances.
+
+  For example, on rare occasions, there may be a special need to
+encourage the widest possible use of a certain library, so that it
+becomes a de-facto standard. To achieve this, non-free programs must
+be allowed to use the library. A more frequent case is that a free
+library does the same job as widely used non-free libraries. In this
+case, there is little to gain by limiting the free library to free
+software only, so we use the Lesser General Public License.
+
+  In other cases, permission to use a particular library in non-free
+programs enables a greater number of people to use a large body of
+free software. For example, permission to use the GNU C Library in
+non-free programs enables many more people to use the whole GNU
+operating system, as well as its variant, the GNU/Linux operating
+system.
+
+  Although the Lesser General Public License is Less protective of the
+users' freedom, it does ensure that the user of a program that is
+linked with the Library has the freedom and the wherewithal to run
+that program using a modified version of the Library.
+
+  The precise terms and conditions for copying, distribution and
+modification follow. Pay close attention to the difference between a
+"work based on the library" and a "work that uses the library". The
+former contains code derived from the library, whereas the latter must
+be combined with the library in order to run.
+
+                  GNU LESSER GENERAL PUBLIC LICENSE
+  TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+  0. This License Agreement applies to any software library or other
+program which contains a notice placed by the copyright holder or
+other authorized party saying it may be distributed under the terms of
+this Lesser General Public License (also called "this License").
+Each licensee is addressed as "you".
+
+  A "library" means a collection of software functions and/or data
+prepared so as to be conveniently linked with application programs
+(which use some of those functions and data) to form executables.
+
+  The "Library", below, refers to any such software library or work
+which has been distributed under these terms. A "work based on the
+Library" means either the Library or any derivative work under
+copyright law: that is to say, a work containing the Library or a
+portion of it, either verbatim or with modifications and/or translated
+straightforwardly into another language. (Hereinafter, translation is
+included without limitation in the term "modification".)
+
+  "Source code" for a work means the preferred form of the work for
+making modifications to it. For a library, complete source code means
+all the source code for all modules it contains, plus any associated
+interface definition files, plus the scripts used to control
+compilation and installation of the library.
+
+  Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running a program using the Library is not restricted, and output from
+such a program is covered only if its contents constitute a work based
+on the Library (independent of the use of the Library in a tool for
+writing it). Whether that is true depends on what the Library does
+and what the program that uses the Library does.
+
+  1. You may copy and distribute verbatim copies of the Library's
+complete source code as you receive it, in any medium, provided that
+you conspicuously and appropriately publish on each copy an
+appropriate copyright notice and disclaimer of warranty; keep intact
+all the notices that refer to this License and to the absence of any
+warranty; and distribute a copy of this License along with the
+Library.
+
+  You may charge a fee for the physical act of transferring a copy,
+and you may at your option offer warranty protection in exchange for a
+fee.
+
+  2. You may modify your copy or copies of the Library or any portion
+of it, thus forming a work based on the Library, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+  a) The modified work must itself be a software library.
+
+  b) You must cause the files modified to carry prominent notices
+  stating that you changed the files and the date of any change.
+
+  c) You must cause the whole of the work to be licensed at no
+  charge to all third parties under the terms of this License.
+
+  d) If a facility in the modified Library refers to a function or a
+  table of data to be supplied by an application program that uses
+  the facility, other than as an argument passed when the facility
+  is invoked, then you must make a good faith effort to ensure that,
+  in the event an application does not supply such function or
+  table, the facility still operates, and performs whatever part of
+  its purpose remains meaningful.
+
+  (For example, a function in a library to compute square roots has
+  a purpose that is entirely well-defined independent of the
+  application. Therefore, Subsection 2d requires that any
+  application-supplied function or table used by this function must
+  be optional: if the application does not supply it, the square
+  root function must still compute square roots.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Library,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Library, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote
+it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Library.
+
+In addition, mere aggregation of another work not based on the Library
+with the Library (or with a work based on the Library) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+  3. You may opt to apply the terms of the ordinary GNU General Public
+License instead of this License to a given copy of the Library. To do
+this, you must alter all the notices that refer to this License, so
+that they refer to the ordinary GNU General Public License, version 2,
+instead of to this License. (If a newer version than version 2 of the
+ordinary GNU General Public License has appeared, then you can specify
+that version instead if you wish.) Do not make any other change in
+these notices.
+
+  Once this change is made in a given copy, it is irreversible for
+that copy, so the ordinary GNU General Public License applies to all
+subsequent copies and derivative works made from that copy.
+
+  This option is useful when you wish to copy part of the code of
+the Library into a program that is not a library.
+
+  4. You may copy and distribute the Library (or a portion or
+derivative of it, under Section 2) in object code or executable form
+under the terms of Sections 1 and 2 above provided that you accompany
+it with the complete corresponding machine-readable source code, which
+must be distributed under the terms of Sections 1 and 2 above on a
+medium customarily used for software interchange.
+
+  If distribution of object code is made by offering access to copy
+from a designated place, then offering equivalent access to copy the
+source code from the same place satisfies the requirement to
+distribute the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+  5. A program that contains no derivative of any portion of the
+Library, but is designed to work with the Library by being compiled or
+linked with it, is called a "work that uses the Library". Such a
+work, in isolation, is not a derivative work of the Library, and
+therefore falls outside the scope of this License.
+
+  However, linking a "work that uses the Library" with the Library
+creates an executable that is a derivative of the Library (because it
+contains portions of the Library), rather than a "work that uses the
+library". The executable is therefore covered by this License.
+Section 6 states terms for distribution of such executables.
+
+  When a "work that uses the Library" uses material from a header file
+that is part of the Library, the object code for the work may be a
+derivative work of the Library even though the source code is not.
+Whether this is true is especially significant if the work can be
+linked without the Library, or if the work is itself a library. The
+threshold for this to be true is not precisely defined by law.
+
+  If such an object file uses only numerical parameters, data
+structure layouts and accessors, and small macros and small inline
+functions (ten lines or less in length), then the use of the object
+file is unrestricted, regardless of whether it is legally a derivative
+work. (Executables containing this object code plus portions of the
+Library will still fall under Section 6.)
+
+  Otherwise, if the work is a derivative of the Library, you may
+distribute the object code for the work under the terms of Section 6.
+Any executables containing that work also fall under Section 6,
+whether or not they are linked directly with the Library itself.
+
+  6. As an exception to the Sections above, you may also combine or
+link a "work that uses the Library" with the Library to produce a
+work containing portions of the Library, and distribute that work
+under terms of your choice, provided that the terms permit
+modification of the work for the customer's own use and reverse
+engineering for debugging such modifications.
+
+  You must give prominent notice with each copy of the work that the
+Library is used in it and that the Library and its use are covered by
+this License. You must supply a copy of this License. If the work
+during execution displays copyright notices, you must include the
+copyright notice for the Library among them, as well as a reference
+directing the user to the copy of this License. Also, you must do one
+of these things:
+
+  a) Accompany the work with the complete corresponding
+  machine-readable source code for the Library including whatever
+  changes were used in the work (which must be distributed under
+  Sections 1 and 2 above); and, if the work is an executable linked
+  with the Library, with the complete machine-readable "work that
+  uses the Library", as object code and/or source code, so that the
+  user can modify the Library and then relink to produce a modified
+  executable containing the modified Library. (It is understood
+  that the user who changes the contents of definitions files in the
+  Library will not necessarily be able to recompile the application
+  to use the modified definitions.)
+
+  b) Use a suitable shared library mechanism for linking with the
+  Library. A suitable mechanism is one that (1) uses at run time a
+  copy of the library already present on the user's computer system,
+  rather than copying library functions into the executable, and (2)
+  will operate properly with a modified version of the library, if
+  the user installs one, as long as the modified version is
+  interface-compatible with the version that the work was made with.
+
+  c) Accompany the work with a written offer, valid for at least
+  three years, to give the same user the materials specified in
+  Subsection 6a, above, for a charge no more than the cost of
+  performing this distribution.
+
+  d) If distribution of the work is made by offering access to copy
+  from a designated place, offer equivalent access to copy the above
+  specified materials from the same place.
+
+  e) Verify that the user has already received a copy of these
+  materials or that you have already sent this user a copy.
+
+  For an executable, the required form of the "work that uses the
+Library" must include any data and utility programs needed for
+reproducing the executable from it. However, as a special exception,
+the materials to be distributed need not include anything that is
+normally distributed (in either source or binary form) with the major
+components (compiler, kernel, and so on) of the operating system on
+which the executable runs, unless that component itself accompanies
+the executable.
+
+  It may happen that this requirement contradicts the license
+restrictions of other proprietary libraries that do not normally
+accompany the operating system. Such a contradiction means you cannot
+use both them and the Library together in an executable that you
+distribute.
+
+  7. You may place library facilities that are a work based on the
+Library side-by-side in a single library together with other library
+facilities not covered by this License, and distribute such a combined
+library, provided that the separate distribution of the work based on
+the Library and of the other library facilities is otherwise
+permitted, and provided that you do these two things:
+
+  a) Accompany the combined library with a copy of the same work
+  based on the Library, uncombined with any other library
+  facilities. This must be distributed under the terms of the
+  Sections above.
+
+  b) Give prominent notice with the combined library of the fact
+  that part of it is a work based on the Library, and explaining
+  where to find the accompanying uncombined form of the same work.
+
+  8. You may not copy, modify, sublicense, link with, or distribute
+the Library except as expressly provided under this License. Any
+attempt otherwise to copy, modify, sublicense, link with, or
+distribute the Library is void, and will automatically terminate your
+rights under this License. However, parties who have received copies,
+or rights, from you under this License will not have their licenses
+terminated so long as such parties remain in full compliance.
+
+  9. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Library or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Library (or any work based on the
+Library), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Library or works based on it.
+
+  10. Each time you redistribute the Library (or any work based on the
+Library), the recipient automatically receives a license from the
+original licensor to copy, distribute, link with or modify the Library
+subject to these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties with
+this License.
+
+  11. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Library at all. For example, if a patent
+license would not permit royalty-free redistribution of the Library by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Library.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply, and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+  12. If the distribution and/or use of the Library is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Library under this License
+may add an explicit geographical distribution limitation excluding those
+countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+  13. The Free Software Foundation may publish revised and/or new
+versions of the Lesser General Public License from time to time.
+Such new versions will be similar in spirit to the present version,
+but may differ in detail to address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Library
+specifies a version number of this License which applies to it and
+"any later version", you have the option of following the terms and
+conditions either of that version or of any later version published by
+the Free Software Foundation. If the Library does not specify a
+license version number, you may choose any version ever published by
+the Free Software Foundation.
+
+  14. If you wish to incorporate parts of the Library into other free
+programs whose distribution conditions are incompatible with these,
+write to the author to ask for permission. For software which is
+copyrighted by the Free Software Foundation, write to the Free
+Software Foundation; we sometimes make exceptions for this. Our
+decision will be guided by the two goals of preserving the free status
+of all derivatives of our free software and of promoting the sharing
+and reuse of software generally.
+
+                            NO WARRANTY
+
+  15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
+WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
+EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
+OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
+KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
+LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
+THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
+WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
+AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
+FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
+CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
+LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
+RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
+FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
+SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
+DAMAGES.
+
+                     END OF TERMS AND CONDITIONS
+
+           How to Apply These Terms to Your New Libraries
+
+  If you develop a new library, and you want it to be of the greatest
+possible use to the public, we recommend making it free software that
+everyone can redistribute and change. You can do so by permitting
+redistribution under these terms (or, alternatively, under the terms
+of the ordinary General Public License).
+
+  To apply these terms, attach the following notices to the library.
+It is safest to attach them to the start of each source file to most
+effectively convey the exclusion of warranty; and each file should
+have at least the "copyright" line and a pointer to where the full
+notice is found.
+
+    <one line to give the library's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This library is free software; you can redistribute it and/or
+    modify it under the terms of the GNU Lesser General Public
+    License as published by the Free Software Foundation; either
+    version 2.1 of the License, or (at your option) any later version.
+
+    This library is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+    Lesser General Public License for more details.
+
+    You should have received a copy of the GNU Lesser General Public
+    License along with this library; if not, write to the Free Software
+    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+
+Also add information on how to contact you by electronic and paper mail.
+
+You should also get your employer (if you work as a programmer) or
+your school, if any, to sign a "copyright disclaimer" for the library,
+if necessary. Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the
+  library `Frob' (a library for tweaking knobs) written by James
+  Random Hacker.
+
+  <signature of Ty Coon>, 1 April 1990
+  Ty Coon, President of Vice
+
+That's all there is to it!
+
+
diff --git a/COPYING.RUNTIME b/COPYING.RUNTIME
new file mode 100644
index 0000000..e1b3c69
--- /dev/null
+++ b/COPYING.RUNTIME
@@ -0,0 +1,73 @@
+GCC RUNTIME LIBRARY EXCEPTION
+
+Version 3.1, 31 March 2009
+
+Copyright (C) 2009 Free Software Foundation, Inc. <http://fsf.org/>
+
+Everyone is permitted to copy and distribute verbatim copies of this
+license document, but changing it is not allowed.
+
+This GCC Runtime Library Exception ("Exception") is an additional
+permission under section 7 of the GNU General Public License, version
+3 ("GPLv3"). It applies to a given file (the "Runtime Library") that
+bears a notice placed by the copyright holder of the file stating that
+the file is governed by GPLv3 along with this Exception.
+
+When you use GCC to compile a program, GCC may combine portions of
+certain GCC header files and runtime libraries with the compiled
+program. The purpose of this Exception is to allow compilation of
+non-GPL (including proprietary) programs to use, in this way, the
+header files and runtime libraries covered by this Exception.
+
+0. Definitions.
+
+A file is an "Independent Module" if it either requires the Runtime
+Library for execution after a Compilation Process, or makes use of an
+interface provided by the Runtime Library, but is not otherwise based
+on the Runtime Library.
+
+"GCC" means a version of the GNU Compiler Collection, with or without
+modifications, governed by version 3 (or a specified later version) of
+the GNU General Public License (GPL) with the option of using any
+subsequent versions published by the FSF.
+
+"GPL-compatible Software" is software whose conditions of propagation,
+modification and use would permit combination with GCC in accord with
+the license of GCC.
+
+"Target Code" refers to output from any compiler for a real or virtual
+target processor architecture, in executable form or suitable for
+input to an assembler, loader, linker and/or execution
+phase. Notwithstanding that, Target Code does not include data in any
+format that is used as a compiler intermediate representation, or used
+for producing a compiler intermediate representation.
+
+The "Compilation Process" transforms code entirely represented in
+non-intermediate languages designed for human-written code, and/or in
+Java Virtual Machine byte code, into Target Code. Thus, for example,
+use of source code generators and preprocessors need not be considered
+part of the Compilation Process, since the Compilation Process can be
+understood as starting with the output of the generators or
+preprocessors.
+
+A Compilation Process is "Eligible" if it is done using GCC, alone or
+with other GPL-compatible software, or if it is done without using any
+work based on GCC. For example, using non-GPL-compatible Software to
+optimize any GCC intermediate representations would not qualify as an
+Eligible Compilation Process.
+
+1. Grant of Additional Permission.
+
+You have permission to propagate a work of Target Code formed by
+combining the Runtime Library with Independent Modules, even if such
+propagation would otherwise violate the terms of GPLv3, provided that
+all Target Code was generated by Eligible Compilation Processes. You
+may then convey such a combination under terms of your choice,
+consistent with the licensing of the Independent Modules.
+
+2. No Weakening of GCC Copyleft.
+
+The availability of this Exception does not imply any general
+presumption that third-party software is unaffected by the copyleft
+requirements of the license of GCC.
+
diff --git a/COPYING3 b/COPYING3
new file mode 100644
index 0000000..94a9ed0
--- /dev/null
+++ b/COPYING3
@@ -0,0 +1,674 @@
+                    GNU GENERAL PUBLIC LICENSE
+                       Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                            Preamble
+
+  The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+  The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works. By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users. We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors. You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+  To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights. Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received. You must make sure that they, too, receive
+or can get the source code. And you must show them these terms so they
+know their rights.
+
+  Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+  For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software. For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+  Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so. This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software. The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable. Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products. If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+  Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary. To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                       TERMS AND CONDITIONS
+
+  0. Definitions.
+
+  "This License" refers to version 3 of the GNU General Public License.
+
+  "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+  "The Program" refers to any copyrightable work licensed under this
+License. Each licensee is addressed as "you". "Licensees" and
+"recipients" may be individuals or organizations.
+
+  To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy. The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+  A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+  To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy. Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+  To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies. Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+  An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License. If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+  1. Source Code.
+
+  The "source code" for a work means the preferred form of the work
+for making modifications to it. "Object code" means any non-source
+form of a work.
+
+  A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+  The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form. A
A +"Major Component", in this context, means a major essential component +(kernel, window system, and so on) of the specific operating system +(if any) on which the executable work runs, or a compiler used to +produce the work, or an object code interpreter used to run it. + + The "Corresponding Source" for a work in object code form means all +the source code needed to generate, install, and (for an executable +work) run the object code and to modify the work, including scripts to +control those activities. However, it does not include the work's +System Libraries, or general-purpose tools or generally available free +programs which are used unmodified in performing those activities but +which are not part of the work. For example, Corresponding Source +includes interface definition files associated with source files for +the work, and the source code for shared libraries and dynamically +linked subprograms that the work is specifically designed to require, +such as by intimate data communication or control flow between those +subprograms and other parts of the work. + + The Corresponding Source need not include anything that users +can regenerate automatically from other parts of the Corresponding +Source. + + The Corresponding Source for a work in source code form is that +same work. + + 2. Basic Permissions. + + All rights granted under this License are granted for the term of +copyright on the Program, and are irrevocable provided the stated +conditions are met. This License explicitly affirms your unlimited +permission to run the unmodified Program. The output from running a +covered work is covered by this License only if the output, given its +content, constitutes a covered work. This License acknowledges your +rights of fair use or other equivalent, as provided by copyright law. + + You may make, run and propagate covered works that you do not +convey, without conditions so long as your license otherwise remains +in force. You may convey covered works to others for the sole purpose +of having them make modifications exclusively for you, or provide you +with facilities for running those works, provided that you comply with +the terms of this License in conveying all material for which you do +not control copyright. Those thus making or running the covered works +for you must do so exclusively on your behalf, under your direction +and control, on terms that prohibit them from making any copies of +your copyrighted material outside their relationship with you. + + Conveying under any other circumstances is permitted solely under +the conditions stated below. Sublicensing is not allowed; section 10 +makes it unnecessary. + + 3. Protecting Users' Legal Rights From Anti-Circumvention Law. + + No covered work shall be deemed part of an effective technological +measure under any applicable law fulfilling obligations under article +11 of the WIPO copyright treaty adopted on 20 December 1996, or +similar laws prohibiting or restricting circumvention of such +measures. + + When you convey a covered work, you waive any legal power to forbid +circumvention of technological measures to the extent such circumvention +is effected by exercising rights under this License with respect to +the covered work, and you disclaim any intention to limit operation or +modification of the work as a means of enforcing, against the work's +users, your or third parties' legal rights to forbid circumvention of +technological measures. + + 4. Conveying Verbatim Copies. 
+ + You may convey verbatim copies of the Program's source code as you +receive it, in any medium, provided that you conspicuously and +appropriately publish on each copy an appropriate copyright notice; +keep intact all notices stating that this License and any +non-permissive terms added in accord with section 7 apply to the code; +keep intact all notices of the absence of any warranty; and give all +recipients a copy of this License along with the Program. + + You may charge any price or no price for each copy that you convey, +and you may offer support or warranty protection for a fee. + + 5. Conveying Modified Source Versions. + + You may convey a work based on the Program, or the modifications to +produce it from the Program, in the form of source code under the +terms of section 4, provided that you also meet all of these conditions: + + a) The work must carry prominent notices stating that you modified + it, and giving a relevant date. + + b) The work must carry prominent notices stating that it is + released under this License and any conditions added under section + 7. This requirement modifies the requirement in section 4 to + "keep intact all notices". + + c) You must license the entire work, as a whole, under this + License to anyone who comes into possession of a copy. This + License will therefore apply, along with any applicable section 7 + additional terms, to the whole of the work, and all its parts, + regardless of how they are packaged. This License gives no + permission to license the work in any other way, but it does not + invalidate such permission if you have separately received it. + + d) If the work has interactive user interfaces, each must display + Appropriate Legal Notices; however, if the Program has interactive + interfaces that do not display Appropriate Legal Notices, your + work need not make them do so. + + A compilation of a covered work with other separate and independent +works, which are not by their nature extensions of the covered work, +and which are not combined with it such as to form a larger program, +in or on a volume of a storage or distribution medium, is called an +"aggregate" if the compilation and its resulting copyright are not +used to limit the access or legal rights of the compilation's users +beyond what the individual works permit. Inclusion of a covered work +in an aggregate does not cause this License to apply to the other +parts of the aggregate. + + 6. Conveying Non-Source Forms. + + You may convey a covered work in object code form under the terms +of sections 4 and 5, provided that you also convey the +machine-readable Corresponding Source under the terms of this License, +in one of these ways: + + a) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by the + Corresponding Source fixed on a durable physical medium + customarily used for software interchange. 
+ + b) Convey the object code in, or embodied in, a physical product + (including a physical distribution medium), accompanied by a + written offer, valid for at least three years and valid for as + long as you offer spare parts or customer support for that product + model, to give anyone who possesses the object code either (1) a + copy of the Corresponding Source for all the software in the + product that is covered by this License, on a durable physical + medium customarily used for software interchange, for a price no + more than your reasonable cost of physically performing this + conveying of source, or (2) access to copy the + Corresponding Source from a network server at no charge. + + c) Convey individual copies of the object code with a copy of the + written offer to provide the Corresponding Source. This + alternative is allowed only occasionally and noncommercially, and + only if you received the object code with such an offer, in accord + with subsection 6b. + + d) Convey the object code by offering access from a designated + place (gratis or for a charge), and offer equivalent access to the + Corresponding Source in the same way through the same place at no + further charge. You need not require recipients to copy the + Corresponding Source along with the object code. If the place to + copy the object code is a network server, the Corresponding Source + may be on a different server (operated by you or a third party) + that supports equivalent copying facilities, provided you maintain + clear directions next to the object code saying where to find the + Corresponding Source. Regardless of what server hosts the + Corresponding Source, you remain obligated to ensure that it is + available for as long as needed to satisfy these requirements. + + e) Convey the object code using peer-to-peer transmission, provided + you inform other peers where the object code and Corresponding + Source of the work are being offered to the general public at no + charge under subsection 6d. + + A separable portion of the object code, whose source code is excluded +from the Corresponding Source as a System Library, need not be +included in conveying the object code work. + + A "User Product" is either (1) a "consumer product", which means any +tangible personal property which is normally used for personal, family, +or household purposes, or (2) anything designed or sold for incorporation +into a dwelling. In determining whether a product is a consumer product, +doubtful cases shall be resolved in favor of coverage. For a particular +product received by a particular user, "normally used" refers to a +typical or common use of that class of product, regardless of the status +of the particular user or of the way in which the particular user +actually uses, or expects or is expected to use, the product. A product +is a consumer product regardless of whether the product has substantial +commercial, industrial or non-consumer uses, unless such uses represent +the only significant mode of use of the product. + + "Installation Information" for a User Product means any methods, +procedures, authorization keys, or other information required to install +and execute modified versions of a covered work in that User Product from +a modified version of its Corresponding Source. The information must +suffice to ensure that the continued functioning of the modified object +code is in no case prevented or interfered with solely because +modification has been made. 
+ + If you convey an object code work under this section in, or with, or +specifically for use in, a User Product, and the conveying occurs as +part of a transaction in which the right of possession and use of the +User Product is transferred to the recipient in perpetuity or for a +fixed term (regardless of how the transaction is characterized), the +Corresponding Source conveyed under this section must be accompanied +by the Installation Information. But this requirement does not apply +if neither you nor any third party retains the ability to install +modified object code on the User Product (for example, the work has +been installed in ROM). + + The requirement to provide Installation Information does not include a +requirement to continue to provide support service, warranty, or updates +for a work that has been modified or installed by the recipient, or for +the User Product in which it has been modified or installed. Access to a +network may be denied when the modification itself materially and +adversely affects the operation of the network or violates the rules and +protocols for communication across the network. + + Corresponding Source conveyed, and Installation Information provided, +in accord with this section must be in a format that is publicly +documented (and with an implementation available to the public in +source code form), and must require no special password or key for +unpacking, reading or copying. + + 7. Additional Terms. + + "Additional permissions" are terms that supplement the terms of this +License by making exceptions from one or more of its conditions. +Additional permissions that are applicable to the entire Program shall +be treated as though they were included in this License, to the extent +that they are valid under applicable law. If additional permissions +apply only to part of the Program, that part may be used separately +under those permissions, but the entire Program remains governed by +this License without regard to the additional permissions. + + When you convey a copy of a covered work, you may at your option +remove any additional permissions from that copy, or from any part of +it. (Additional permissions may be written to require their own +removal in certain cases when you modify the work.) You may place +additional permissions on material, added by you to a covered work, +for which you have or can give appropriate copyright permission. 
+ + Notwithstanding any other provision of this License, for material you +add to a covered work, you may (if authorized by the copyright holders of +that material) supplement the terms of this License with terms: + + a) Disclaiming warranty or limiting liability differently from the + terms of sections 15 and 16 of this License; or + + b) Requiring preservation of specified reasonable legal notices or + author attributions in that material or in the Appropriate Legal + Notices displayed by works containing it; or + + c) Prohibiting misrepresentation of the origin of that material, or + requiring that modified versions of such material be marked in + reasonable ways as different from the original version; or + + d) Limiting the use for publicity purposes of names of licensors or + authors of the material; or + + e) Declining to grant rights under trademark law for use of some + trade names, trademarks, or service marks; or + + f) Requiring indemnification of licensors and authors of that + material by anyone who conveys the material (or modified versions of + it) with contractual assumptions of liability to the recipient, for + any liability that these contractual assumptions directly impose on + those licensors and authors. + + All other non-permissive additional terms are considered "further +restrictions" within the meaning of section 10. If the Program as you +received it, or any part of it, contains a notice stating that it is +governed by this License along with a term that is a further +restriction, you may remove that term. If a license document contains +a further restriction but permits relicensing or conveying under this +License, you may add to a covered work material governed by the terms +of that license document, provided that the further restriction does +not survive such relicensing or conveying. + + If you add terms to a covered work in accord with this section, you +must place, in the relevant source files, a statement of the +additional terms that apply to those files, or a notice indicating +where to find the applicable terms. + + Additional terms, permissive or non-permissive, may be stated in the +form of a separately written license, or stated as exceptions; +the above requirements apply either way. + + 8. Termination. + + You may not propagate or modify a covered work except as expressly +provided under this License. Any attempt otherwise to propagate or +modify it is void, and will automatically terminate your rights under +this License (including any patent licenses granted under the third +paragraph of section 11). + + However, if you cease all violation of this License, then your +license from a particular copyright holder is reinstated (a) +provisionally, unless and until the copyright holder explicitly and +finally terminates your license, and (b) permanently, if the copyright +holder fails to notify you of the violation by some reasonable means +prior to 60 days after the cessation. + + Moreover, your license from a particular copyright holder is +reinstated permanently if the copyright holder notifies you of the +violation by some reasonable means, this is the first time you have +received notice of violation of this License (for any work) from that +copyright holder, and you cure the violation prior to 30 days after +your receipt of the notice. + + Termination of your rights under this section does not terminate the +licenses of parties who have received copies or rights from you under +this License. 
If your rights have been terminated and not permanently +reinstated, you do not qualify to receive new licenses for the same +material under section 10. + + 9. Acceptance Not Required for Having Copies. + + You are not required to accept this License in order to receive or +run a copy of the Program. Ancillary propagation of a covered work +occurring solely as a consequence of using peer-to-peer transmission +to receive a copy likewise does not require acceptance. However, +nothing other than this License grants you permission to propagate or +modify any covered work. These actions infringe copyright if you do +not accept this License. Therefore, by modifying or propagating a +covered work, you indicate your acceptance of this License to do so. + + 10. Automatic Licensing of Downstream Recipients. + + Each time you convey a covered work, the recipient automatically +receives a license from the original licensors, to run, modify and +propagate that work, subject to this License. You are not responsible +for enforcing compliance by third parties with this License. + + An "entity transaction" is a transaction transferring control of an +organization, or substantially all assets of one, or subdividing an +organization, or merging organizations. If propagation of a covered +work results from an entity transaction, each party to that +transaction who receives a copy of the work also receives whatever +licenses to the work the party's predecessor in interest had or could +give under the previous paragraph, plus a right to possession of the +Corresponding Source of the work from the predecessor in interest, if +the predecessor has it or can get it with reasonable efforts. + + You may not impose any further restrictions on the exercise of the +rights granted or affirmed under this License. For example, you may +not impose a license fee, royalty, or other charge for exercise of +rights granted under this License, and you may not initiate litigation +(including a cross-claim or counterclaim in a lawsuit) alleging that +any patent claim is infringed by making, using, selling, offering for +sale, or importing the Program or any portion of it. + + 11. Patents. + + A "contributor" is a copyright holder who authorizes use under this +License of the Program or a work on which the Program is based. The +work thus licensed is called the contributor's "contributor version". + + A contributor's "essential patent claims" are all patent claims +owned or controlled by the contributor, whether already acquired or +hereafter acquired, that would be infringed by some manner, permitted +by this License, of making, using, or selling its contributor version, +but do not include claims that would be infringed only as a +consequence of further modification of the contributor version. For +purposes of this definition, "control" includes the right to grant +patent sublicenses in a manner consistent with the requirements of +this License. + + Each contributor grants you a non-exclusive, worldwide, royalty-free +patent license under the contributor's essential patent claims, to +make, use, sell, offer for sale, import and otherwise run, modify and +propagate the contents of its contributor version. + + In the following three paragraphs, a "patent license" is any express +agreement or commitment, however denominated, not to enforce a patent +(such as an express permission to practice a patent or covenant not to +sue for patent infringement). 
To "grant" such a patent license to a +party means to make such an agreement or commitment not to enforce a +patent against the party. + + If you convey a covered work, knowingly relying on a patent license, +and the Corresponding Source of the work is not available for anyone +to copy, free of charge and under the terms of this License, through a +publicly available network server or other readily accessible means, +then you must either (1) cause the Corresponding Source to be so +available, or (2) arrange to deprive yourself of the benefit of the +patent license for this particular work, or (3) arrange, in a manner +consistent with the requirements of this License, to extend the patent +license to downstream recipients. "Knowingly relying" means you have +actual knowledge that, but for the patent license, your conveying the +covered work in a country, or your recipient's use of the covered work +in a country, would infringe one or more identifiable patents in that +country that you have reason to believe are valid. + + If, pursuant to or in connection with a single transaction or +arrangement, you convey, or propagate by procuring conveyance of, a +covered work, and grant a patent license to some of the parties +receiving the covered work authorizing them to use, propagate, modify +or convey a specific copy of the covered work, then the patent license +you grant is automatically extended to all recipients of the covered +work and works based on it. + + A patent license is "discriminatory" if it does not include within +the scope of its coverage, prohibits the exercise of, or is +conditioned on the non-exercise of one or more of the rights that are +specifically granted under this License. You may not convey a covered +work if you are a party to an arrangement with a third party that is +in the business of distributing software, under which you make payment +to the third party based on the extent of your activity of conveying +the work, and under which the third party grants, to any of the +parties who would receive the covered work from you, a discriminatory +patent license (a) in connection with copies of the covered work +conveyed by you (or copies made from those copies), or (b) primarily +for and in connection with specific products or compilations that +contain the covered work, unless you entered into that arrangement, +or that patent license was granted, prior to 28 March 2007. + + Nothing in this License shall be construed as excluding or limiting +any implied license or other defenses to infringement that may +otherwise be available to you under applicable patent law. + + 12. No Surrender of Others' Freedom. + + If conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot convey a +covered work so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you may +not convey it at all. For example, if you agree to terms that obligate you +to collect a royalty for further conveying from those to whom you convey +the Program, the only way you could satisfy both those terms and this +License would be to refrain entirely from conveying the Program. + + 13. Use with the GNU Affero General Public License. 
+ + Notwithstanding any other provision of this License, you have +permission to link or combine any covered work with a work licensed +under version 3 of the GNU Affero General Public License into a single +combined work, and to convey the resulting work. The terms of this +License will continue to apply to the part which is the covered work, +but the special requirements of the GNU Affero General Public License, +section 13, concerning interaction through a network will apply to the +combination as such. + + 14. Revised Versions of this License. + + The Free Software Foundation may publish revised and/or new versions of +the GNU General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + + Each version is given a distinguishing version number. If the +Program specifies that a certain numbered version of the GNU General +Public License "or any later version" applies to it, you have the +option of following the terms and conditions either of that numbered +version or of any later version published by the Free Software +Foundation. If the Program does not specify a version number of the +GNU General Public License, you may choose any version ever published +by the Free Software Foundation. + + If the Program specifies that a proxy can decide which future +versions of the GNU General Public License can be used, that proxy's +public statement of acceptance of a version permanently authorizes you +to choose that version for the Program. + + Later license versions may give you additional or different +permissions. However, no additional obligations are imposed on any +author or copyright holder as a result of your choosing to follow a +later version. + + 15. Disclaimer of Warranty. + + THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY +APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT +HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY +OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, +THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR +PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM +IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF +ALL NECESSARY SERVICING, REPAIR OR CORRECTION. + + 16. Limitation of Liability. + + IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS +THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY +GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE +USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF +DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD +PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), +EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF +SUCH DAMAGES. + + 17. Interpretation of Sections 15 and 16. + + If the disclaimer of warranty and limitation of liability provided +above cannot be given local legal effect according to their terms, +reviewing courts shall apply local law that most closely approximates +an absolute waiver of all civil liability in connection with the +Program, unless a warranty or assumption of liability accompanies a +copy of the Program in return for a fee. 
+ + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +state the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + + Copyright (C) + + This program is free software: you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation, either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see . + +Also add information on how to contact you by electronic and paper mail. + + If the program does terminal interaction, make it output a short +notice like this when it starts in an interactive mode: + + Copyright (C) + This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, your program's commands +might be different; for a GUI interface, you would use an "about box". + + You should also get your employer (if you work as a programmer) or school, +if any, to sign a "copyright disclaimer" for the program, if necessary. +For more information on this, and how to apply and follow the GNU GPL, see +. + + The GNU General Public License does not permit incorporating your program +into proprietary programs. If your program is a subroutine library, you +may consider it more useful to permit linking proprietary applications with +the library. If this is what you want to do, use the GNU Lesser General +Public License instead of this License. But first, please read +. diff --git a/COPYING3.LIB b/COPYING3.LIB new file mode 100644 index 0000000..fc8a5de --- /dev/null +++ b/COPYING3.LIB @@ -0,0 +1,165 @@ + GNU LESSER GENERAL PUBLIC LICENSE + Version 3, 29 June 2007 + + Copyright (C) 2007 Free Software Foundation, Inc. + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + + This version of the GNU Lesser General Public License incorporates +the terms and conditions of version 3 of the GNU General Public +License, supplemented by the additional permissions listed below. + + 0. Additional Definitions. + + As used herein, "this License" refers to version 3 of the GNU Lesser +General Public License, and the "GNU GPL" refers to version 3 of the GNU +General Public License. + + "The Library" refers to a covered work governed by this License, +other than an Application or a Combined Work as defined below. + + An "Application" is any work that makes use of an interface provided +by the Library, but which is not otherwise based on the Library. 
+Defining a subclass of a class defined by the Library is deemed a mode +of using an interface provided by the Library. + + A "Combined Work" is a work produced by combining or linking an +Application with the Library. The particular version of the Library +with which the Combined Work was made is also called the "Linked +Version". + + The "Minimal Corresponding Source" for a Combined Work means the +Corresponding Source for the Combined Work, excluding any source code +for portions of the Combined Work that, considered in isolation, are +based on the Application, and not on the Linked Version. + + The "Corresponding Application Code" for a Combined Work means the +object code and/or source code for the Application, including any data +and utility programs needed for reproducing the Combined Work from the +Application, but excluding the System Libraries of the Combined Work. + + 1. Exception to Section 3 of the GNU GPL. + + You may convey a covered work under sections 3 and 4 of this License +without being bound by section 3 of the GNU GPL. + + 2. Conveying Modified Versions. + + If you modify a copy of the Library, and, in your modifications, a +facility refers to a function or data to be supplied by an Application +that uses the facility (other than as an argument passed when the +facility is invoked), then you may convey a copy of the modified +version: + + a) under this License, provided that you make a good faith effort to + ensure that, in the event an Application does not supply the + function or data, the facility still operates, and performs + whatever part of its purpose remains meaningful, or + + b) under the GNU GPL, with none of the additional permissions of + this License applicable to that copy. + + 3. Object Code Incorporating Material from Library Header Files. + + The object code form of an Application may incorporate material from +a header file that is part of the Library. You may convey such object +code under terms of your choice, provided that, if the incorporated +material is not limited to numerical parameters, data structure +layouts and accessors, or small macros, inline functions and templates +(ten or fewer lines in length), you do both of the following: + + a) Give prominent notice with each copy of the object code that the + Library is used in it and that the Library and its use are + covered by this License. + + b) Accompany the object code with a copy of the GNU GPL and this license + document. + + 4. Combined Works. + + You may convey a Combined Work under terms of your choice that, +taken together, effectively do not restrict modification of the +portions of the Library contained in the Combined Work and reverse +engineering for debugging such modifications, if you also do each of +the following: + + a) Give prominent notice with each copy of the Combined Work that + the Library is used in it and that the Library and its use are + covered by this License. + + b) Accompany the Combined Work with a copy of the GNU GPL and this license + document. + + c) For a Combined Work that displays copyright notices during + execution, include the copyright notice for the Library among + these notices, as well as a reference directing the user to the + copies of the GNU GPL and this license document. 
+ + d) Do one of the following: + + 0) Convey the Minimal Corresponding Source under the terms of this + License, and the Corresponding Application Code in a form + suitable for, and under terms that permit, the user to + recombine or relink the Application with a modified version of + the Linked Version to produce a modified Combined Work, in the + manner specified by section 6 of the GNU GPL for conveying + Corresponding Source. + + 1) Use a suitable shared library mechanism for linking with the + Library. A suitable mechanism is one that (a) uses at run time + a copy of the Library already present on the user's computer + system, and (b) will operate properly with a modified version + of the Library that is interface-compatible with the Linked + Version. + + e) Provide Installation Information, but only if you would otherwise + be required to provide such information under section 6 of the + GNU GPL, and only to the extent that such information is + necessary to install and execute a modified version of the + Combined Work produced by recombining or relinking the + Application with a modified version of the Linked Version. (If + you use option 4d0, the Installation Information must accompany + the Minimal Corresponding Source and Corresponding Application + Code. If you use option 4d1, you must provide the Installation + Information in the manner specified by section 6 of the GNU GPL + for conveying Corresponding Source.) + + 5. Combined Libraries. + + You may place library facilities that are a work based on the +Library side by side in a single library together with other library +facilities that are not Applications and are not covered by this +License, and convey such a combined library under terms of your +choice, if you do both of the following: + + a) Accompany the combined library with a copy of the same work based + on the Library, uncombined with any other library facilities, + conveyed under the terms of this License. + + b) Give prominent notice with the combined library that part of it + is a work based on the Library, and explaining where to find the + accompanying uncombined form of the same work. + + 6. Revised Versions of the GNU Lesser General Public License. + + The Free Software Foundation may publish revised and/or new versions +of the GNU Lesser General Public License from time to time. Such new +versions will be similar in spirit to the present version, but may +differ in detail to address new problems or concerns. + + Each version is given a distinguishing version number. If the +Library as you received it specifies that a certain numbered version +of the GNU Lesser General Public License "or any later version" +applies to it, you have the option of following the terms and +conditions either of that published version or of any later version +published by the Free Software Foundation. If the Library as you +received it does not specify a version number of the GNU Lesser +General Public License, you may choose any version of the GNU Lesser +General Public License ever published by the Free Software Foundation. + + If the Library as you received it specifies that a proxy can decide +whether future versions of the GNU Lesser General Public License shall +apply, that proxy's public statement of acceptance of any version is +permanent authorization for you to choose that version for the +Library. 
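Option 4d1 above describes a "suitable shared library mechanism": one that, at run
time, uses a copy of the Library already present on the user's computer and keeps
working with any interface-compatible replacement. As an illustration of that idea
only -- it is not part of the license text -- a hypothetical Application might load
the Library dynamically instead of incorporating it. In the minimal Python sketch
below, the system C math library simply stands in for "the Library", and every name
is illustrative.

    import ctypes
    import ctypes.util

    # Locate a copy of the library that is already installed on the user's
    # system, rather than embedding it in the Application.
    lib_path = ctypes.util.find_library("m")  # the math library stands in for "the Library"
    if lib_path is None:
        raise OSError("no system copy of the library was found")

    lib = ctypes.CDLL(lib_path)

    # Declare the signature of the interface symbol we call.
    lib.cos.argtypes = [ctypes.c_double]
    lib.cos.restype = ctypes.c_double

    print(lib.cos(0.0))  # prints 1.0

Because the library is resolved at run time, the user can substitute a modified,
interface-compatible build without relinking the Application, which is the property
option 4d1 is after; whether a particular mechanism actually satisfies that option
still depends on the details of the Combined Work, and this sketch only shows the
run-time-loading aspect.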
diff --git a/MODULE_LICENSE_GPL b/MODULE_LICENSE_GPL new file mode 100644 index 0000000..e69de29 diff --git a/NOTICE b/NOTICE new file mode 100644 index 0000000..623b625 --- /dev/null +++ b/NOTICE @@ -0,0 +1,340 @@ + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc. + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Library General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. 
The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. 
+ +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. 
Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. 
If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + + Copyright (C) + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) year name of author + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. 
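As an illustration of the suggestion above only -- it is not part of the license
text -- an interactive program might print such a notice at startup and answer the
suggested commands. In the hypothetical Python sketch below, the program name,
version, and wording are placeholders copied from the sample notice.

    def main():
        print("Gnomovision version 69, Copyright (C) year name of author")
        print("Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.")
        print("This is free software, and you are welcome to redistribute it")
        print("under certain conditions; type `show c' for details.")
        while True:
            try:
                command = input("> ").strip()
            except EOFError:
                break
            if command == "show w":
                # Here the program would print the warranty disclaimer
                # (sections 11 and 12 of this License).
                print("(warranty disclaimer would be shown here)")
            elif command == "show c":
                # Here the program would print the redistribution conditions
                # (sections 1 and 2 of this License).
                print("(redistribution conditions would be shown here)")
            elif command in ("quit", "exit"):
                break

    if __name__ == "__main__":
        main()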
+ +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + , 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Library General +Public License instead of this License. diff --git a/OWNERS b/OWNERS new file mode 100644 index 0000000..d7bfa1c --- /dev/null +++ b/OWNERS @@ -0,0 +1,3 @@ +cmtice@google.com +llozano@google.com +ndesaulniers@google.com diff --git a/bin/real-x86_64-linux-android-g++ b/bin/real-x86_64-linux-android-g++ new file mode 100755 index 0000000..54944a2 --- /dev/null +++ b/bin/real-x86_64-linux-android-g++ @@ -0,0 +1,66 @@ +#!/usr/bin/python + +import os +import sys + +class CompilerWrapper(): + def __init__(self, argv): + self.args = argv + self.execargs = [] + self.real_compiler = None + self.argv0 = None + self.append_flags = [] + self.prepend_flags = [] + self.custom_flags = { + '--gomacc-path': None + } + + def set_real_compiler(self): + """Find the real compiler with the absolute path.""" + compiler_path = os.path.dirname(os.path.abspath(__file__)) + if os.path.islink(__file__): + compiler = os.path.basename(os.readlink(__file__)) + else: + compiler = os.path.basename(os.path.abspath(__file__)) + self.real_compiler = os.path.join( + compiler_path, + "real-" + compiler) + self.argv0 = self.real_compiler + + def process_gomacc_command(self): + """Return the gomacc command if '--gomacc-path' is set.""" + gomacc = self.custom_flags['--gomacc-path'] + if gomacc and os.path.isfile(gomacc): + self.argv0 = gomacc + self.execargs += [gomacc] + + def parse_custom_flags(self): + i = 0 + args = [] + while i < len(self.args): + if self.args[i] in self.custom_flags: + self.custom_flags[self.args[i]] = self.args[i + 1] + i = i + 2 + else: + args.append(self.args[i]) + i = i + 1 + self.args = args + + def add_flags(self): + self.args = self.prepend_flags + self.args + self.append_flags + + def invoke_compiler(self): + self.set_real_compiler() + self.parse_custom_flags() + self.process_gomacc_command() + self.add_flags() + self.execargs += [self.real_compiler] + self.args + os.execv(self.argv0, self.execargs) + + +def main(argv): + cw = CompilerWrapper(argv[1:]) + cw.invoke_compiler() + +if __name__ == "__main__": + main(sys.argv) diff --git a/bin/real-x86_64-linux-android-gcc b/bin/real-x86_64-linux-android-gcc new file mode 100755 index 0000000..54944a2 --- /dev/null +++ b/bin/real-x86_64-linux-android-gcc @@ -0,0 +1,66 @@ +#!/usr/bin/python + +import os +import sys + +class CompilerWrapper(): + def __init__(self, argv): + self.args = argv + self.execargs = [] + self.real_compiler = None + self.argv0 = None + self.append_flags = [] + self.prepend_flags = [] + self.custom_flags = { + '--gomacc-path': None + } + + def set_real_compiler(self): + """Find 
the real compiler with the absolute path.""" + compiler_path = os.path.dirname(os.path.abspath(__file__)) + if os.path.islink(__file__): + compiler = os.path.basename(os.readlink(__file__)) + else: + compiler = os.path.basename(os.path.abspath(__file__)) + self.real_compiler = os.path.join( + compiler_path, + "real-" + compiler) + self.argv0 = self.real_compiler + + def process_gomacc_command(self): + """Return the gomacc command if '--gomacc-path' is set.""" + gomacc = self.custom_flags['--gomacc-path'] + if gomacc and os.path.isfile(gomacc): + self.argv0 = gomacc + self.execargs += [gomacc] + + def parse_custom_flags(self): + i = 0 + args = [] + while i < len(self.args): + if self.args[i] in self.custom_flags: + self.custom_flags[self.args[i]] = self.args[i + 1] + i = i + 2 + else: + args.append(self.args[i]) + i = i + 1 + self.args = args + + def add_flags(self): + self.args = self.prepend_flags + self.args + self.append_flags + + def invoke_compiler(self): + self.set_real_compiler() + self.parse_custom_flags() + self.process_gomacc_command() + self.add_flags() + self.execargs += [self.real_compiler] + self.args + os.execv(self.argv0, self.execargs) + + +def main(argv): + cw = CompilerWrapper(argv[1:]) + cw.invoke_compiler() + +if __name__ == "__main__": + main(sys.argv) diff --git a/bin/x86_64-linux-android-addr2line b/bin/x86_64-linux-android-addr2line new file mode 100755 index 0000000..955a5f1 Binary files /dev/null and b/bin/x86_64-linux-android-addr2line differ diff --git a/bin/x86_64-linux-android-ar b/bin/x86_64-linux-android-ar new file mode 100755 index 0000000..792135a Binary files /dev/null and b/bin/x86_64-linux-android-ar differ diff --git a/bin/x86_64-linux-android-as b/bin/x86_64-linux-android-as new file mode 100755 index 0000000..61c8ea7 Binary files /dev/null and b/bin/x86_64-linux-android-as differ diff --git a/bin/x86_64-linux-android-c++ b/bin/x86_64-linux-android-c++ new file mode 120000 index 0000000..425d82a --- /dev/null +++ b/bin/x86_64-linux-android-c++ @@ -0,0 +1 @@ +x86_64-linux-android-g++ \ No newline at end of file diff --git a/bin/x86_64-linux-android-c++filt b/bin/x86_64-linux-android-c++filt new file mode 100755 index 0000000..2ee9cb3 Binary files /dev/null and b/bin/x86_64-linux-android-c++filt differ diff --git a/bin/x86_64-linux-android-cpp b/bin/x86_64-linux-android-cpp new file mode 100755 index 0000000..ee31111 Binary files /dev/null and b/bin/x86_64-linux-android-cpp differ diff --git a/bin/x86_64-linux-android-dwp b/bin/x86_64-linux-android-dwp new file mode 100755 index 0000000..1d68ca5 Binary files /dev/null and b/bin/x86_64-linux-android-dwp differ diff --git a/bin/x86_64-linux-android-elfedit b/bin/x86_64-linux-android-elfedit new file mode 100755 index 0000000..e068022 Binary files /dev/null and b/bin/x86_64-linux-android-elfedit differ diff --git a/bin/x86_64-linux-android-g++ b/bin/x86_64-linux-android-g++ new file mode 100755 index 0000000..54944a2 --- /dev/null +++ b/bin/x86_64-linux-android-g++ @@ -0,0 +1,66 @@ +#!/usr/bin/python + +import os +import sys + +class CompilerWrapper(): + def __init__(self, argv): + self.args = argv + self.execargs = [] + self.real_compiler = None + self.argv0 = None + self.append_flags = [] + self.prepend_flags = [] + self.custom_flags = { + '--gomacc-path': None + } + + def set_real_compiler(self): + """Find the real compiler with the absolute path.""" + compiler_path = os.path.dirname(os.path.abspath(__file__)) + if os.path.islink(__file__): + compiler = os.path.basename(os.readlink(__file__)) + else: 
+ compiler = os.path.basename(os.path.abspath(__file__)) + self.real_compiler = os.path.join( + compiler_path, + "real-" + compiler) + self.argv0 = self.real_compiler + + def process_gomacc_command(self): + """Return the gomacc command if '--gomacc-path' is set.""" + gomacc = self.custom_flags['--gomacc-path'] + if gomacc and os.path.isfile(gomacc): + self.argv0 = gomacc + self.execargs += [gomacc] + + def parse_custom_flags(self): + i = 0 + args = [] + while i < len(self.args): + if self.args[i] in self.custom_flags: + self.custom_flags[self.args[i]] = self.args[i + 1] + i = i + 2 + else: + args.append(self.args[i]) + i = i + 1 + self.args = args + + def add_flags(self): + self.args = self.prepend_flags + self.args + self.append_flags + + def invoke_compiler(self): + self.set_real_compiler() + self.parse_custom_flags() + self.process_gomacc_command() + self.add_flags() + self.execargs += [self.real_compiler] + self.args + os.execv(self.argv0, self.execargs) + + +def main(argv): + cw = CompilerWrapper(argv[1:]) + cw.invoke_compiler() + +if __name__ == "__main__": + main(sys.argv) diff --git a/bin/x86_64-linux-android-gcc b/bin/x86_64-linux-android-gcc new file mode 100755 index 0000000..54944a2 --- /dev/null +++ b/bin/x86_64-linux-android-gcc @@ -0,0 +1,66 @@ +#!/usr/bin/python + +import os +import sys + +class CompilerWrapper(): + def __init__(self, argv): + self.args = argv + self.execargs = [] + self.real_compiler = None + self.argv0 = None + self.append_flags = [] + self.prepend_flags = [] + self.custom_flags = { + '--gomacc-path': None + } + + def set_real_compiler(self): + """Find the real compiler with the absolute path.""" + compiler_path = os.path.dirname(os.path.abspath(__file__)) + if os.path.islink(__file__): + compiler = os.path.basename(os.readlink(__file__)) + else: + compiler = os.path.basename(os.path.abspath(__file__)) + self.real_compiler = os.path.join( + compiler_path, + "real-" + compiler) + self.argv0 = self.real_compiler + + def process_gomacc_command(self): + """Return the gomacc command if '--gomacc-path' is set.""" + gomacc = self.custom_flags['--gomacc-path'] + if gomacc and os.path.isfile(gomacc): + self.argv0 = gomacc + self.execargs += [gomacc] + + def parse_custom_flags(self): + i = 0 + args = [] + while i < len(self.args): + if self.args[i] in self.custom_flags: + self.custom_flags[self.args[i]] = self.args[i + 1] + i = i + 2 + else: + args.append(self.args[i]) + i = i + 1 + self.args = args + + def add_flags(self): + self.args = self.prepend_flags + self.args + self.append_flags + + def invoke_compiler(self): + self.set_real_compiler() + self.parse_custom_flags() + self.process_gomacc_command() + self.add_flags() + self.execargs += [self.real_compiler] + self.args + os.execv(self.argv0, self.execargs) + + +def main(argv): + cw = CompilerWrapper(argv[1:]) + cw.invoke_compiler() + +if __name__ == "__main__": + main(sys.argv) diff --git a/bin/x86_64-linux-android-gcc-4.9 b/bin/x86_64-linux-android-gcc-4.9 new file mode 120000 index 0000000..c953618 --- /dev/null +++ b/bin/x86_64-linux-android-gcc-4.9 @@ -0,0 +1 @@ +x86_64-linux-android-gcc \ No newline at end of file diff --git a/bin/x86_64-linux-android-gcc-4.9.x b/bin/x86_64-linux-android-gcc-4.9.x new file mode 100755 index 0000000..33e6c27 Binary files /dev/null and b/bin/x86_64-linux-android-gcc-4.9.x differ diff --git a/bin/x86_64-linux-android-gcc-ar b/bin/x86_64-linux-android-gcc-ar new file mode 100755 index 0000000..16af0c3 Binary files /dev/null and b/bin/x86_64-linux-android-gcc-ar differ diff 
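(Editorial aside, not part of the patch.) The wrapper script added above, installed twice as bin/x86_64-linux-android-g++ and bin/x86_64-linux-android-gcc, strips a private '--gomacc-path <path>' flag from its command line, prepends that gomacc binary when the path points at an existing file, and then execs the sibling tool named 'real-<wrapper name>' with the remaining arguments. A minimal, self-contained sketch of that argv rewriting, using hypothetical paths and omitting the isfile/execv steps the real script performs:

def rewrite_argv(wrapper_name, argv, gomacc_key='--gomacc-path'):
    custom = {gomacc_key: None}
    args = []
    i = 0
    while i < len(argv):
        if argv[i] in custom:              # consume "--gomacc-path <path>"
            custom[argv[i]] = argv[i + 1]
            i += 2
        else:
            args.append(argv[i])
            i += 1
    execargs = []
    if custom[gomacc_key]:                 # run the real compiler under gomacc
        execargs.append(custom[gomacc_key])
    execargs.append('real-' + wrapper_name)
    return execargs + args

print(rewrite_argv('x86_64-linux-android-gcc',
                   ['--gomacc-path', '/usr/bin/gomacc', '-O2', 'foo.c']))
# -> ['/usr/bin/gomacc', 'real-x86_64-linux-android-gcc', '-O2', 'foo.c']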
--git a/bin/x86_64-linux-android-gcc-nm b/bin/x86_64-linux-android-gcc-nm new file mode 100755 index 0000000..66c8758 Binary files /dev/null and b/bin/x86_64-linux-android-gcc-nm differ diff --git a/bin/x86_64-linux-android-gcc-ranlib b/bin/x86_64-linux-android-gcc-ranlib new file mode 100755 index 0000000..9a631eb Binary files /dev/null and b/bin/x86_64-linux-android-gcc-ranlib differ diff --git a/bin/x86_64-linux-android-gcov b/bin/x86_64-linux-android-gcov new file mode 100755 index 0000000..fbdc16e Binary files /dev/null and b/bin/x86_64-linux-android-gcov differ diff --git a/bin/x86_64-linux-android-gcov-tool b/bin/x86_64-linux-android-gcov-tool new file mode 100755 index 0000000..b524e8a Binary files /dev/null and b/bin/x86_64-linux-android-gcov-tool differ diff --git a/bin/x86_64-linux-android-gprof b/bin/x86_64-linux-android-gprof new file mode 100755 index 0000000..137bf92 Binary files /dev/null and b/bin/x86_64-linux-android-gprof differ diff --git a/bin/x86_64-linux-android-ld b/bin/x86_64-linux-android-ld new file mode 120000 index 0000000..3d3aa39 --- /dev/null +++ b/bin/x86_64-linux-android-ld @@ -0,0 +1 @@ +x86_64-linux-android-ld.gold \ No newline at end of file diff --git a/bin/x86_64-linux-android-ld.bfd b/bin/x86_64-linux-android-ld.bfd new file mode 100755 index 0000000..30c572b Binary files /dev/null and b/bin/x86_64-linux-android-ld.bfd differ diff --git a/bin/x86_64-linux-android-ld.gold b/bin/x86_64-linux-android-ld.gold new file mode 100755 index 0000000..665f3c4 Binary files /dev/null and b/bin/x86_64-linux-android-ld.gold differ diff --git a/bin/x86_64-linux-android-nm b/bin/x86_64-linux-android-nm new file mode 100755 index 0000000..575a569 Binary files /dev/null and b/bin/x86_64-linux-android-nm differ diff --git a/bin/x86_64-linux-android-objcopy b/bin/x86_64-linux-android-objcopy new file mode 100755 index 0000000..06d506e Binary files /dev/null and b/bin/x86_64-linux-android-objcopy differ diff --git a/bin/x86_64-linux-android-objdump b/bin/x86_64-linux-android-objdump new file mode 100755 index 0000000..311ac45 Binary files /dev/null and b/bin/x86_64-linux-android-objdump differ diff --git a/bin/x86_64-linux-android-ranlib b/bin/x86_64-linux-android-ranlib new file mode 100755 index 0000000..5376c5b Binary files /dev/null and b/bin/x86_64-linux-android-ranlib differ diff --git a/bin/x86_64-linux-android-readelf b/bin/x86_64-linux-android-readelf new file mode 100755 index 0000000..952388d Binary files /dev/null and b/bin/x86_64-linux-android-readelf differ diff --git a/bin/x86_64-linux-android-size b/bin/x86_64-linux-android-size new file mode 100755 index 0000000..8bd511c Binary files /dev/null and b/bin/x86_64-linux-android-size differ diff --git a/bin/x86_64-linux-android-strings b/bin/x86_64-linux-android-strings new file mode 100755 index 0000000..0972cae Binary files /dev/null and b/bin/x86_64-linux-android-strings differ diff --git a/bin/x86_64-linux-android-strip b/bin/x86_64-linux-android-strip new file mode 100755 index 0000000..342292d Binary files /dev/null and b/bin/x86_64-linux-android-strip differ diff --git a/bin/x86_64-linux-androidkernel-ar b/bin/x86_64-linux-androidkernel-ar new file mode 120000 index 0000000..b18bea9 --- /dev/null +++ b/bin/x86_64-linux-androidkernel-ar @@ -0,0 +1 @@ +x86_64-linux-android-ar \ No newline at end of file diff --git a/bin/x86_64-linux-androidkernel-as b/bin/x86_64-linux-androidkernel-as new file mode 120000 index 0000000..dfba2d8 --- /dev/null +++ b/bin/x86_64-linux-androidkernel-as @@ -0,0 +1 @@ 
+x86_64-linux-android-as \ No newline at end of file diff --git a/bin/x86_64-linux-androidkernel-cpp b/bin/x86_64-linux-androidkernel-cpp new file mode 120000 index 0000000..8163a00 --- /dev/null +++ b/bin/x86_64-linux-androidkernel-cpp @@ -0,0 +1 @@ +x86_64-linux-android-cpp \ No newline at end of file diff --git a/bin/x86_64-linux-androidkernel-elfedit b/bin/x86_64-linux-androidkernel-elfedit new file mode 120000 index 0000000..4799320 --- /dev/null +++ b/bin/x86_64-linux-androidkernel-elfedit @@ -0,0 +1 @@ +x86_64-linux-android-elfedit \ No newline at end of file diff --git a/bin/x86_64-linux-androidkernel-gcc b/bin/x86_64-linux-androidkernel-gcc new file mode 120000 index 0000000..c953618 --- /dev/null +++ b/bin/x86_64-linux-androidkernel-gcc @@ -0,0 +1 @@ +x86_64-linux-android-gcc \ No newline at end of file diff --git a/bin/x86_64-linux-androidkernel-ld b/bin/x86_64-linux-androidkernel-ld new file mode 120000 index 0000000..bfa3818 --- /dev/null +++ b/bin/x86_64-linux-androidkernel-ld @@ -0,0 +1 @@ +x86_64-linux-android-ld.bfd \ No newline at end of file diff --git a/bin/x86_64-linux-androidkernel-nm b/bin/x86_64-linux-androidkernel-nm new file mode 120000 index 0000000..68616b5 --- /dev/null +++ b/bin/x86_64-linux-androidkernel-nm @@ -0,0 +1 @@ +x86_64-linux-android-nm \ No newline at end of file diff --git a/bin/x86_64-linux-androidkernel-objcopy b/bin/x86_64-linux-androidkernel-objcopy new file mode 120000 index 0000000..224414f --- /dev/null +++ b/bin/x86_64-linux-androidkernel-objcopy @@ -0,0 +1 @@ +x86_64-linux-android-objcopy \ No newline at end of file diff --git a/bin/x86_64-linux-androidkernel-objdump b/bin/x86_64-linux-androidkernel-objdump new file mode 120000 index 0000000..4e035a7 --- /dev/null +++ b/bin/x86_64-linux-androidkernel-objdump @@ -0,0 +1 @@ +x86_64-linux-android-objdump \ No newline at end of file diff --git a/bin/x86_64-linux-androidkernel-readelf b/bin/x86_64-linux-androidkernel-readelf new file mode 120000 index 0000000..3c38c23 --- /dev/null +++ b/bin/x86_64-linux-androidkernel-readelf @@ -0,0 +1 @@ +x86_64-linux-android-readelf \ No newline at end of file diff --git a/bin/x86_64-linux-androidkernel-size b/bin/x86_64-linux-androidkernel-size new file mode 120000 index 0000000..0644174 --- /dev/null +++ b/bin/x86_64-linux-androidkernel-size @@ -0,0 +1 @@ +x86_64-linux-android-size \ No newline at end of file diff --git a/bin/x86_64-linux-androidkernel-strip b/bin/x86_64-linux-androidkernel-strip new file mode 120000 index 0000000..69d0bf5 --- /dev/null +++ b/bin/x86_64-linux-androidkernel-strip @@ -0,0 +1 @@ +x86_64-linux-android-strip \ No newline at end of file diff --git a/lib/gcc/x86_64-linux-android/4.9.x/32/crtbegin.o b/lib/gcc/x86_64-linux-android/4.9.x/32/crtbegin.o new file mode 100644 index 0000000..80ae3e0 Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/32/crtbegin.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/32/crtbeginS.o b/lib/gcc/x86_64-linux-android/4.9.x/32/crtbeginS.o new file mode 100644 index 0000000..e2bf67d Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/32/crtbeginS.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/32/crtbeginT.o b/lib/gcc/x86_64-linux-android/4.9.x/32/crtbeginT.o new file mode 100644 index 0000000..80ae3e0 Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/32/crtbeginT.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/32/crtend.o b/lib/gcc/x86_64-linux-android/4.9.x/32/crtend.o new file mode 100644 index 0000000..2a4239b Binary files /dev/null 
and b/lib/gcc/x86_64-linux-android/4.9.x/32/crtend.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/32/crtendS.o b/lib/gcc/x86_64-linux-android/4.9.x/32/crtendS.o new file mode 100644 index 0000000..2a4239b Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/32/crtendS.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/32/crtfastmath.o b/lib/gcc/x86_64-linux-android/4.9.x/32/crtfastmath.o new file mode 100644 index 0000000..081557b Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/32/crtfastmath.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/32/crtprec32.o b/lib/gcc/x86_64-linux-android/4.9.x/32/crtprec32.o new file mode 100644 index 0000000..4878f97 Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/32/crtprec32.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/32/crtprec64.o b/lib/gcc/x86_64-linux-android/4.9.x/32/crtprec64.o new file mode 100644 index 0000000..d411cf2 Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/32/crtprec64.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/32/crtprec80.o b/lib/gcc/x86_64-linux-android/4.9.x/32/crtprec80.o new file mode 100644 index 0000000..cda349e Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/32/crtprec80.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/32/libgcc.a b/lib/gcc/x86_64-linux-android/4.9.x/32/libgcc.a new file mode 100644 index 0000000..855e43e Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/32/libgcc.a differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/32/libgcov.a b/lib/gcc/x86_64-linux-android/4.9.x/32/libgcov.a new file mode 100644 index 0000000..b450a0e Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/32/libgcov.a differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/crtbegin.o b/lib/gcc/x86_64-linux-android/4.9.x/crtbegin.o new file mode 100644 index 0000000..ab76425 Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/crtbegin.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/crtbeginS.o b/lib/gcc/x86_64-linux-android/4.9.x/crtbeginS.o new file mode 100644 index 0000000..92b270b Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/crtbeginS.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/crtbeginT.o b/lib/gcc/x86_64-linux-android/4.9.x/crtbeginT.o new file mode 100644 index 0000000..ab76425 Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/crtbeginT.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/crtend.o b/lib/gcc/x86_64-linux-android/4.9.x/crtend.o new file mode 100644 index 0000000..f7d817f Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/crtend.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/crtendS.o b/lib/gcc/x86_64-linux-android/4.9.x/crtendS.o new file mode 100644 index 0000000..f7d817f Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/crtendS.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/crtfastmath.o b/lib/gcc/x86_64-linux-android/4.9.x/crtfastmath.o new file mode 100644 index 0000000..2c8c454 Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/crtfastmath.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/crtprec32.o b/lib/gcc/x86_64-linux-android/4.9.x/crtprec32.o new file mode 100644 index 0000000..89d7cfd Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/crtprec32.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/crtprec64.o b/lib/gcc/x86_64-linux-android/4.9.x/crtprec64.o new file mode 100644 index 0000000..f862b6b 
Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/crtprec64.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/crtprec80.o b/lib/gcc/x86_64-linux-android/4.9.x/crtprec80.o new file mode 100644 index 0000000..638cb92 Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/crtprec80.o differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/gcov-counter.def b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/gcov-counter.def new file mode 100644 index 0000000..e847f05 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/gcov-counter.def @@ -0,0 +1,60 @@ +/* Definitions for the gcov counters in the GNU compiler. + Copyright (C) 2001-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +/* Before including this file, define a macro: + + DEF_GCOV_COUNTER(COUNTER, NAME, FN_TYPE) + + This macro will be expanded to all supported gcov counters, their + names, or the type of handler functions. FN_TYPE will be + expanded to a handler function, like in gcov_merge, it is + expanded to __gcov_merge ## FN_TYPE. */ + +/* Arc transitions. */ +DEF_GCOV_COUNTER(GCOV_COUNTER_ARCS, "arcs", _add) + +/* Histogram of value inside an interval. */ +DEF_GCOV_COUNTER(GCOV_COUNTER_V_INTERVAL, "interval", _add) + +/* Histogram of exact power2 logarithm of a value. */ +DEF_GCOV_COUNTER(GCOV_COUNTER_V_POW2, "pow2", _add) + +/* The most common value of expression. */ +DEF_GCOV_COUNTER(GCOV_COUNTER_V_SINGLE, "single", _single) + +/* The most common difference between consecutive values of expression. */ +DEF_GCOV_COUNTER(GCOV_COUNTER_V_DELTA, "delta", _delta) + +/* The most common indirect address. */ +DEF_GCOV_COUNTER(GCOV_COUNTER_V_INDIR, "indirect_call", _single) + +/* Compute average value passed to the counter. */ +DEF_GCOV_COUNTER(GCOV_COUNTER_AVERAGE, "average", _add) + +/* IOR of the all values passed to counter. */ +DEF_GCOV_COUNTER(GCOV_COUNTER_IOR, "ior", _ior) + +/* Top N value tracking for indirect calls */ +DEF_GCOV_COUNTER(GCOV_COUNTER_ICALL_TOPNV, "indirect_call_topn", _icall_topn) + +/* Time profile collecting first run of a function */ +DEF_GCOV_COUNTER(GCOV_TIME_PROFILER, "time_profiler", _time_profile) + +/* Top N value tracking for indirect calls */ +DEF_GCOV_COUNTER(GCOV_COUNTER_DIRECT_CALL, "direct_call", _dc) diff --git a/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/gcov-io.c b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/gcov-io.c new file mode 100644 index 0000000..fc5e32e --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/gcov-io.c @@ -0,0 +1,1233 @@ +/* File format for coverage information + Copyright (C) 1996-2014 Free Software Foundation, Inc. + Contributed by Bob Manson . + Completely remangled by Nathan Sidwell . + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. 
+ +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +/* Routines declared in gcov-io.h. This file should be #included by + another source file, after having #included gcov-io.h. */ + +#if !IN_GCOV +static void gcov_write_block (unsigned); +static gcov_unsigned_t *gcov_write_words (unsigned); +#endif +static const gcov_unsigned_t *gcov_read_words (unsigned); +#if !IN_LIBGCOV +static void gcov_allocate (unsigned); +#endif + +/* Optimum number of gcov_unsigned_t's read from or written to disk. */ +#define GCOV_BLOCK_SIZE (1 << 10) + +GCOV_LINKAGE struct gcov_var +{ + _GCOV_FILE *file; + gcov_position_t start; /* Position of first byte of block */ + unsigned offset; /* Read/write position within the block. */ + unsigned length; /* Read limit in the block. */ + unsigned overread; /* Number of words overread. */ + int error; /* < 0 overflow, > 0 disk error. */ + int mode; /* < 0 writing, > 0 reading */ +#if IN_LIBGCOV + /* Holds one block plus 4 bytes, thus all coverage reads & writes + fit within this buffer and we always can transfer GCOV_BLOCK_SIZE + to and from the disk. libgcov never backtracks and only writes 4 + or 8 byte objects. */ + gcov_unsigned_t buffer[GCOV_BLOCK_SIZE + 1]; +#else + int endian; /* Swap endianness. */ + /* Holds a variable length block, as the compiler can write + strings and needs to backtrack. */ + size_t alloc; + gcov_unsigned_t *buffer; +#endif +} gcov_var; + +/* Save the current position in the gcov file. */ +/* We need to expose this function when compiling for gcov-tool. */ +#ifndef IN_GCOV_TOOL +static inline +#endif +gcov_position_t +gcov_position (void) +{ + return gcov_var.start + gcov_var.offset; +} + +/* Return nonzero if the error flag is set. */ +/* We need to expose this function when compiling for gcov-tool. */ +#ifndef IN_GCOV_TOOL +static inline +#endif +int +gcov_is_error (void) +{ + return gcov_var.file ? gcov_var.error : 1; +} + +#if IN_LIBGCOV +/* Move to beginning of file and initialize for writing. */ +GCOV_LINKAGE inline void +gcov_rewrite (void) +{ + gcc_assert (gcov_var.mode > 0); + gcov_var.mode = -1; + gcov_var.start = 0; + gcov_var.offset = 0; + _GCOV_fseek (gcov_var.file, 0L, SEEK_SET); +} +#endif + +static inline gcov_unsigned_t from_file (gcov_unsigned_t value) +{ +#if !IN_LIBGCOV + if (gcov_var.endian) + { + value = (value >> 16) | (value << 16); + value = ((value & 0xff00ff) << 8) | ((value >> 8) & 0xff00ff); + } +#endif + return value; +} + +/* Open a gcov file. NAME is the name of the file to open and MODE + indicates whether a new file should be created, or an existing file + opened. If MODE is >= 0 an existing file will be opened, if + possible, and if MODE is <= 0, a new file will be created. Use + MODE=0 to attempt to reopen an existing file and then fall back on + creating a new one. If MODE < 0, the file will be opened in + read-only mode. Otherwise it will be opened for modification. 
+ Return zero on failure, >0 on opening an existing file and <0 on + creating a new one. */ + +#ifndef __KERNEL__ +GCOV_LINKAGE int +#if IN_LIBGCOV +gcov_open (const char *name) +#else +gcov_open (const char *name, int mode) +#endif +{ +#if IN_LIBGCOV + const int mode = 0; +#endif +#if GCOV_LOCKED + struct flock s_flock; + int fd; + + s_flock.l_whence = SEEK_SET; + s_flock.l_start = 0; + s_flock.l_len = 0; /* Until EOF. */ + s_flock.l_pid = getpid (); +#endif + + gcc_assert (!gcov_var.file); + gcov_var.start = 0; + gcov_var.offset = gcov_var.length = 0; + gcov_var.overread = -1u; + gcov_var.error = 0; +#if !IN_LIBGCOV + gcov_var.endian = 0; +#endif +#if GCOV_LOCKED + if (mode > 0) + { + /* Read-only mode - acquire a read-lock. */ + s_flock.l_type = F_RDLCK; + /* pass mode (ignored) for compatibility */ + fd = open (name, O_RDONLY, S_IRUSR | S_IWUSR); + } + else if (mode < 0) + { + /* Write mode - acquire a write-lock. */ + s_flock.l_type = F_WRLCK; + fd = open (name, O_RDWR | O_CREAT | O_TRUNC, 0666); + } + else /* mode == 0 */ + { + /* Read-Write mode - acquire a write-lock. */ + s_flock.l_type = F_WRLCK; + fd = open (name, O_RDWR | O_CREAT, 0666); + } + if (fd < 0) + return 0; + + while (fcntl (fd, F_SETLKW, &s_flock) && errno == EINTR) + continue; + + gcov_var.file = fdopen (fd, (mode > 0) ? "rb" : "r+b"); + + if (!gcov_var.file) + { + close (fd); + return 0; + } + + if (mode > 0) + gcov_var.mode = 1; + else if (mode == 0) + { + struct stat st; + + if (fstat (fd, &st) < 0) + { + _GCOV_fclose (gcov_var.file); + gcov_var.file = 0; + return 0; + } + if (st.st_size != 0) + gcov_var.mode = 1; + else + gcov_var.mode = mode * 2 + 1; + } + else + gcov_var.mode = mode * 2 + 1; +#else + if (mode >= 0) + gcov_var.file = _GCOV_fopen (name, (mode > 0) ? "rb" : "r+b"); + + if (gcov_var.file) + gcov_var.mode = 1; + else if (mode <= 0) + { + gcov_var.file = _GCOV_fopen (name, "w+b"); + if (gcov_var.file) + gcov_var.mode = mode * 2 + 1; + } + if (!gcov_var.file) + return 0; +#endif + + setbuf (gcov_var.file, (char *)0); + + return 1; +} +#else /* __KERNEL__ */ + +extern _GCOV_FILE *gcov_current_file; + +GCOV_LINKAGE int +gcov_open (const char *name) +{ + gcov_var.start = 0; + gcov_var.offset = gcov_var.length = 0; + gcov_var.overread = -1u; + gcov_var.error = 0; + gcov_var.file = gcov_current_file; + gcov_var.mode = 1; + + return 1; +} +#endif /* __KERNEL__ */ + + +/* Close the current gcov file. Flushes data to disk. Returns nonzero + on failure or error flag set. */ + +GCOV_LINKAGE int +gcov_close (void) +{ + if (gcov_var.file) + { +#if !IN_GCOV + if (gcov_var.offset && gcov_var.mode < 0) + gcov_write_block (gcov_var.offset); +#endif + _GCOV_fclose (gcov_var.file); + gcov_var.file = 0; + gcov_var.length = 0; + } +#if !IN_LIBGCOV + free (gcov_var.buffer); + gcov_var.alloc = 0; + gcov_var.buffer = 0; +#endif + gcov_var.mode = 0; + return gcov_var.error; +} + +#if !IN_LIBGCOV +/* Check if MAGIC is EXPECTED. Use it to determine endianness of the + file. Returns +1 for same endian, -1 for other endian and zero for + not EXPECTED. 
*/ + +GCOV_LINKAGE int +gcov_magic (gcov_unsigned_t magic, gcov_unsigned_t expected) +{ + if (magic == expected) + return 1; + magic = (magic >> 16) | (magic << 16); + magic = ((magic & 0xff00ff) << 8) | ((magic >> 8) & 0xff00ff); + if (magic == expected) + { + gcov_var.endian = 1; + return -1; + } + return 0; +} +#endif + +#if !IN_LIBGCOV +static void +gcov_allocate (unsigned length) +{ + size_t new_size = gcov_var.alloc; + + if (!new_size) + new_size = GCOV_BLOCK_SIZE; + new_size += length; + new_size *= 2; + + gcov_var.alloc = new_size; + gcov_var.buffer = XRESIZEVAR (gcov_unsigned_t, gcov_var.buffer, new_size << 2); +} +#endif + +#if !IN_GCOV +/* Write out the current block, if needs be. */ + +static void +gcov_write_block (unsigned size) +{ + if (_GCOV_fwrite (gcov_var.buffer, size << 2, 1, gcov_var.file) != 1) + gcov_var.error = 1; + gcov_var.start += size; + gcov_var.offset -= size; +} + +/* Allocate space to write BYTES bytes to the gcov file. Return a + pointer to those bytes, or NULL on failure. */ + +static gcov_unsigned_t * +gcov_write_words (unsigned words) +{ + gcov_unsigned_t *result; + + gcc_assert (gcov_var.mode < 0); +#if IN_LIBGCOV + if (gcov_var.offset >= GCOV_BLOCK_SIZE) + { + gcov_write_block (GCOV_BLOCK_SIZE); + if (gcov_var.offset) + { + gcc_assert (gcov_var.offset == 1); + memcpy (gcov_var.buffer, gcov_var.buffer + GCOV_BLOCK_SIZE, 4); + } + } +#else + if (gcov_var.offset + words > gcov_var.alloc) + gcov_allocate (gcov_var.offset + words); +#endif + result = &gcov_var.buffer[gcov_var.offset]; + gcov_var.offset += words; + + return result; +} + +/* Write unsigned VALUE to coverage file. Sets error flag + appropriately. */ + +GCOV_LINKAGE void +gcov_write_unsigned (gcov_unsigned_t value) +{ + gcov_unsigned_t *buffer = gcov_write_words (1); + + buffer[0] = value; +} + +/* Compute the total length in words required to write NUM_STRINGS + in STRING_ARRAY as unsigned. */ + +GCOV_LINKAGE gcov_unsigned_t +gcov_compute_string_array_len (char **string_array, + gcov_unsigned_t num_strings) +{ + gcov_unsigned_t len = 0, i; + for (i = 0; i < num_strings; i++) + { + gcov_unsigned_t string_len + = (strlen (string_array[i]) + sizeof (gcov_unsigned_t)) + / sizeof (gcov_unsigned_t); + len += string_len; + len += 1; /* Each string is lead by a length. */ + } + return len; +} + +/* Write NUM_STRINGS in STRING_ARRAY as unsigned. */ + +GCOV_LINKAGE void +gcov_write_string_array (char **string_array, gcov_unsigned_t num_strings) +{ + gcov_unsigned_t i, j; + for (j = 0; j < num_strings; j++) + { + gcov_unsigned_t *aligned_string; + gcov_unsigned_t string_len = + (strlen (string_array[j]) + sizeof (gcov_unsigned_t)) / + sizeof (gcov_unsigned_t); + aligned_string = (gcov_unsigned_t *) + alloca ((string_len + 1) * sizeof (gcov_unsigned_t)); + memset (aligned_string, 0, (string_len + 1) * sizeof (gcov_unsigned_t)); + aligned_string[0] = string_len; + strcpy ((char*) (aligned_string + 1), string_array[j]); + for (i = 0; i < (string_len + 1); i++) + gcov_write_unsigned (aligned_string[i]); + } +} + +/* Write counter VALUE to coverage file. Sets error flag + appropriately. */ + +#if IN_LIBGCOV +GCOV_LINKAGE void +gcov_write_counter (gcov_type value) +{ + gcov_unsigned_t *buffer = gcov_write_words (2); + + buffer[0] = (gcov_unsigned_t) value; + if (sizeof (value) > sizeof (gcov_unsigned_t)) + buffer[1] = (gcov_unsigned_t) (value >> 32); + else + buffer[1] = 0; +} +#endif /* IN_LIBGCOV */ + +#if !IN_LIBGCOV +/* Write STRING to coverage file. 
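(Editorial aside, not part of the patch.) The from_file() and gcov_magic() routines above detect a byte-swapped input file by reversing the bytes of each 32-bit word: a 16-bit rotate followed by a byte swap within each half-word. A small sketch of that same transformation:

def swap32(value):
    value = ((value >> 16) | (value << 16)) & 0xffffffff     # rotate the halves
    return (((value & 0x00ff00ff) << 8) |                    # swap bytes in each half
            ((value >> 8) & 0x00ff00ff)) & 0xffffffff

assert swap32(0x11223344) == 0x44332211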
Sets error flag on file + error, overflow flag on overflow */ + +GCOV_LINKAGE void +gcov_write_string (const char *string) +{ + unsigned length = 0; + unsigned alloc = 0; + gcov_unsigned_t *buffer; + + if (string) + { + length = strlen (string); + alloc = (length + 4) >> 2; + } + + buffer = gcov_write_words (1 + alloc); + + buffer[0] = alloc; + buffer[alloc] = 0; + memcpy (&buffer[1], string, length); +} +#endif + +#if !IN_LIBGCOV +/* Write a tag TAG and reserve space for the record length. Return a + value to be used for gcov_write_length. */ + +GCOV_LINKAGE gcov_position_t +gcov_write_tag (gcov_unsigned_t tag) +{ + gcov_position_t result = gcov_var.start + gcov_var.offset; + gcov_unsigned_t *buffer = gcov_write_words (2); + + buffer[0] = tag; + buffer[1] = 0; + + return result; +} + +/* Write a record length using POSITION, which was returned by + gcov_write_tag. The current file position is the end of the + record, and is restored before returning. Returns nonzero on + overflow. */ + +GCOV_LINKAGE void +gcov_write_length (gcov_position_t position) +{ + unsigned offset; + gcov_unsigned_t length; + gcov_unsigned_t *buffer; + + gcc_assert (gcov_var.mode < 0); + gcc_assert (position + 2 <= gcov_var.start + gcov_var.offset); + gcc_assert (position >= gcov_var.start); + offset = position - gcov_var.start; + length = gcov_var.offset - offset - 2; + buffer = (gcov_unsigned_t *) &gcov_var.buffer[offset]; + buffer[1] = length; + if (gcov_var.offset >= GCOV_BLOCK_SIZE) + gcov_write_block (gcov_var.offset); +} + +#else /* IN_LIBGCOV */ + +/* Write a tag TAG and length LENGTH. */ + +GCOV_LINKAGE void +gcov_write_tag_length (gcov_unsigned_t tag, gcov_unsigned_t length) +{ + gcov_unsigned_t *buffer = gcov_write_words (2); + + buffer[0] = tag; + buffer[1] = length; +} + +/* Write a summary structure to the gcov file. Return nonzero on + overflow. */ + +GCOV_LINKAGE void +gcov_write_summary (gcov_unsigned_t tag, const struct gcov_summary *summary) +{ + unsigned ix, h_ix, bv_ix, h_cnt = 0; + const struct gcov_ctr_summary *csum; + unsigned histo_bitvector[GCOV_HISTOGRAM_BITVECTOR_SIZE]; + + /* Count number of non-zero histogram entries, and fill in a bit vector + of non-zero indices. The histogram is only currently computed for arc + counters. 
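(Editorial aside, not part of the patch.) gcov_write_string() above stores a string as a word count followed by the text padded with NUL bytes up to a four-byte boundary, which is the 'string' item of the file format described in gcov-io.h further down in this patch. A sketch of that encoding, assuming a little-endian writer; the real code emits words in host byte order:

import struct

def encode_gcov_string(s):
    data = s.encode()
    words = (len(data) + 4) >> 2             # same formula as gcov_write_string()
    payload = data.ljust(words * 4, b'\0')   # always at least one trailing NUL
    return struct.pack('<I', words) + payload

assert encode_gcov_string('main') == b'\x02\x00\x00\x00main\x00\x00\x00\x00'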
*/ + for (bv_ix = 0; bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE; bv_ix++) + histo_bitvector[bv_ix] = 0; + csum = &summary->ctrs[GCOV_COUNTER_ARCS]; + for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++) + { + if (csum->histogram[h_ix].num_counters > 0) + { + histo_bitvector[h_ix / 32] |= 1 << (h_ix % 32); + h_cnt++; + } + } + gcov_write_tag_length (tag, GCOV_TAG_SUMMARY_LENGTH (h_cnt)); + gcov_write_unsigned (summary->checksum); + for (csum = summary->ctrs, ix = GCOV_COUNTERS_SUMMABLE; ix--; csum++) + { + gcov_write_unsigned (csum->num); + gcov_write_unsigned (csum->runs); + gcov_write_counter (csum->sum_all); + gcov_write_counter (csum->run_max); + gcov_write_counter (csum->sum_max); + if (ix != GCOV_COUNTER_ARCS) + { + for (bv_ix = 0; bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE; bv_ix++) + gcov_write_unsigned (0); + continue; + } + for (bv_ix = 0; bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE; bv_ix++) + gcov_write_unsigned (histo_bitvector[bv_ix]); + for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++) + { + if (!csum->histogram[h_ix].num_counters) + continue; + gcov_write_unsigned (csum->histogram[h_ix].num_counters); + gcov_write_counter (csum->histogram[h_ix].min_value); + gcov_write_counter (csum->histogram[h_ix].cum_value); + } + } +} +#endif /* IN_LIBGCOV */ + +#endif /*!IN_GCOV */ + +/* Return a pointer to read BYTES bytes from the gcov file. Returns + NULL on failure (read past EOF). */ + +static const gcov_unsigned_t * +gcov_read_words (unsigned words) +{ + const gcov_unsigned_t *result; + unsigned excess = gcov_var.length - gcov_var.offset; + + gcc_assert (gcov_var.mode > 0); + if (excess < words) + { + gcov_var.start += gcov_var.offset; +#if IN_LIBGCOV + if (excess) + { + gcc_assert (excess == 1); + memcpy (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, 4); + } +#else + memmove (gcov_var.buffer, gcov_var.buffer + gcov_var.offset, excess * 4); +#endif + gcov_var.offset = 0; + gcov_var.length = excess; +#if IN_LIBGCOV + gcc_assert (!gcov_var.length || gcov_var.length == 1); + excess = GCOV_BLOCK_SIZE; +#else + if (gcov_var.length + words > gcov_var.alloc) + gcov_allocate (gcov_var.length + words); + excess = gcov_var.alloc - gcov_var.length; +#endif + excess = _GCOV_fread (gcov_var.buffer + gcov_var.length, + 1, excess << 2, gcov_var.file) >> 2; + gcov_var.length += excess; + if (gcov_var.length < words) + { + gcov_var.overread += words - gcov_var.length; + gcov_var.length = 0; + return 0; + } + } + result = &gcov_var.buffer[gcov_var.offset]; + gcov_var.offset += words; + return result; +} + +/* Read unsigned value from a coverage file. Sets error flag on file + error, overflow flag on overflow */ + +GCOV_LINKAGE gcov_unsigned_t +gcov_read_unsigned (void) +{ + gcov_unsigned_t value; + const gcov_unsigned_t *buffer = gcov_read_words (1); + + if (!buffer) + return 0; + value = from_file (buffer[0]); + return value; +} + +/* Read counter value from a coverage file. Sets error flag on file + error, overflow flag on overflow */ + +GCOV_LINKAGE gcov_type +gcov_read_counter (void) +{ + gcov_type value; + const gcov_unsigned_t *buffer = gcov_read_words (2); + + if (!buffer) + return 0; + value = from_file (buffer[0]); + if (sizeof (value) > sizeof (gcov_unsigned_t)) + value |= ((gcov_type) from_file (buffer[1])) << 32; + else if (buffer[1]) + gcov_var.error = -1; + + return value; +} + +/* We need to expose the below function when compiling for gcov-tool. */ + +#if !IN_LIBGCOV || defined (IN_GCOV_TOOL) +/* Read string from coverage file. Returns a pointer to a static + buffer, or NULL on empty string. 
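(Editorial aside, not part of the patch.) gcov_read_counter() above assembles a 64-bit counter from two 32-bit words, low word first, matching the 'int64: int32:low int32:high' layout documented in gcov-io.h. A tiny illustration:

def read_counter(lo, hi):
    return lo | (hi << 32)   # low word first, then the high word

assert read_counter(0x00000001, 0x00000002) == 0x200000001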
You must copy the string before + calling another gcov function. */ + +GCOV_LINKAGE const char * +gcov_read_string (void) +{ + unsigned length = gcov_read_unsigned (); + + if (!length) + return 0; + + return (const char *) gcov_read_words (length); +} +#endif + +#ifdef __KERNEL__ +static int +k_popcountll (long long x) +{ + int c = 0; + while (x) + { + c++; + x &= (x-1); + } + return c; +} +#endif + +GCOV_LINKAGE void +gcov_read_summary (struct gcov_summary *summary) +{ + unsigned ix, h_ix, bv_ix, h_cnt = 0; + struct gcov_ctr_summary *csum; + unsigned histo_bitvector[GCOV_HISTOGRAM_BITVECTOR_SIZE]; + unsigned cur_bitvector; + + summary->checksum = gcov_read_unsigned (); + for (csum = summary->ctrs, ix = GCOV_COUNTERS_SUMMABLE; ix--; csum++) + { + csum->num = gcov_read_unsigned (); + csum->runs = gcov_read_unsigned (); + csum->sum_all = gcov_read_counter (); + csum->run_max = gcov_read_counter (); + csum->sum_max = gcov_read_counter (); + memset (csum->histogram, 0, + sizeof (gcov_bucket_type) * GCOV_HISTOGRAM_SIZE); + for (bv_ix = 0; bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE; bv_ix++) + { + histo_bitvector[bv_ix] = gcov_read_unsigned (); +#if IN_LIBGCOV + /* When building libgcov we don't include system.h, which includes + hwint.h (where popcount_hwi is declared). However, libgcov.a + is built by the bootstrapped compiler and therefore the builtins + are always available. */ +#ifndef __KERNEL__ + h_cnt += __builtin_popcount (histo_bitvector[bv_ix]); +#else + h_cnt += k_popcountll (histo_bitvector[bv_ix]); +#endif +#else + h_cnt += popcount_hwi (histo_bitvector[bv_ix]); +#endif + } + bv_ix = 0; + h_ix = 0; + cur_bitvector = 0; + while (h_cnt--) + { + /* Find the index corresponding to the next entry we will read in. + First find the next non-zero bitvector and re-initialize + the histogram index accordingly, then right shift and increment + the index until we find a set bit. */ + while (!cur_bitvector) + { + h_ix = bv_ix * 32; + gcc_assert (bv_ix < GCOV_HISTOGRAM_BITVECTOR_SIZE); + cur_bitvector = histo_bitvector[bv_ix++]; + } + while (!(cur_bitvector & 0x1)) + { + h_ix++; + cur_bitvector >>= 1; + } + gcc_assert (h_ix < GCOV_HISTOGRAM_SIZE); + + csum->histogram[h_ix].num_counters = gcov_read_unsigned (); + csum->histogram[h_ix].min_value = gcov_read_counter (); + csum->histogram[h_ix].cum_value = gcov_read_counter (); + /* Shift off the index we are done with and increment to the + corresponding next histogram entry. */ + cur_bitvector >>= 1; + h_ix++; + } + } +} + +/* Read LENGTH words (unsigned type) from a zero profile fixup record with the + number of function flags saved in NUM_FNS. Returns the int flag array, which + should be deallocated by caller, or NULL on error. */ + +GCOV_LINKAGE int * +gcov_read_comdat_zero_fixup (gcov_unsigned_t length, + gcov_unsigned_t *num_fns) +{ +#ifndef __KERNEL__ + unsigned ix, f_ix; + gcov_unsigned_t num = gcov_read_unsigned (); + /* The length consists of 1 word to hold the number of functions, + plus enough 32-bit words to hold 1 bit/function. 
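(Editorial aside, not part of the patch.) gcov_read_summary() above and gcov_read_comdat_zero_fixup() just below both decode an array of 32-bit bitvectors in which each set bit selects an index: a non-empty histogram bucket in the first case, a fixed-up function slot in the second. A sketch of that walk:

def set_bits(bitvectors, width=32):
    for word_ix, word in enumerate(bitvectors):
        ix = word_ix * width
        while word:
            if word & 1:
                yield ix        # this bucket/function index is present
            word >>= 1
            ix += 1

assert list(set_bits([0b1010, 0x80000000])) == [1, 3, 63]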
*/ + gcc_assert ((num + 31) / 32 + 1 == length); + int *zero_fixup_flags = (int *) xcalloc (num, sizeof (int)); + for (ix = 0; ix < length - 1; ix++) + { + gcov_unsigned_t bitvector = gcov_read_unsigned (); + f_ix = ix * 32; + while (bitvector) + { + if (bitvector & 0x1) + zero_fixup_flags[f_ix] = 1; + f_ix++; + bitvector >>= 1; + } + } + *num_fns = num; + return zero_fixup_flags; +#else + return NULL; +#endif +} + +/* Read NUM_STRINGS strings (as an unsigned array) in STRING_ARRAY, and return + the number of words read. */ + +GCOV_LINKAGE gcov_unsigned_t +gcov_read_string_array (char **string_array, gcov_unsigned_t num_strings) +{ + gcov_unsigned_t i, j, len = 0; + + for (j = 0; j < num_strings; j++) + { + gcov_unsigned_t string_len = gcov_read_unsigned (); + string_array[j] = + (char *) xmalloc (string_len * sizeof (gcov_unsigned_t)); + for (i = 0; i < string_len; i++) + ((gcov_unsigned_t *) string_array[j])[i] = gcov_read_unsigned (); + len += (string_len + 1); + } + return len; +} + +/* Read LENGTH words (unsigned type) from a build info record with the number + of strings read saved in NUM_STRINGS. Returns the string array, which + should be deallocated by caller, or NULL on error. */ + +GCOV_LINKAGE char ** +gcov_read_build_info (gcov_unsigned_t length, gcov_unsigned_t *num_strings) +{ + gcov_unsigned_t num = gcov_read_unsigned (); + char **build_info_strings = (char **) + xmalloc (sizeof (char *) * num); + gcov_unsigned_t len = gcov_read_string_array (build_info_strings, + num); + if (len != length - 1) + return NULL; + *num_strings = num; + return build_info_strings; +} + +#if (!IN_LIBGCOV && IN_GCOV != 1) || defined (IN_GCOV_TOOL) +/* Read LEN words (unsigned type) and construct MOD_INFO. */ + +GCOV_LINKAGE void +gcov_read_module_info (struct gcov_module_info *mod_info, + gcov_unsigned_t len) +{ + gcov_unsigned_t src_filename_len, filename_len, i, num_strings; + mod_info->ident = gcov_read_unsigned (); + mod_info->is_primary = gcov_read_unsigned (); + mod_info->flags = gcov_read_unsigned (); + mod_info->lang = gcov_read_unsigned (); + mod_info->ggc_memory = gcov_read_unsigned (); + mod_info->num_quote_paths = gcov_read_unsigned (); + mod_info->num_bracket_paths = gcov_read_unsigned (); + mod_info->num_system_paths = gcov_read_unsigned (); + mod_info->num_cpp_defines = gcov_read_unsigned (); + mod_info->num_cpp_includes = gcov_read_unsigned (); + mod_info->num_cl_args = gcov_read_unsigned (); + len -= 11; + + filename_len = gcov_read_unsigned (); + mod_info->da_filename = (char *) xmalloc (filename_len * + sizeof (gcov_unsigned_t)); + for (i = 0; i < filename_len; i++) + ((gcov_unsigned_t *) mod_info->da_filename)[i] = gcov_read_unsigned (); + len -= (filename_len + 1); + + src_filename_len = gcov_read_unsigned (); + mod_info->source_filename = (char *) xmalloc (src_filename_len * + sizeof (gcov_unsigned_t)); + for (i = 0; i < src_filename_len; i++) + ((gcov_unsigned_t *) mod_info->source_filename)[i] = gcov_read_unsigned (); + len -= (src_filename_len + 1); + + num_strings = mod_info->num_quote_paths + mod_info->num_bracket_paths + + mod_info->num_system_paths + + mod_info->num_cpp_defines + mod_info->num_cpp_includes + + mod_info->num_cl_args; + len -= gcov_read_string_array (mod_info->string_array, num_strings); + gcc_assert (!len); +} +#endif + +/* We need to expose the below function when compiling for gcov-tool. */ + +#if !IN_LIBGCOV || defined (IN_GCOV_TOOL) +/* Reset to a known position. BASE should have been obtained from + gcov_position, LENGTH should be a record length. 
*/ + +GCOV_LINKAGE void +gcov_sync (gcov_position_t base, gcov_unsigned_t length) +{ + gcc_assert (gcov_var.mode > 0); + base += length; + if (base - gcov_var.start <= gcov_var.length) + gcov_var.offset = base - gcov_var.start; + else + { + gcov_var.offset = gcov_var.length = 0; + _GCOV_fseek (gcov_var.file, base << 2, SEEK_SET); + gcov_var.start = _GCOV_ftell (gcov_var.file) >> 2; + } +} +#endif + +#if IN_LIBGCOV +/* Move to a given position in a gcov file. */ + +GCOV_LINKAGE void +gcov_seek (gcov_position_t base) +{ + gcc_assert (gcov_var.mode < 0); + if (gcov_var.offset) + gcov_write_block (gcov_var.offset); + _GCOV_fseek (gcov_var.file, base << 2, SEEK_SET); + gcov_var.start = _GCOV_ftell (gcov_var.file) >> 2; +} + +/* Truncate the gcov file at the current position. */ + +GCOV_LINKAGE void +gcov_truncate (void) +{ +#ifdef __KERNEL__ + gcc_assert (0); +#else + long offs; + int filenum; + gcc_assert (gcov_var.mode < 0); + if (gcov_var.offset) + gcov_write_block (gcov_var.offset); + offs = _GCOV_ftell (gcov_var.file); + filenum = fileno (gcov_var.file); + if (offs == -1 || filenum == -1 || _GCOV_ftruncate (filenum, offs)) + gcov_var.error = 1; +#endif /* __KERNEL__ */ +} +#endif + +#if IN_GCOV > 0 +/* Return the modification time of the current gcov file. */ + +GCOV_LINKAGE time_t +gcov_time (void) +{ + struct stat status; + + if (fstat (fileno (gcov_var.file), &status)) + return 0; + else + return status.st_mtime; +} +#endif /* IN_GCOV */ + +#if !IN_GCOV +/* Determine the index into histogram for VALUE. */ + +#if IN_LIBGCOV +static unsigned +#else +GCOV_LINKAGE unsigned +#endif +gcov_histo_index (gcov_type value) +{ + gcov_type_unsigned v = (gcov_type_unsigned)value; + unsigned r = 0; + unsigned prev2bits = 0; + + /* Find index into log2 scale histogram, where each of the log2 + sized buckets is divided into 4 linear sub-buckets for better + focus in the higher buckets. */ + + /* Find the place of the most-significant bit set. */ + if (v > 0) + { +#if IN_LIBGCOV + /* When building libgcov we don't include system.h, which includes + hwint.h (where floor_log2 is declared). However, libgcov.a + is built by the bootstrapped compiler and therefore the builtins + are always available. */ + r = sizeof (long long) * __CHAR_BIT__ - 1 - __builtin_clzll (v); +#else + /* We use floor_log2 from hwint.c, which takes a HOST_WIDE_INT + that is either 32 or 64 bits, and gcov_type_unsigned may be 64 bits. + Need to check for the case where gcov_type_unsigned is 64 bits + and HOST_WIDE_INT is 32 bits and handle it specially. */ +#if HOST_BITS_PER_WIDEST_INT == HOST_BITS_PER_WIDE_INT + r = floor_log2 (v); +#elif HOST_BITS_PER_WIDEST_INT == 2 * HOST_BITS_PER_WIDE_INT + HOST_WIDE_INT hwi_v = v >> HOST_BITS_PER_WIDE_INT; + if (hwi_v) + r = floor_log2 (hwi_v) + HOST_BITS_PER_WIDE_INT; + else + r = floor_log2 ((HOST_WIDE_INT)v); +#else + gcc_unreachable (); +#endif +#endif + } + + /* If at most the 2 least significant bits are set (value is + 0 - 3) then that value is our index into the lowest set of + four buckets. */ + if (r < 2) + return (unsigned)value; + + gcc_assert (r < 64); + + /* Find the two next most significant bits to determine which + of the four linear sub-buckets to select. */ + prev2bits = (v >> (r - 2)) & 0x3; + /* Finally, compose the final bucket index from the log2 index and + the next 2 bits. The minimum r value at this point is 2 since we + returned above if r was 2 or more, so the minimum bucket at this + point is 4. 
*/ + return (r - 1) * 4 + prev2bits; +} + +/* Merge SRC_HISTO into TGT_HISTO. The counters are assumed to be in + the same relative order in both histograms, and are matched up + and merged in reverse order. Each counter is assigned an equal portion of + its entry's original cumulative counter value when computing the + new merged cum_value. */ + +static void gcov_histogram_merge (gcov_bucket_type *tgt_histo, + gcov_bucket_type *src_histo) +{ + int src_i, tgt_i, tmp_i = 0; + unsigned src_num, tgt_num, merge_num; + gcov_type src_cum, tgt_cum, merge_src_cum, merge_tgt_cum, merge_cum; + gcov_type merge_min; + gcov_bucket_type tmp_histo[GCOV_HISTOGRAM_SIZE]; + int src_done = 0; + + memset (tmp_histo, 0, sizeof (gcov_bucket_type) * GCOV_HISTOGRAM_SIZE); + + /* Assume that the counters are in the same relative order in both + histograms. Walk the histograms from largest to smallest entry, + matching up and combining counters in order. */ + src_num = 0; + src_cum = 0; + src_i = GCOV_HISTOGRAM_SIZE - 1; + for (tgt_i = GCOV_HISTOGRAM_SIZE - 1; tgt_i >= 0 && !src_done; tgt_i--) + { + tgt_num = tgt_histo[tgt_i].num_counters; + tgt_cum = tgt_histo[tgt_i].cum_value; + /* Keep going until all of the target histogram's counters at this + position have been matched and merged with counters from the + source histogram. */ + while (tgt_num > 0 && !src_done) + { + /* If this is either the first time through this loop or we just + exhausted the previous non-zero source histogram entry, look + for the next non-zero source histogram entry. */ + if (!src_num) + { + /* Locate the next non-zero entry. */ + while (src_i >= 0 && !src_histo[src_i].num_counters) + src_i--; + /* If source histogram has fewer counters, then just copy over the + remaining target counters and quit. */ + if (src_i < 0) + { + tmp_histo[tgt_i].num_counters += tgt_num; + tmp_histo[tgt_i].cum_value += tgt_cum; + if (!tmp_histo[tgt_i].min_value || + tgt_histo[tgt_i].min_value < tmp_histo[tgt_i].min_value) + tmp_histo[tgt_i].min_value = tgt_histo[tgt_i].min_value; + while (--tgt_i >= 0) + { + tmp_histo[tgt_i].num_counters + += tgt_histo[tgt_i].num_counters; + tmp_histo[tgt_i].cum_value += tgt_histo[tgt_i].cum_value; + if (!tmp_histo[tgt_i].min_value || + tgt_histo[tgt_i].min_value + < tmp_histo[tgt_i].min_value) + tmp_histo[tgt_i].min_value = tgt_histo[tgt_i].min_value; + } + + src_done = 1; + break; + } + + src_num = src_histo[src_i].num_counters; + src_cum = src_histo[src_i].cum_value; + } + + /* The number of counters to merge on this pass is the minimum + of the remaining counters from the current target and source + histogram entries. */ + merge_num = tgt_num; + if (src_num < merge_num) + merge_num = src_num; + + /* The merged min_value is the sum of the min_values from target + and source. */ + merge_min = tgt_histo[tgt_i].min_value + src_histo[src_i].min_value; + + /* Compute the portion of source and target entries' cum_value + that will be apportioned to the counters being merged. + The total remaining cum_value from each entry is divided + equally among the counters from that histogram entry if we + are not merging all of them. */ + merge_src_cum = src_cum; + if (merge_num < src_num) + merge_src_cum = merge_num * src_cum / src_num; + merge_tgt_cum = tgt_cum; + if (merge_num < tgt_num) + merge_tgt_cum = merge_num * tgt_cum / tgt_num; + /* The merged cum_value is the sum of the source and target + components. 
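(Editorial aside, not part of the patch.) gcov_histo_index() above maps a counter value into the summary histogram: values 0 through 3 keep their own buckets, and every larger value lands in a log2-sized bucket that is split into four linear sub-buckets by the next two significant bits. A Python rendering of the same indexing:

def histo_index(value):
    if value < 4:                        # values 0-3 map to the first four buckets
        return value
    r = value.bit_length() - 1           # position of the most significant set bit
    prev2bits = (value >> (r - 2)) & 3   # next two bits pick the linear sub-bucket
    return (r - 1) * 4 + prev2bits

assert [histo_index(v) for v in (0, 3, 4, 7, 8, 1023)] == [0, 3, 4, 7, 8, 35]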
*/ + merge_cum = merge_src_cum + merge_tgt_cum; + + /* Update the remaining number of counters and cum_value left + to be merged from this source and target entry. */ + src_cum -= merge_src_cum; + tgt_cum -= merge_tgt_cum; + src_num -= merge_num; + tgt_num -= merge_num; + + /* The merged counters get placed in the new merged histogram + at the entry for the merged min_value. */ + tmp_i = gcov_histo_index (merge_min); + gcc_assert (tmp_i < GCOV_HISTOGRAM_SIZE); + tmp_histo[tmp_i].num_counters += merge_num; + tmp_histo[tmp_i].cum_value += merge_cum; + if (!tmp_histo[tmp_i].min_value || + merge_min < tmp_histo[tmp_i].min_value) + tmp_histo[tmp_i].min_value = merge_min; + + /* Ensure the search for the next non-zero src_histo entry starts + at the next smallest histogram bucket. */ + if (!src_num) + src_i--; + } + } + + gcc_assert (tgt_i < 0); + + /* In the case where there were more counters in the source histogram, + accumulate the remaining unmerged cumulative counter values. Add + those to the smallest non-zero target histogram entry. Otherwise, + the total cumulative counter values in the histogram will be smaller + than the sum_all stored in the summary, which will complicate + computing the working set information from the histogram later on. */ + if (src_num) + src_i--; + while (src_i >= 0) + { + src_cum += src_histo[src_i].cum_value; + src_i--; + } + /* At this point, tmp_i should be the smallest non-zero entry in the + tmp_histo. */ + gcc_assert (tmp_i >= 0 && tmp_i < GCOV_HISTOGRAM_SIZE + && tmp_histo[tmp_i].num_counters > 0); + tmp_histo[tmp_i].cum_value += src_cum; + + /* Finally, copy the merged histogram into tgt_histo. */ + memcpy (tgt_histo, tmp_histo, + sizeof (gcov_bucket_type) * GCOV_HISTOGRAM_SIZE); +} +#endif /* !IN_GCOV */ + +/* This is used by gcov-dump (IN_GCOV == -1) and in the compiler + (!IN_GCOV && !IN_LIBGCOV). */ +#if IN_GCOV <= 0 && !IN_LIBGCOV +/* Compute the working set information from the counter histogram in + the profile summary. This is an array of information corresponding to a + range of percentages of the total execution count (sum_all), and includes + the number of counters required to cover that working set percentage and + the minimum counter value in that working set. */ + +GCOV_LINKAGE void +compute_working_sets (const struct gcov_ctr_summary *summary, + gcov_working_set_t *gcov_working_sets) +{ + gcov_type working_set_cum_values[NUM_GCOV_WORKING_SETS]; + gcov_type ws_cum_hotness_incr; + gcov_type cum, tmp_cum; + const gcov_bucket_type *histo_bucket; + unsigned ws_ix, c_num, count; + int h_ix; + + /* Compute the amount of sum_all that the cumulative hotness grows + by in each successive working set entry, which depends on the + number of working set entries. */ + ws_cum_hotness_incr = summary->sum_all / NUM_GCOV_WORKING_SETS; + + /* Next fill in an array of the cumulative hotness values corresponding + to each working set summary entry we are going to compute below. + Skip 0% statistics, which can be extrapolated from the + rest of the summary data. */ + cum = ws_cum_hotness_incr; + for (ws_ix = 0; ws_ix < NUM_GCOV_WORKING_SETS; + ws_ix++, cum += ws_cum_hotness_incr) + working_set_cum_values[ws_ix] = cum; + /* The last summary entry is reserved for (roughly) 99.9% of the + working set. Divide by 1024 so it becomes a shift, which gives + almost exactly 99.9%. 
*/ + working_set_cum_values[NUM_GCOV_WORKING_SETS-1] + = summary->sum_all - summary->sum_all/1024; + + /* Next, walk through the histogram in decending order of hotness + and compute the statistics for the working set summary array. + As histogram entries are accumulated, we check to see which + working set entries have had their expected cum_value reached + and fill them in, walking the working set entries in increasing + size of cum_value. */ + ws_ix = 0; /* The current entry into the working set array. */ + cum = 0; /* The current accumulated counter sum. */ + count = 0; /* The current accumulated count of block counters. */ + for (h_ix = GCOV_HISTOGRAM_SIZE - 1; + h_ix >= 0 && ws_ix < NUM_GCOV_WORKING_SETS; h_ix--) + { + histo_bucket = &summary->histogram[h_ix]; + + /* If we haven't reached the required cumulative counter value for + the current working set percentage, simply accumulate this histogram + entry into the running sums and continue to the next histogram + entry. */ + if (cum + histo_bucket->cum_value < working_set_cum_values[ws_ix]) + { + cum += histo_bucket->cum_value; + count += histo_bucket->num_counters; + continue; + } + + /* If adding the current histogram entry's cumulative counter value + causes us to exceed the current working set size, then estimate + how many of this histogram entry's counter values are required to + reach the working set size, and fill in working set entries + as we reach their expected cumulative value. */ + for (c_num = 0, tmp_cum = cum; + c_num < histo_bucket->num_counters && ws_ix < NUM_GCOV_WORKING_SETS; + c_num++) + { + count++; + /* If we haven't reached the last histogram entry counter, add + in the minimum value again. This will underestimate the + cumulative sum so far, because many of the counter values in this + entry may have been larger than the minimum. We could add in the + average value every time, but that would require an expensive + divide operation. */ + if (c_num + 1 < histo_bucket->num_counters) + tmp_cum += histo_bucket->min_value; + /* If we have reached the last histogram entry counter, then add + in the entire cumulative value. */ + else + tmp_cum = cum + histo_bucket->cum_value; + + /* Next walk through successive working set entries and fill in + the statistics for any whose size we have reached by accumulating + this histogram counter. */ + while (ws_ix < NUM_GCOV_WORKING_SETS + && tmp_cum >= working_set_cum_values[ws_ix]) + { + gcov_working_sets[ws_ix].num_counters = count; + gcov_working_sets[ws_ix].min_counter + = histo_bucket->min_value; + ws_ix++; + } + } + /* Finally, update the running cumulative value since we were + using a temporary above. */ + cum += histo_bucket->cum_value; + } + gcc_assert (ws_ix == NUM_GCOV_WORKING_SETS); +} +#endif /* IN_GCOV <= 0 && !IN_LIBGCOV */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/gcov-io.h b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/gcov-io.h new file mode 100644 index 0000000..895ff98 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/gcov-io.h @@ -0,0 +1,531 @@ +/* File format for coverage information + Copyright (C) 1996-2014 Free Software Foundation, Inc. + Contributed by Bob Manson . + Completely remangled by Nathan Sidwell . + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. 
+ +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + + +/* Coverage information is held in two files. A notes file, which is + generated by the compiler, and a data file, which is generated by + the program under test. Both files use a similar structure. We do + not attempt to make these files backwards compatible with previous + versions, as you only need coverage information when developing a + program. We do hold version information, so that mismatches can be + detected, and we use a format that allows tools to skip information + they do not understand or are not interested in. + + Numbers are recorded in the 32 bit unsigned binary form of the + endianness of the machine generating the file. 64 bit numbers are + stored as two 32 bit numbers, the low part first. Strings are + padded with 1 to 4 NUL bytes, to bring the length up to a multiple + of 4. The number of 4 bytes is stored, followed by the padded + string. Zero length and NULL strings are simply stored as a length + of zero (they have no trailing NUL or padding). + + int32: byte3 byte2 byte1 byte0 | byte0 byte1 byte2 byte3 + int64: int32:low int32:high + string: int32:0 | int32:length char* char:0 padding + padding: | char:0 | char:0 char:0 | char:0 char:0 char:0 + item: int32 | int64 | string + + The basic format of the files is + + file : int32:magic int32:version int32:stamp record* + + The magic ident is different for the notes and the data files. The + magic ident is used to determine the endianness of the file, when + reading. The version is the same for both files and is derived + from gcc's version number. The stamp value is used to synchronize + note and data files and to synchronize merging within a data + file. It need not be an absolute time stamp, merely a ticker that + increments fast enough and cycles slow enough to distinguish + different compile/run/compile cycles. + + Although the ident and version are formally 32 bit numbers, they + are derived from 4 character ASCII strings. The version number + consists of the single character major version number, a two + character minor version number (leading zero for versions less than + 10), and a single character indicating the status of the release. + That will be 'e' experimental, 'p' prerelease and 'r' for release. + Because, by good fortune, these are in alphabetical order, string + collating can be used to compare version strings. Be aware that + the 'e' designation will (naturally) be unstable and might be + incompatible with itself. For gcc 3.4 experimental, it would be + '304e' (0x33303465). When the major version reaches 10, the + letters A-Z will be used. Assuming minor increments releases every + 6 months, we have to make a major increment every 50 years. + Assuming major increments releases every 5 years, we're ok for the + next 155 years -- good enough for me. + + A record has a tag, length and variable amount of data. 
+ + record: header data + header: int32:tag int32:length + data: item* + + Records are not nested, but there is a record hierarchy. Tag + numbers reflect this hierarchy. Tags are unique across note and + data files. Some record types have a varying amount of data. The + LENGTH is the number of 4bytes that follow and is usually used to + determine how much data. The tag value is split into 4 8-bit + fields, one for each of four possible levels. The most significant + is allocated first. Unused levels are zero. Active levels are + odd-valued, so that the LSB of the level is one. A sub-level + incorporates the values of its superlevels. This formatting allows + you to determine the tag hierarchy, without understanding the tags + themselves, and is similar to the standard section numbering used + in technical documents. Level values [1..3f] are used for common + tags, values [41..9f] for the notes file and [a1..ff] for the data + file. + + The notes file contains the following records + note: unit function-graph* + unit: header int32:checksum string:source + function-graph: announce_function basic_blocks {arcs | lines}* + announce_function: header int32:ident + int32:lineno_checksum int32:cfg_checksum + string:name string:source int32:lineno + basic_block: header int32:flags* + arcs: header int32:block_no arc* + arc: int32:dest_block int32:flags + lines: header int32:block_no line* + int32:0 string:NULL + line: int32:line_no | int32:0 string:filename + + The BASIC_BLOCK record holds per-bb flags. The number of blocks + can be inferred from its data length. There is one ARCS record per + basic block. The number of arcs from a bb is implicit from the + data length. It enumerates the destination bb and per-arc flags. + There is one LINES record per basic block, it enumerates the source + lines which belong to that basic block. Source file names are + introduced by a line number of 0, following lines are from the new + source file. The initial source file for the function is NULL, but + the current source file should be remembered from one LINES record + to the next. The end of a block is indicated by an empty filename + - this does not reset the current source file. Note there is no + ordering of the ARCS and LINES records: they may be in any order, + interleaved in any manner. The current filename follows the order + the LINES records are stored in the file, *not* the ordering of the + blocks they are for. + + The data file contains the following records. + data: {unit summary:program* build_info zero_fixup function-data*}* + unit: header int32:checksum + function-data: announce_function present counts + announce_function: header int32:ident + int32:lineno_checksum int32:cfg_checksum + present: header int32:present + counts: header int64:count* + summary: int32:checksum {count-summary}GCOV_COUNTERS_SUMMABLE + count-summary: int32:num int32:runs int64:sum + int64:max int64:sum_max histogram + histogram: {int32:bitvector}8 histogram-buckets* + histogram-buckets: int32:num int64:min int64:sum + build_info: string:info* + zero_fixup: int32:num int32:bitvector* + + The ANNOUNCE_FUNCTION record is the same as that in the note file, + but without the source location. The COUNTS gives the + counter values for instrumented features. The about the whole + program. The checksum is used for whole program summaries, and + disambiguates different programs which include the same + instrumented object file. There may be several program summaries, + each with a unique checksum. 
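Because every record starts with a tag word and a length word counting the 4-byte items that follow, a reader can skip any record type it does not understand. A minimal sketch of such a skip loop; read_u32 and skip_u32 are hypothetical helpers standing in for the gcov_read_*/gcov_sync machinery declared later in this header.

/* Sketch: walk a gcov stream, skipping unknown records.  */
extern unsigned read_u32 (void);        /* hypothetical: next 32-bit word    */
extern void     skip_u32 (unsigned n);  /* hypothetical: skip n 32-bit words */

static void
walk_records (void)
{
  unsigned tag;

  while ((tag = read_u32 ()) != 0)      /* tag 0 is the optional EOF marker */
    {
      unsigned length = read_u32 ();    /* payload size in 4-byte units */

      switch (tag)
        {
        /* case GCOV_TAG_FUNCTION: read ident and checksums; break;  */
        default:
          skip_u32 (length);            /* unknown or uninteresting record */
        }
    }
}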
The object summary's checksum is + zero. Note that the data file might contain information from + several runs concatenated, or the data might be merged. + + BUILD_INFO record contains a list of strings that is used + to include in the data file information about the profile generate + build. For example, it can be used to include source revision + information that is useful in diagnosing profile mis-matches. + + ZERO_FIXUP record contains a count of functions in the gcda file + and an array of bitvectors indexed by the function index's in the + function-data section. Each bit flags whether the function was a + COMDAT that had all-zero profiles that was fixed up by dyn-ipa + using profiles from functions with matching checksums in other modules. + + This file is included by both the compiler, gcov tools and the + runtime support library libgcov. IN_LIBGCOV and IN_GCOV are used to + distinguish which case is which. If IN_LIBGCOV is nonzero, + libgcov is being built. If IN_GCOV is nonzero, the gcov tools are + being built. Otherwise the compiler is being built. IN_GCOV may be + positive or negative. If positive, we are compiling a tool that + requires additional functions (see the code for knowledge of what + those functions are). */ + +#ifndef GCC_GCOV_IO_H +#define GCC_GCOV_IO_H + +#ifndef __KERNEL__ +# define _GCOV_FILE FILE +# define _GCOV_fclose fclose +# define _GCOV_ftell ftell +# define _GCOV_fseek fseek +# define _GCOV_ftruncate ftruncate +# define _GCOV_fread fread +# define _GCOV_fwrite fwrite +# define _GCOV_fread fread +# define _GCOV_fileno fileno +# define _GCOV_fopen fopen +#endif + +#ifndef IN_LIBGCOV +/* About the host */ + +typedef unsigned gcov_unsigned_t; +typedef unsigned gcov_position_t; + +#if LONG_LONG_TYPE_SIZE > 32 +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_8 +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_8 +#else +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_4 +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_4 +#endif +#define PROFILE_GEN_EDGE_ATOMIC (flag_profile_gen_atomic == 1 || \ + flag_profile_gen_atomic == 3) +#define PROFILE_GEN_VALUE_ATOMIC (flag_profile_gen_atomic == 2 || \ + flag_profile_gen_atomic == 3) + +/* gcov_type is typedef'd elsewhere for the compiler */ +#if IN_GCOV +#define GCOV_LINKAGE static +typedef HOST_WIDEST_INT gcov_type; +typedef unsigned HOST_WIDEST_INT gcov_type_unsigned; +#if IN_GCOV > 0 +#include +#endif + +#define FUNC_ID_WIDTH HOST_BITS_PER_WIDE_INT/2 +#define FUNC_ID_MASK ((1L << FUNC_ID_WIDTH) - 1) +#define EXTRACT_MODULE_ID_FROM_GLOBAL_ID(gid) (unsigned)(((gid) >> FUNC_ID_WIDTH) & FUNC_ID_MASK) +#define EXTRACT_FUNC_ID_FROM_GLOBAL_ID(gid) (unsigned)((gid) & FUNC_ID_MASK) +#define FUNC_GLOBAL_ID(m,f) ((((HOST_WIDE_INT) (m)) << FUNC_ID_WIDTH) | (f) + +#else /*!IN_GCOV */ +#define GCOV_TYPE_SIZE (LONG_LONG_TYPE_SIZE > 32 ? 64 : 32) +#endif + +#if defined (HOST_HAS_F_SETLKW) +#define GCOV_LOCKED 1 +#else +#define GCOV_LOCKED 0 +#endif + +#define ATTRIBUTE_HIDDEN + +#endif /* !IN_LIBGOCV */ + +#ifndef GCOV_LINKAGE +#define GCOV_LINKAGE extern +#endif + +/* File suffixes. */ +#define GCOV_DATA_SUFFIX ".gcda" +#define GCOV_NOTE_SUFFIX ".gcno" + +/* File magic. Must not be palindromes. 
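The EXTRACT_*_FROM_GLOBAL_ID macros above split a single global function identifier into a module half and a function half, each FUNC_ID_WIDTH bits wide. A self-contained sketch of the same packing; the fixed 32-bit half-width, the helper name and the example ids are illustrative only, since the real width is HOST_BITS_PER_WIDE_INT/2 and therefore host-dependent.

#include <assert.h>
#include <stdint.h>

#define ID_WIDTH 32                         /* illustrative stand-in for FUNC_ID_WIDTH */
#define ID_MASK  ((1ULL << ID_WIDTH) - 1)

static uint64_t
make_global_id (uint32_t module, uint32_t func)
{
  return ((uint64_t) module << ID_WIDTH) | func;
}

int
main (void)
{
  uint64_t gid = make_global_id (7, 42);         /* module ids are assigned from 1 up */
  assert (((gid >> ID_WIDTH) & ID_MASK) == 7);   /* module part   */
  assert ((gid & ID_MASK) == 42);                /* function part */
  return 0;
}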
*/ +#define GCOV_DATA_MAGIC ((gcov_unsigned_t)0x67636461) /* "gcda" */ +#define GCOV_NOTE_MAGIC ((gcov_unsigned_t)0x67636e6f) /* "gcno" */ + +/* gcov-iov.h is automatically generated by the makefile from + version.c, it looks like + #define GCOV_VERSION ((gcov_unsigned_t)0x89abcdef) +*/ +#include "gcov-iov.h" + +/* Convert a magic or version number to a 4 character string. */ +#define GCOV_UNSIGNED2STRING(ARRAY,VALUE) \ + ((ARRAY)[0] = (char)((VALUE) >> 24), \ + (ARRAY)[1] = (char)((VALUE) >> 16), \ + (ARRAY)[2] = (char)((VALUE) >> 8), \ + (ARRAY)[3] = (char)((VALUE) >> 0)) + +/* The record tags. Values [1..3f] are for tags which may be in either + file. Values [41..9f] for those in the note file and [a1..ff] for + the data file. The tag value zero is used as an explicit end of + file marker -- it is not required to be present. */ + +#define GCOV_TAG_FUNCTION ((gcov_unsigned_t)0x01000000) +#define GCOV_TAG_FUNCTION_LENGTH (3) +#define GCOV_TAG_BLOCKS ((gcov_unsigned_t)0x01410000) +#define GCOV_TAG_BLOCKS_LENGTH(NUM) (NUM) +#define GCOV_TAG_BLOCKS_NUM(LENGTH) (LENGTH) +#define GCOV_TAG_ARCS ((gcov_unsigned_t)0x01430000) +#define GCOV_TAG_ARCS_LENGTH(NUM) (1 + (NUM) * 2) +#define GCOV_TAG_ARCS_NUM(LENGTH) (((LENGTH) - 1) / 2) +#define GCOV_TAG_LINES ((gcov_unsigned_t)0x01450000) +#define GCOV_TAG_COUNTER_BASE ((gcov_unsigned_t)0x01a10000) +#define GCOV_TAG_COUNTER_LENGTH(NUM) ((NUM) * 2) +#define GCOV_TAG_COUNTER_NUM(LENGTH) ((LENGTH) / 2) +#define GCOV_TAG_OBJECT_SUMMARY ((gcov_unsigned_t)0xa1000000) /* Obsolete */ +#define GCOV_TAG_PROGRAM_SUMMARY ((gcov_unsigned_t)0xa3000000) +#define GCOV_TAG_COMDAT_ZERO_FIXUP ((gcov_unsigned_t)0xa9000000) +/* Ceiling divide by 32 bit word size, plus one word to hold NUM. */ +#define GCOV_TAG_COMDAT_ZERO_FIXUP_LENGTH(NUM) (1 + (NUM + 31) / 32) +#define GCOV_TAG_SUMMARY_LENGTH(NUM) \ + (1 + GCOV_COUNTERS_SUMMABLE * (10 + 3 * 2) + (NUM) * 5) +#define GCOV_TAG_BUILD_INFO ((gcov_unsigned_t)0xa7000000) +#define GCOV_TAG_MODULE_INFO ((gcov_unsigned_t)0xab000000) +#define GCOV_TAG_AFDO_FILE_NAMES ((gcov_unsigned_t)0xaa000000) +#define GCOV_TAG_AFDO_FUNCTION ((gcov_unsigned_t)0xac000000) +#define GCOV_TAG_AFDO_MODULE_GROUPING ((gcov_unsigned_t)0xae000000) +#define GCOV_TAG_AFDO_WORKING_SET ((gcov_unsigned_t)0xaf000000) + +/* Counters that are collected. */ +#define DEF_GCOV_COUNTER(COUNTER, NAME, MERGE_FN) COUNTER, +enum { +#include "gcov-counter.def" +GCOV_COUNTERS +}; +#undef DEF_GCOV_COUNTER + +/* Counters which can be summaried. */ +#define GCOV_COUNTERS_SUMMABLE (GCOV_COUNTER_ARCS + 1) + +/* The first of counters used for value profiling. They must form a + consecutive interval and their order must match the order of + HIST_TYPEs in value-prof.h. */ +#define GCOV_FIRST_VALUE_COUNTER GCOV_COUNTERS_SUMMABLE + +/* The last of counters used for value profiling. */ +#define GCOV_LAST_VALUE_COUNTER (GCOV_COUNTERS - 2) + +/* Number of counters used for value profiling. */ +#define GCOV_N_VALUE_COUNTERS \ + (GCOV_LAST_VALUE_COUNTER - GCOV_FIRST_VALUE_COUNTER + 1) + +#define GCOV_ICALL_TOPN_VAL 2 /* Track two hottest callees */ +#define GCOV_ICALL_TOPN_NCOUNTS 9 /* The number of counter entries per icall callsite */ + +/* Convert a counter index to a tag. */ +#define GCOV_TAG_FOR_COUNTER(COUNT) \ + (GCOV_TAG_COUNTER_BASE + ((gcov_unsigned_t)(COUNT) << 17)) +/* Convert a tag to a counter. */ +#define GCOV_COUNTER_FOR_TAG(TAG) \ + ((unsigned)(((TAG) - GCOV_TAG_COUNTER_BASE) >> 17)) +/* Check whether a tag is a counter tag. 
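Shifting the counter index left by 17 keeps the low 16 bits of a counter tag zero and steps its level byte through the odd values 0xa1, 0xa3, 0xa5, ..., which is what the GCOV_TAG_IS_COUNTER test just below relies on. A few worked values as a standalone check; the macro names here are local copies so the snippet compiles on its own.

#include <assert.h>

#define TAG_COUNTER_BASE   0x01a10000u
#define TAG_FOR_COUNTER(C) (TAG_COUNTER_BASE + ((unsigned) (C) << 17))
#define COUNTER_FOR_TAG(T) (((T) - TAG_COUNTER_BASE) >> 17)

int
main (void)
{
  assert (TAG_FOR_COUNTER (0) == 0x01a10000u);  /* arc counters */
  assert (TAG_FOR_COUNTER (1) == 0x01a30000u);
  assert (TAG_FOR_COUNTER (2) == 0x01a50000u);
  assert (COUNTER_FOR_TAG (0x01a50000u) == 2);
  /* The low 16 bits stay zero, so (tag & 0xFFFF) == 0 is the counter-tag test.  */
  return 0;
}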
*/ +#define GCOV_TAG_IS_COUNTER(TAG) \ + (!((TAG) & 0xFFFF) && GCOV_COUNTER_FOR_TAG (TAG) < GCOV_COUNTERS) + +/* The tag level mask has 1's in the position of the inner levels, & + the lsb of the current level, and zero on the current and outer + levels. */ +#define GCOV_TAG_MASK(TAG) (((TAG) - 1) ^ (TAG)) + +/* Return nonzero if SUB is an immediate subtag of TAG. */ +#define GCOV_TAG_IS_SUBTAG(TAG,SUB) \ + (GCOV_TAG_MASK (TAG) >> 8 == GCOV_TAG_MASK (SUB) \ + && !(((SUB) ^ (TAG)) & ~GCOV_TAG_MASK (TAG))) + +/* Return nonzero if SUB is at a sublevel to TAG. */ +#define GCOV_TAG_IS_SUBLEVEL(TAG,SUB) \ + (GCOV_TAG_MASK (TAG) > GCOV_TAG_MASK (SUB)) + +/* Basic block flags. */ +#define GCOV_BLOCK_UNEXPECTED (1 << 1) + +/* Arc flags. */ +#define GCOV_ARC_ON_TREE (1 << 0) +#define GCOV_ARC_FAKE (1 << 1) +#define GCOV_ARC_FALLTHROUGH (1 << 2) + +/* Structured records. */ + +/* Structure used for each bucket of the log2 histogram of counter values. */ +typedef struct +{ + /* Number of counters whose profile count falls within the bucket. */ + gcov_unsigned_t num_counters; + /* Smallest profile count included in this bucket. */ + gcov_type min_value; + /* Cumulative value of the profile counts in this bucket. */ + gcov_type cum_value; +} gcov_bucket_type; + +/* For a log2 scale histogram with each range split into 4 + linear sub-ranges, there will be at most 64 (max gcov_type bit size) - 1 log2 + ranges since the lowest 2 log2 values share the lowest 4 linear + sub-range (values 0 - 3). This is 252 total entries (63*4). */ + +#define GCOV_HISTOGRAM_SIZE 252 + +/* How many unsigned ints are required to hold a bit vector of non-zero + histogram entries when the histogram is written to the gcov file. + This is essentially a ceiling divide by 32 bits. */ +#define GCOV_HISTOGRAM_BITVECTOR_SIZE (GCOV_HISTOGRAM_SIZE + 31) / 32 + +/* Cumulative counter data. */ +struct gcov_ctr_summary +{ + gcov_unsigned_t num; /* number of counters. */ + gcov_unsigned_t runs; /* number of program runs */ + gcov_type sum_all; /* sum of all counters accumulated. */ + gcov_type run_max; /* maximum value on a single run. */ + gcov_type sum_max; /* sum of individual run max values. */ + gcov_bucket_type histogram[GCOV_HISTOGRAM_SIZE]; /* histogram of + counter values. */ +}; + +/* Object & program summary record. */ +struct gcov_summary +{ + gcov_unsigned_t checksum; /* checksum of program */ + struct gcov_ctr_summary ctrs[GCOV_COUNTERS_SUMMABLE]; +}; + +#define GCOV_MODULE_UNKNOWN_LANG 0 +#define GCOV_MODULE_C_LANG 1 +#define GCOV_MODULE_CPP_LANG 2 +#define GCOV_MODULE_FORT_LANG 3 + +#define GCOV_MODULE_ASM_STMTS (1 << 16) +#define GCOV_MODULE_LANG_MASK 0xffff + +/* Source module info. The data structure is used in + both runtime and profile-use phase. Make sure to allocate + enough space for the variable length member. */ +struct gcov_module_info +{ + gcov_unsigned_t ident; + gcov_unsigned_t is_primary; /* this is overloaded to mean two things: + (1) means FDO/LIPO in instrumented binary. + (2) means IS_PRIMARY in persistent file or + memory copy used in profile-use. */ + gcov_unsigned_t flags; /* bit 0: is_exported, + bit 1: need to include all the auxiliary + modules in use compilation. */ + gcov_unsigned_t lang; /* lower 16 bits encode the language, and the upper + 16 bits enocde other attributes, such as whether + any assembler is present in the source, etc. 
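With the layout described above -- values 0 through 3 get their own buckets, and every later power-of-two range is split into four linear sub-buckets by the two bits below the leading one -- a 64-bit counter maps onto one of the 252 buckets. The function below is one way to compute that mapping, consistent with GCOV_HISTOGRAM_SIZE; it is a sketch, and the gcov_histo_index declared later in this header may differ in detail.

#include <assert.h>

/* Map a counter value to a bucket of a 252-entry log2 histogram.  */
static unsigned
histo_index_sketch (unsigned long long v)
{
  unsigned log2v = 0;
  unsigned long long tmp = v;

  if (v < 4)
    return (unsigned) v;          /* the four shared low buckets */

  while (tmp >= 2)                /* floor (log2 (v)) */
    {
      tmp >>= 1;
      log2v++;
    }
  /* The two bits below the leading one select a linear sub-bucket.  */
  return 4 * (log2v - 1) + (unsigned) ((v >> (log2v - 2)) & 3);
}

int
main (void)
{
  assert (histo_index_sketch (3) == 3);
  assert (histo_index_sketch (4) == 4);
  assert (histo_index_sketch (8) == 8);         /* [8,16) -> buckets 8..11 */
  assert (histo_index_sketch (~0ULL) == 251);   /* last of the 252 buckets */
  return 0;
}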
*/ + gcov_unsigned_t ggc_memory; /* memory needed for parsing in kb */ + char *da_filename; + char *source_filename; + gcov_unsigned_t num_quote_paths; + gcov_unsigned_t num_bracket_paths; + gcov_unsigned_t num_system_paths; + gcov_unsigned_t num_cpp_defines; + gcov_unsigned_t num_cpp_includes; + gcov_unsigned_t num_cl_args; + char *string_array[1]; +}; + +extern struct gcov_module_info **module_infos; +extern unsigned primary_module_id; +#define SET_MODULE_INCLUDE_ALL_AUX(modu) ((modu->flags |= 0x2)) +#define MODULE_INCLUDE_ALL_AUX_FLAG(modu) ((modu->flags & 0x2)) +#define SET_MODULE_EXPORTED(modu) ((modu->flags |= 0x1)) +#define MODULE_EXPORTED_FLAG(modu) ((modu->flags & 0x1)) +#define PRIMARY_MODULE_EXPORTED \ + (MODULE_EXPORTED_FLAG (module_infos[0]) \ + && !((module_infos[0]->lang & GCOV_MODULE_ASM_STMTS) \ + && flag_ripa_disallow_asm_modules)) + +#if !defined(inhibit_libc) + +/* Functions for reading and writing gcov files. In libgcov you can + open the file for reading then writing. Elsewhere you can open the + file either for reading or for writing. When reading a file you may + use the gcov_read_* functions, gcov_sync, gcov_position, & + gcov_error. When writing a file you may use the gcov_write + functions, gcov_seek & gcov_error. When a file is to be rewritten + you use the functions for reading, then gcov_rewrite then the + functions for writing. Your file may become corrupted if you break + these invariants. */ + +#if !IN_LIBGCOV +GCOV_LINKAGE int gcov_open (const char */*name*/, int /*direction*/); +GCOV_LINKAGE int gcov_magic (gcov_unsigned_t, gcov_unsigned_t); +#endif + +/* Available everywhere. */ +GCOV_LINKAGE int gcov_close (void) ATTRIBUTE_HIDDEN; +GCOV_LINKAGE gcov_unsigned_t gcov_read_unsigned (void) ATTRIBUTE_HIDDEN; +GCOV_LINKAGE gcov_type gcov_read_counter (void) ATTRIBUTE_HIDDEN; +GCOV_LINKAGE void gcov_read_summary (struct gcov_summary *) ATTRIBUTE_HIDDEN; +GCOV_LINKAGE int *gcov_read_comdat_zero_fixup (gcov_unsigned_t, + gcov_unsigned_t *) + ATTRIBUTE_HIDDEN; +GCOV_LINKAGE char **gcov_read_build_info (gcov_unsigned_t, gcov_unsigned_t *) + ATTRIBUTE_HIDDEN; +GCOV_LINKAGE const char *gcov_read_string (void); +GCOV_LINKAGE void gcov_sync (gcov_position_t /*base*/, + gcov_unsigned_t /*length */); +GCOV_LINKAGE gcov_unsigned_t gcov_read_string_array (char **, gcov_unsigned_t) + ATTRIBUTE_HIDDEN; + + +#if !IN_LIBGCOV && IN_GCOV != 1 +GCOV_LINKAGE void gcov_read_module_info (struct gcov_module_info *mod_info, + gcov_unsigned_t len) ATTRIBUTE_HIDDEN; +#endif + +#if !IN_GCOV +/* Available outside gcov */ +GCOV_LINKAGE void gcov_write_unsigned (gcov_unsigned_t) ATTRIBUTE_HIDDEN; +GCOV_LINKAGE gcov_unsigned_t gcov_compute_string_array_len (char **, + gcov_unsigned_t) + ATTRIBUTE_HIDDEN; +GCOV_LINKAGE void gcov_write_string_array (char **, gcov_unsigned_t) + ATTRIBUTE_HIDDEN; +#endif + +#if !IN_GCOV && !IN_LIBGCOV +/* Available only in compiler */ +GCOV_LINKAGE unsigned gcov_histo_index (gcov_type value); +GCOV_LINKAGE void gcov_write_string (const char *); +GCOV_LINKAGE gcov_position_t gcov_write_tag (gcov_unsigned_t); +GCOV_LINKAGE void gcov_write_length (gcov_position_t /*position*/); +#endif + +#if IN_GCOV <= 0 && !IN_LIBGCOV +/* Available in gcov-dump and the compiler. */ + +/* Number of data points in the working set summary array. Using 128 + provides information for at least every 1% increment of the total + profile size. The last entry is hardwired to 99.9% of the total. 
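The comment above fixes the protocol: read first with the gcov_read_* functions, call gcov_rewrite if the file is to be updated, and only then use the gcov_write_* functions. A sketch of the read-only half of that protocol as it might look outside libgcov; error handling and record processing are elided, and the mode value passed to gcov_open follows my reading of gcov-io.c rather than anything this header guarantees.

/* Sketch: open a gcda file read-only and read its header words.  */
static int
inspect_gcda (const char *name)
{
  gcov_unsigned_t version, stamp;

  if (!gcov_open (name, 1))       /* assumed: positive mode = read an existing file */
    return -1;

  if (!gcov_magic (gcov_read_unsigned (), GCOV_DATA_MAGIC))
    {
      gcov_close ();
      return -1;                  /* not a gcov data file */
    }

  version = gcov_read_unsigned ();
  stamp   = gcov_read_unsigned ();
  /* ... walk records with gcov_read_unsigned / gcov_read_counter ...  */
  (void) version; (void) stamp;

  return gcov_close () ? -1 : 0;
}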
*/ +#define NUM_GCOV_WORKING_SETS 128 + +/* Working set size statistics for a given percentage of the entire + profile (sum_all from the counter summary). */ +typedef struct gcov_working_set_info +{ + /* Number of hot counters included in this working set. */ + unsigned num_counters; + /* Smallest counter included in this working set. */ + gcov_type min_counter; +} gcov_working_set_t; + +GCOV_LINKAGE void compute_working_sets (const struct gcov_ctr_summary *summary, + gcov_working_set_t *gcov_working_sets); +#endif + +#if IN_GCOV > 0 +/* Available in gcov */ +GCOV_LINKAGE time_t gcov_time (void); +#endif + +#endif /* !inhibit_libc */ + +#endif /* GCC_GCOV_IO_H */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/gcov-iov.h b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/gcov-iov.h new file mode 100644 index 0000000..0da8e1d --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/gcov-iov.h @@ -0,0 +1,4 @@ +/* Generated automatically by the program `build/gcov-iov' + from `4.9.x (4 9) and prerelease (*)'. */ + +#define GCOV_VERSION ((gcov_unsigned_t)0x3430392a) /* 409* */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-driver-kernel.c b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-driver-kernel.c new file mode 100644 index 0000000..34298ed --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-driver-kernel.c @@ -0,0 +1,203 @@ +/* Routines required for instrumenting a program. */ +/* Compile this one with gcc. */ +/* Copyright (C) 1989-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + + +/* A utility function for outputing errors. */ + +static int __attribute__((format(printf, 1, 2))) +gcov_error (const char *fmt, ...) +{ + int ret; + va_list argp; + va_start (argp, fmt); + ret = vprintk (fmt, argp); + va_end (argp); + return ret; +} + +static void +allocate_filename_struct (struct gcov_filename_aux *gf) +{ + const char *gcov_prefix; + int gcov_prefix_strip = 0; + size_t prefix_length = 0; + char *gi_filename_up; + + /* Allocate and initialize the filename scratch space plus one. */ + gi_filename = (char *) xmalloc (prefix_length + gcov_max_filename + 2); + if (prefix_length) + memcpy (gi_filename, gcov_prefix, prefix_length); + gi_filename_up = gi_filename + prefix_length; + + gf->gi_filename_up = gi_filename_up; + gf->prefix_length = prefix_length; + gf->gcov_prefix_strip = gcov_prefix_strip; +} + +static int +gcov_open_by_filename (char *gi_filename) +{ + gcov_open (gi_filename); + return 0; +} + + +/* Strip GCOV_PREFIX_STRIP levels of leading '/' from FILENAME and + put the result into GI_FILENAME_UP. 
*/ + +static void +gcov_strip_leading_dirs (int prefix_length, int gcov_prefix_strip, + const char *filename, char *gi_filename_up) +{ + strcpy (gi_filename_up, filename); +} + +/* Current virual gcda file. This is for kernel use only. */ +gcov_kernel_vfile *gcov_current_file; + +/* Set current virutal gcda file. It needs to be set before dumping + profile data. */ + +void +gcov_set_vfile (gcov_kernel_vfile *file) +{ + gcov_current_file = file; +} + +/* File fclose operation in kernel mode. */ + +int +kernel_file_fclose (gcov_kernel_vfile *fp) +{ + return 0; +} + +/* File ftell operation in kernel mode. It currently should not + be called. */ + +long +kernel_file_ftell (gcov_kernel_vfile *fp) +{ + return 0; +} + +/* File fseek operation in kernel mode. It should only be called + with OFFSET==0 and WHENCE==0 to a freshly opened file. */ + +int +kernel_file_fseek (gcov_kernel_vfile *fp, long offset, int whence) +{ + gcc_assert (offset == 0 && whence == 0 && fp->count == 0); + return 0; +} + +/* File ftruncate operation in kernel mode. It currently should not + be called. */ + +int +kernel_file_ftruncate (gcov_kernel_vfile *fp, off_t value) +{ + gcc_assert (0); /* should not reach here */ + return 0; +} + +/* File fread operation in kernel mode. It currently should not + be called. */ + +int +kernel_file_fread (void *ptr, size_t size, size_t nitems, + gcov_kernel_vfile *fp) +{ + gcc_assert (0); /* should not reach here */ + return 0; +} + +/* File fwrite operation in kernel mode. It outputs the data + to a buffer in the virual file. */ + +int +kernel_file_fwrite (const void *ptr, size_t size, + size_t nitems, gcov_kernel_vfile *fp) +{ + char *vbuf; + unsigned vsize, vpos; + unsigned len; + + if (!fp) return 0; + + vbuf = fp->buf; + vsize = fp->size; + vpos = fp->count; + + + if (vsize < vpos) + { + printk (KERN_ERR + "GCOV_KERNEL: something wrong in file %s: vbuf=%p vsize=%u" + " vpos=%u\n", + fp->info->filename, vbuf, vsize, vpos); + return 0; + } + + len = vsize - vpos; + len /= size; + + /* Increase the virtual file size if it is not suffcient. */ + while (len < nitems) + { + vsize *= 2; + len = vsize - vpos; + len /= size; + } + + if (vsize != fp->size) + { + vbuf = fp->buf = (char *) gcov_realloc_file_buf(vsize, vpos); + fp->size = vsize; + } + + if (len > nitems) + len = nitems; + + memcpy (vbuf+vpos, ptr, size*len); + fp->count += len*size; + + if (len != nitems) + printk (KERN_ERR + "GCOV_KERNEL: something wrong in file %s: size=%lu nitems=%lu" + " len=%d vsize=%u vpos=%u \n", + fp->info->filename, size, nitems, len, vsize, vpos); + return len; +} + +/* File fileno operation in kernel mode. It currently should not + be called. */ + +int +kernel_file_fileno (gcov_kernel_vfile *fp) +{ + gcc_assert (0); /* should not reach here */ + return 0; +} diff --git a/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-driver.c b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-driver.c new file mode 100644 index 0000000..3c569f1 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-driver.c @@ -0,0 +1,1550 @@ +/* Routines required for instrumenting a program. */ +/* Compile this one with gcc. */ +/* Copyright (C) 1989-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. 
+ +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +#include "libgcov.h" + +#if defined(inhibit_libc) +/* If libc and its header files are not available, provide dummy functions. */ + +#if defined(L_gcov) +void __gcov_init (struct gcov_info *p __attribute__ ((unused))) {} +#endif + +#else /* inhibit_libc */ + +#if !defined(__KERNEL__) +#include +#if GCOV_LOCKED +#include +#include +#include +#endif +#endif /* __KERNEL__ */ + +#ifdef L_gcov +#include "gcov-io.c" + +/* Unique identifier assigned to each module (object file). */ +static gcov_unsigned_t gcov_cur_module_id = 0; + + +/* Dynamic call graph build and form module groups. */ +int __gcov_compute_module_groups (char **zero_counts) ATTRIBUTE_HIDDEN; +void __gcov_finalize_dyn_callgraph (void) ATTRIBUTE_HIDDEN; + +/* The following functions can be called from outside of this file. */ +extern void gcov_clear (void) ATTRIBUTE_HIDDEN; +extern void gcov_exit (void) ATTRIBUTE_HIDDEN; +extern void set_gcov_dump_complete (void) ATTRIBUTE_HIDDEN; +extern void reset_gcov_dump_complete (void) ATTRIBUTE_HIDDEN; +extern int get_gcov_dump_complete (void) ATTRIBUTE_HIDDEN; +extern void set_gcov_list (struct gcov_info *) ATTRIBUTE_HIDDEN; +__attribute__((weak)) void __coverage_callback (gcov_type, int); + +#if !defined(IN_GCOV_TOOL) && !defined(__KERNEL__) +extern gcov_unsigned_t __gcov_sampling_period; +extern gcov_unsigned_t __gcov_has_sampling; +static int gcov_sampling_period_initialized = 0; + +/* Create a strong reference to these symbols so that they are + unconditionally pulled into the instrumented binary, even when + the only reference is a weak reference. This is necessary because + we are using weak references to enable references from code that + may not be linked with libgcov. These are the only symbols that + should be accessed via link references from application code! + + A subtlety of the linker is that it will only resolve weak references + defined within archive libraries when there is a strong reference to + something else defined within the same object file. Since these functions + are defined within their own object files, they would not automatically + get resolved. Since there are symbols within the main L_gcov + section that are strongly referenced during -fprofile-generate and + -ftest-coverage builds, these dummy symbols will always need to be + resolved. 
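The paragraph above describes the trick: a strong reference placed in this always-linked object forces the archive member defining an otherwise weakly referenced symbol to be pulled in at link time. The same pattern in miniature, with hypothetical names, is:

/* my_hook may be referenced weakly from application code; the strong
   pointer reference below forces this object's definition to be kept
   when linking from an archive.  */
__attribute__ ((weak)) void
my_hook (void)
{
  /* default: do nothing */
}

void (*force_my_hook_ref) (void) = &my_hook;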
*/ +void (*__gcov_dummy_ref1)(void) = &__gcov_reset; +void (*__gcov_dummy_ref2)(void) = &__gcov_dump; +extern char *__gcov_get_profile_prefix (void); +char *(*__gcov_dummy_ref3)(void) = &__gcov_get_profile_prefix; +extern void __gcov_set_sampling_period (unsigned int period); +char *(*__gcov_dummy_ref4)(void) = &__gcov_set_sampling_period; +extern unsigned int __gcov_sampling_enabled (void); +char *(*__gcov_dummy_ref5)(void) = &__gcov_sampling_enabled; +extern void __gcov_flush (void); +char *(*__gcov_dummy_ref6)(void) = &__gcov_flush; +extern unsigned int __gcov_profiling_for_test_coverage (void); +char *(*__gcov_dummy_ref7)(void) = &__gcov_profiling_for_test_coverage; +#endif + +/* Default callback function for profile instrumentation callback. */ +__attribute__((weak)) void +__coverage_callback (gcov_type funcdef_no __attribute__ ((unused)), + int edge_no __attribute__ ((unused))) +{ + /* nothing */ +} + +struct gcov_fn_buffer +{ + struct gcov_fn_buffer *next; + unsigned fn_ix; + struct gcov_fn_info info; + /* note gcov_fn_info ends in a trailing array. */ +}; + +struct gcov_summary_buffer +{ + struct gcov_summary_buffer *next; + struct gcov_summary summary; +}; + +/* Chain of per-object gcov structures. */ +extern struct gcov_info *__gcov_list; + +/* Set the head of gcov_list. */ +void +set_gcov_list (struct gcov_info *head) +{ + __gcov_list = head; +} + +/* Flag if the current function being read was marked as having fixed-up + zero counters. */ +static int __gcov_curr_fn_fixed_up; + +/* Set function fixed up flag. */ +void +set_gcov_fn_fixed_up (int fixed_up) +{ + __gcov_curr_fn_fixed_up = fixed_up; +} + +/* Return function fixed up flag. */ +int +get_gcov_fn_fixed_up (void) +{ + return __gcov_curr_fn_fixed_up; +} + +/* Size of the longest file name. */ +/* We need to expose this static variable when compiling for gcov-tool. */ +#ifndef IN_GCOV_TOOL +static +#endif +size_t gcov_max_filename = 0; + +/* Flag when the profile has already been dumped via __gcov_dump(). */ +static int gcov_dump_complete; + +/* A global function that get the vaule of gcov_dump_complete. */ + +int +get_gcov_dump_complete (void) +{ + return gcov_dump_complete; +} + +/* A global functino that set the vaule of gcov_dump_complete. Will + be used in __gcov_dump() in libgcov-interface.c. */ + +void +set_gcov_dump_complete (void) +{ + gcov_dump_complete = 1; +} + +/* A global functino that set the vaule of gcov_dump_complete. Will + be used in __gcov_reset() in libgcov-interface.c. */ + +void +reset_gcov_dump_complete (void) +{ + gcov_dump_complete = 0; +} + +/* A utility function for outputing errors. 
*/ +static int gcov_error (const char *, ...); + +static struct gcov_fn_buffer * +free_fn_data (const struct gcov_info *gi_ptr, struct gcov_fn_buffer *buffer, + unsigned limit) +{ + struct gcov_fn_buffer *next; + unsigned ix, n_ctr = 0; + + if (!buffer) + return 0; + next = buffer->next; + + for (ix = 0; ix != limit; ix++) + if (gi_ptr->merge[ix]) + xfree (buffer->info.ctrs[n_ctr++].values); + xfree (buffer); + return next; +} + +static struct gcov_fn_buffer ** +buffer_fn_data (const char *filename, const struct gcov_info *gi_ptr, + struct gcov_fn_buffer **end_ptr, unsigned fn_ix) +{ + unsigned n_ctrs = 0, ix = 0; + struct gcov_fn_buffer *fn_buffer; + unsigned len; + + for (ix = GCOV_COUNTERS; ix--;) + if (gi_ptr->merge[ix]) + n_ctrs++; + + len = sizeof (*fn_buffer) + sizeof (fn_buffer->info.ctrs[0]) * n_ctrs; + fn_buffer = (struct gcov_fn_buffer *) xmalloc (len); + + if (!fn_buffer) + goto fail; + + fn_buffer->next = 0; + fn_buffer->fn_ix = fn_ix; + fn_buffer->info.ident = gcov_read_unsigned (); + fn_buffer->info.lineno_checksum = gcov_read_unsigned (); + fn_buffer->info.cfg_checksum = gcov_read_unsigned (); + + for (n_ctrs = ix = 0; ix != GCOV_COUNTERS; ix++) + { + gcov_unsigned_t length; + gcov_type *values; + + if (!gi_ptr->merge[ix]) + continue; + + if (gcov_read_unsigned () != GCOV_TAG_FOR_COUNTER (ix)) + { + len = 0; + goto fail; + } + + length = GCOV_TAG_COUNTER_NUM (gcov_read_unsigned ()); + len = length * sizeof (gcov_type); + values = (gcov_type *) xmalloc (len); + if (!values) + goto fail; + + fn_buffer->info.ctrs[n_ctrs].num = length; + fn_buffer->info.ctrs[n_ctrs].values = values; + + while (length--) + *values++ = gcov_read_counter (); + n_ctrs++; + } + + *end_ptr = fn_buffer; + return &fn_buffer->next; + +fail: + gcov_error ("profiling:%s:Function %u %s %u \n", filename, fn_ix, + len ? "cannot allocate" : "counter mismatch", len ? len : ix); + + return (struct gcov_fn_buffer **)free_fn_data (gi_ptr, fn_buffer, ix); +} + +/* Determine whether a counter is active. */ + +static inline int +gcov_counter_active (const struct gcov_info *info, unsigned int type) +{ + return (info->merge[type] != 0); +} + +/* Add an unsigned value to the current crc */ + +static gcov_unsigned_t +crc32_unsigned (gcov_unsigned_t crc32, gcov_unsigned_t value) +{ + unsigned ix; + + for (ix = 32; ix--; value <<= 1) + { + unsigned feedback; + + feedback = (value ^ crc32) & 0x80000000 ? 0x04c11db7 : 0; + crc32 <<= 1; + crc32 ^= feedback; + } + + return crc32; +} + +/* Check if VERSION of the info block PTR matches libgcov one. + Return 1 on success, or zero in case of versions mismatch. + If FILENAME is not NULL, its value used for reporting purposes + instead of value from the info block. */ + +static int +gcov_version (struct gcov_info *ptr, gcov_unsigned_t version, + const char *filename) +{ + if (version != GCOV_VERSION) + { + char v[4], e[4]; + + GCOV_UNSIGNED2STRING (v, version); + GCOV_UNSIGNED2STRING (e, GCOV_VERSION); + + if (filename) + gcov_error ("profiling:%s:Version mismatch - expected %.4s got %.4s\n", + filename? filename : ptr->filename, e, v); + else + gcov_error ("profiling:Version mismatch - expected %.4s got %.4s\n", e, v); + + return 0; + } + return 1; +} + +/* Insert counter VALUE into HISTOGRAM. 
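crc32_unsigned above folds one 32-bit value into the running CRC a bit at a time with the standard 0x04c11db7 polynomial; gcov_exit_compute_summary further down chains it over each object's stamp, function count and counter counts to form the program checksum. A small usage sketch with made-up inputs:

/* Illustrative only: the values fed in are invented, but the call
   pattern mirrors gcov_exit_compute_summary below.  */
static gcov_unsigned_t
example_program_checksum (void)
{
  gcov_unsigned_t crc32 = 0;

  crc32 = crc32_unsigned (crc32, 0x12345678);  /* object stamp        */
  crc32 = crc32_unsigned (crc32, 42);          /* number of functions */
  crc32 = crc32_unsigned (crc32, 7);           /* a counter count     */
  return crc32;
}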
*/ + +static void +gcov_histogram_insert(gcov_bucket_type *histogram, gcov_type value) +{ + unsigned i; + + i = gcov_histo_index(value); + histogram[i].num_counters++; + histogram[i].cum_value += value; + if (value < histogram[i].min_value) + histogram[i].min_value = value; +} + +/* Computes a histogram of the arc counters to place in the summary SUM. */ + +static void +gcov_compute_histogram (struct gcov_summary *sum) +{ + struct gcov_info *gi_ptr; + const struct gcov_fn_info *gfi_ptr; + const struct gcov_ctr_info *ci_ptr; + struct gcov_ctr_summary *cs_ptr; + unsigned t_ix, f_ix, ctr_info_ix, ix; + int h_ix; + + /* This currently only applies to arc counters. */ + t_ix = GCOV_COUNTER_ARCS; + + /* First check if there are any counts recorded for this counter. */ + cs_ptr = &(sum->ctrs[t_ix]); + if (!cs_ptr->num) + return; + + for (h_ix = 0; h_ix < GCOV_HISTOGRAM_SIZE; h_ix++) + { + cs_ptr->histogram[h_ix].num_counters = 0; + cs_ptr->histogram[h_ix].min_value = cs_ptr->run_max; + cs_ptr->histogram[h_ix].cum_value = 0; + } + + /* Walk through all the per-object structures and record each of + the count values in histogram. */ + for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next) + { + if (!gi_ptr->merge[t_ix]) + continue; + + /* Find the appropriate index into the gcov_ctr_info array + for the counter we are currently working on based on the + existence of the merge function pointer for this object. */ + for (ix = 0, ctr_info_ix = 0; ix < t_ix; ix++) + { + if (gi_ptr->merge[ix]) + ctr_info_ix++; + } + for (f_ix = 0; f_ix != gi_ptr->n_functions; f_ix++) + { + gfi_ptr = gi_ptr->functions[f_ix]; + + if (!gfi_ptr || gfi_ptr->key != gi_ptr) + continue; + + ci_ptr = &gfi_ptr->ctrs[ctr_info_ix]; + for (ix = 0; ix < ci_ptr->num; ix++) + gcov_histogram_insert (cs_ptr->histogram, ci_ptr->values[ix]); + } + } +} + +/* gcda filename. */ +static char *gi_filename; +/* buffer for the fn_data from another program. */ +static struct gcov_fn_buffer *fn_buffer; +/* buffer for summary from other programs to be written out. */ +static struct gcov_summary_buffer *sum_buffer; +/* If application calls fork or exec multiple times, we end up storing + profile repeadely. We should not account this as multiple runs or + functions executed once may mistakely become cold. */ +static int run_accounted = 0; + +/* This funtions computes the program level summary and the histo-gram. + It computes and returns CRC32 and stored summary in THIS_PRG. */ + +#if !IN_GCOV_TOOL +static +#endif +gcov_unsigned_t +gcov_exit_compute_summary (struct gcov_summary *this_prg) +{ + struct gcov_info *gi_ptr; + const struct gcov_fn_info *gfi_ptr; + struct gcov_ctr_summary *cs_ptr; + const struct gcov_ctr_info *ci_ptr; + int f_ix; + unsigned t_ix; + gcov_unsigned_t c_num; + gcov_unsigned_t crc32 = 0; + + /* Find the totals for this execution. */ + memset (this_prg, 0, sizeof (*this_prg)); + for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next) + { + crc32 = crc32_unsigned (crc32, gi_ptr->stamp); + crc32 = crc32_unsigned (crc32, gi_ptr->n_functions); + + for (f_ix = 0; (unsigned)f_ix != gi_ptr->n_functions; f_ix++) + { + gfi_ptr = gi_ptr->functions[f_ix]; + + if (gfi_ptr && gfi_ptr->key != gi_ptr) + gfi_ptr = 0; + + crc32 = crc32_unsigned (crc32, gfi_ptr ? gfi_ptr->cfg_checksum : 0); + crc32 = crc32_unsigned (crc32, + gfi_ptr ? 
gfi_ptr->lineno_checksum : 0); + if (!gfi_ptr) + continue; + + ci_ptr = gfi_ptr->ctrs; + for (t_ix = 0; t_ix != GCOV_COUNTERS_SUMMABLE; t_ix++) + { + if (!gi_ptr->merge[t_ix]) + continue; + + cs_ptr = &(this_prg->ctrs[t_ix]); + cs_ptr->num += ci_ptr->num; + crc32 = crc32_unsigned (crc32, ci_ptr->num); + + for (c_num = 0; c_num < ci_ptr->num; c_num++) + { + cs_ptr->sum_all += ci_ptr->values[c_num]; + if (cs_ptr->run_max < ci_ptr->values[c_num]) + cs_ptr->run_max = ci_ptr->values[c_num]; + } + ci_ptr++; + } + } + } + gcov_compute_histogram (this_prg); + return crc32; +} + +/* A struct that bundles all the related information about the + gcda filename. */ +struct gcov_filename_aux{ + char *gi_filename_up; + int gcov_prefix_strip; + size_t prefix_length; +}; + +/* Including system dependent components. */ +#if !defined (__KERNEL__) +#include "libgcov-driver-system.c" +#else +#include "libgcov-driver-kernel.c" +#endif + +static int +scan_build_info (struct gcov_info *gi_ptr) +{ + gcov_unsigned_t i, length; + gcov_unsigned_t num_strings = 0; + char **build_info_strings; + + length = gcov_read_unsigned (); + build_info_strings = gcov_read_build_info (length, &num_strings); + if (!build_info_strings) + { + gcov_error ("profiling:%s:Error reading build info\n", gi_filename); + return -1; + } + if (!gi_ptr->build_info) + { + gcov_error ("profiling:%s:Mismatched build info sections, expected " + "none, found %u strings)\n", gi_filename, num_strings); + return -1; + } + + for (i = 0; i < num_strings; i++) + { + if (strcmp (build_info_strings[i], gi_ptr->build_info[i])) + { + gcov_error ("profiling:%s:Mismatched build info string " + "(expected %s, read %s)\n", + gi_filename, gi_ptr->build_info[i], + build_info_strings[i]); + return -1; + } + xfree (build_info_strings[i]); + } + xfree (build_info_strings); + return 0; +} + +#if !defined(__KERNEL__) +/* Scan through the current open gcda file corresponding to GI_PTR + to locate the end position just before function data should be rewritten, + returned in SUMMARY_END_POS_P. E.g. scan past the last summary and other + sections that won't be rewritten, like the build info. Return 0 on success, + -1 on error. */ +static int +gcov_scan_to_function_data (struct gcov_info *gi_ptr, + gcov_position_t *summary_end_pos_p) +{ + gcov_unsigned_t tag, version, stamp; + tag = gcov_read_unsigned (); + if (tag != GCOV_DATA_MAGIC) + { + gcov_error ("profiling:%s:Not a gcov data file\n", gi_filename); + return -1; + } + + version = gcov_read_unsigned (); + if (!gcov_version (gi_ptr, version, gi_filename)) + return -1; + + stamp = gcov_read_unsigned (); + if (stamp != gi_ptr->stamp) + /* Read from a different compilation. Overwrite the file. */ + return -1; + + /* Look for program summary. */ + while (1) + { + struct gcov_summary tmp; + + *summary_end_pos_p = gcov_position (); + tag = gcov_read_unsigned (); + if (tag != GCOV_TAG_PROGRAM_SUMMARY) + break; + + gcov_read_unsigned (); + gcov_read_summary (&tmp); + if (gcov_is_error ()) + return -1; + } + + /* If there is a build info section, scan past it as well. */ + if (tag == GCOV_TAG_BUILD_INFO) + { + if (scan_build_info (gi_ptr) < 0) + return -1; + + *summary_end_pos_p = gcov_position (); + tag = gcov_read_unsigned (); + } + /* The next section should be the function counters. */ + gcc_assert (tag == GCOV_TAG_FUNCTION); + + return 0; +} +#endif /* __KERNEL__ */ + +/* This function merges counters in GI_PTR to an existing gcda file. + Return 0 on success. + Return -1 on error. In this case, caller will goto read_fatal. 
*/ + +static int +gcov_exit_merge_gcda (struct gcov_info *gi_ptr, + struct gcov_summary *prg_p, + struct gcov_summary *this_prg, + gcov_position_t *summary_pos_p, + gcov_position_t *eof_pos_p, + gcov_unsigned_t crc32) +{ + gcov_unsigned_t tag, length; + unsigned t_ix; + int f_ix; + int error = 0; + struct gcov_fn_buffer **fn_tail = &fn_buffer; + struct gcov_summary_buffer **sum_tail = &sum_buffer; + int *zero_fixup_flags = NULL; + + length = gcov_read_unsigned (); + if (!gcov_version (gi_ptr, length, gi_filename)) + return -1; + + length = gcov_read_unsigned (); + if (length != gi_ptr->stamp) + /* Read from a different compilation. Overwrite the file. */ + return 0; + + /* Look for program summary. */ + for (f_ix = 0;;) + { + struct gcov_summary tmp; + + *eof_pos_p = gcov_position (); + tag = gcov_read_unsigned (); + if (tag != GCOV_TAG_PROGRAM_SUMMARY) + break; + + f_ix--; + length = gcov_read_unsigned (); + gcov_read_summary (&tmp); + if ((error = gcov_is_error ())) + goto read_error; + if (*summary_pos_p) + { + /* Save all summaries after the one that will be + merged into below. These will need to be rewritten + as histogram merging may change the number of non-zero + histogram entries that will be emitted, and thus the + size of the merged summary. */ + (*sum_tail) = (struct gcov_summary_buffer *) + xmalloc (sizeof(struct gcov_summary_buffer)); + (*sum_tail)->summary = tmp; + (*sum_tail)->next = 0; + sum_tail = &((*sum_tail)->next); + goto next_summary; + } + if (tmp.checksum != crc32) + goto next_summary; + + for (t_ix = 0; t_ix != GCOV_COUNTERS_SUMMABLE; t_ix++) + if (tmp.ctrs[t_ix].num != this_prg->ctrs[t_ix].num) + goto next_summary; + *prg_p = tmp; + *summary_pos_p = *eof_pos_p; + + next_summary:; + } + + if (tag == GCOV_TAG_BUILD_INFO) + { + if (scan_build_info (gi_ptr) < 0) + return -1; + + /* Since the stamps matched if we got here, this should be from + the same compilation and the build info strings should match. */ + tag = gcov_read_unsigned (); + } + + if (tag == GCOV_TAG_COMDAT_ZERO_FIXUP) + { + gcov_unsigned_t num_fns = 0; + length = gcov_read_unsigned (); + zero_fixup_flags = gcov_read_comdat_zero_fixup (length, &num_fns); + if (!zero_fixup_flags) + { + gcov_error ("profiling:%s:Error reading zero fixup flags\n", + gi_filename); + return -1; + } + + tag = gcov_read_unsigned (); + } + + /* Merge execution counts for each function. */ + for (f_ix = 0; (unsigned)f_ix != gi_ptr->n_functions; + f_ix++, tag = gcov_read_unsigned ()) + { + const struct gcov_ctr_info *ci_ptr; + const struct gcov_fn_info *gfi_ptr = gi_ptr->functions[f_ix]; + + if (tag != GCOV_TAG_FUNCTION) + goto read_mismatch; + + length = gcov_read_unsigned (); + if (!length) + /* This function did not appear in the other program. + We have nothing to merge. */ + continue; + + if (length != GCOV_TAG_FUNCTION_LENGTH) + goto read_mismatch; + + if (!gfi_ptr || gfi_ptr->key != gi_ptr) + { + /* This function appears in the other program. We + need to buffer the information in order to write + it back out -- we'll be inserting data before + this point, so cannot simply keep the data in the + file. 
*/ + fn_tail = buffer_fn_data (gi_filename, + gi_ptr, fn_tail, f_ix); + if (!fn_tail) + goto read_mismatch; + continue; + } + + if (zero_fixup_flags) + set_gcov_fn_fixed_up (zero_fixup_flags[f_ix]); + + length = gcov_read_unsigned (); + if (length != gfi_ptr->ident) + goto read_mismatch; + + length = gcov_read_unsigned (); + if (length != gfi_ptr->lineno_checksum) + goto read_mismatch; + + length = gcov_read_unsigned (); + if (length != gfi_ptr->cfg_checksum) + goto read_mismatch; + + ci_ptr = gfi_ptr->ctrs; + for (t_ix = 0; t_ix < GCOV_COUNTERS; t_ix++) + { + gcov_merge_fn merge = gi_ptr->merge[t_ix]; + + if (!merge) + continue; + + tag = gcov_read_unsigned (); + length = gcov_read_unsigned (); + if (tag != GCOV_TAG_FOR_COUNTER (t_ix) + || length != GCOV_TAG_COUNTER_LENGTH (ci_ptr->num)) + goto read_mismatch; + (*merge) (ci_ptr->values, ci_ptr->num); + ci_ptr++; + } + if ((error = gcov_is_error ())) + goto read_error; + } + xfree (zero_fixup_flags); + + if (tag && tag != GCOV_TAG_MODULE_INFO) + { + read_mismatch:; + gcov_error ("profiling:%s:Merge mismatch for %s %u\n", + gi_filename, f_ix >= 0 ? "function" : "summary", + f_ix < 0 ? -1 - f_ix : f_ix); + return -1; + } + return 0; + +read_error: + gcov_error ("profiling:%s:%s merging\n", gi_filename, + error < 0 ? "Overflow": "Error"); + return -1; +} + +#if !defined(__KERNEL__) +/* Write NUM_FNS ZERO_COUNTS fixup flags to a gcda file starting from its + current location. */ + +static void +gcov_write_comdat_zero_fixup (char *zero_counts, unsigned num_fns) +{ + unsigned f_ix; + gcov_unsigned_t len = GCOV_TAG_COMDAT_ZERO_FIXUP_LENGTH (num_fns); + gcov_unsigned_t bitvector = 0, b_ix = 0; + gcov_write_tag_length (GCOV_TAG_COMDAT_ZERO_FIXUP, len); + + gcov_write_unsigned (num_fns); + for (f_ix = 0; f_ix != num_fns; f_ix++) + { + if (zero_counts[f_ix]) + bitvector |= 1 << b_ix; + if (++b_ix == 32) + { + gcov_write_unsigned (bitvector); + b_ix = 0; + bitvector = 0; + } + } + if (b_ix > 0) + gcov_write_unsigned (bitvector); +} +#endif /* __KERNEL__ */ + +/* Write build_info strings from GI_PTR to a gcda file starting from its current + location. */ + +static void +gcov_write_build_info (struct gcov_info *gi_ptr) +{ + gcov_unsigned_t num = 0; + gcov_unsigned_t len = 1; + + if (!gi_ptr->build_info) + return; + + /* Count the number of strings, which is terminated with an empty string. */ + while (gi_ptr->build_info[num][0]) + num++; + + len += gcov_compute_string_array_len (gi_ptr->build_info, num); + gcov_write_tag_length (GCOV_TAG_BUILD_INFO, len); + gcov_write_unsigned (num); + gcov_write_string_array (gi_ptr->build_info, num); +} + +/* Write counters in GI_PTR to a gcda file starting from its current + location. */ + +static void +gcov_write_func_counters (struct gcov_info *gi_ptr) +{ + unsigned f_ix; + + /* Write execution counts for each function. */ + for (f_ix = 0; f_ix != gi_ptr->n_functions; f_ix++) + { + unsigned buffered = 0; + const struct gcov_fn_info *gfi_ptr; + const struct gcov_ctr_info *ci_ptr; + gcov_unsigned_t length; + unsigned t_ix; + + if (fn_buffer && fn_buffer->fn_ix == f_ix) + { + /* Buffered data from another program. 
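gcov_write_comdat_zero_fixup above packs one flag per function into 32-bit words, preceded by the function count, so the record occupies GCOV_TAG_COMDAT_ZERO_FIXUP_LENGTH words as defined in gcov-io.h. A few worked values; the macro is copied locally so the check is standalone.

#include <assert.h>

/* One word for the count plus a ceiling divide's worth of bitvector words.  */
#define ZERO_FIXUP_LEN(NUM) (1 + ((NUM) + 31) / 32)

int
main (void)
{
  assert (ZERO_FIXUP_LEN (1)   == 2);
  assert (ZERO_FIXUP_LEN (32)  == 2);
  assert (ZERO_FIXUP_LEN (33)  == 3);
  assert (ZERO_FIXUP_LEN (100) == 5);
  /* Function F's flag lands in bitvector word F/32, bit F%32, matching
     the packing loop in gcov_write_comdat_zero_fixup.  */
  return 0;
}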
*/ + buffered = 1; + gfi_ptr = &fn_buffer->info; + length = GCOV_TAG_FUNCTION_LENGTH; + } + else + { + gfi_ptr = gi_ptr->functions[f_ix]; + if (gfi_ptr && gfi_ptr->key == gi_ptr) + length = GCOV_TAG_FUNCTION_LENGTH; + else + length = 0; + } + + gcov_write_tag_length (GCOV_TAG_FUNCTION, length); + if (!length) + continue; + + gcov_write_unsigned (gfi_ptr->ident); + gcov_write_unsigned (gfi_ptr->lineno_checksum); + gcov_write_unsigned (gfi_ptr->cfg_checksum); + + ci_ptr = gfi_ptr->ctrs; + for (t_ix = 0; t_ix < GCOV_COUNTERS; t_ix++) + { + gcov_unsigned_t n_counts; + gcov_type *c_ptr; + + if (!gi_ptr->merge[t_ix]) + continue; + + n_counts = ci_ptr->num; + gcov_write_tag_length (GCOV_TAG_FOR_COUNTER (t_ix), + GCOV_TAG_COUNTER_LENGTH (n_counts)); + c_ptr = ci_ptr->values; + while (n_counts--) + gcov_write_counter (*c_ptr++); + ci_ptr++; + } +#if !defined(__KERNEL__) + if (buffered) + fn_buffer = free_fn_data (gi_ptr, fn_buffer, GCOV_COUNTERS); +#endif /* __KERNEL__ */ + } + + gi_ptr->eof_pos = gcov_position (); + gcov_write_unsigned (0); +} + +/* Write counters in GI_PTR and the summary in PRG to a gcda file. In + the case of appending to an existing file, SUMMARY_POS will be non-zero. + We will write the file starting from SUMMAY_POS. */ + +static void +gcov_exit_write_gcda (struct gcov_info *gi_ptr, + const struct gcov_summary *prg_p, + const gcov_position_t eof_pos, + const gcov_position_t summary_pos) + +{ + struct gcov_summary_buffer *next_sum_buffer; + + /* Write out the data. */ + if (!eof_pos) + { + gcov_write_tag_length (GCOV_DATA_MAGIC, GCOV_VERSION); + gcov_write_unsigned (gi_ptr->stamp); + } + + if (summary_pos) + gcov_seek (summary_pos); + gcc_assert (!summary_pos || summary_pos == gcov_position ()); + + /* Generate whole program statistics. */ + gcov_write_summary (GCOV_TAG_PROGRAM_SUMMARY, prg_p); + + /* Rewrite all the summaries that were after the summary we merged + into. This is necessary as the merged summary may have a different + size due to the number of non-zero histogram entries changing after + merging. */ + + while (sum_buffer) + { + gcov_write_summary (GCOV_TAG_PROGRAM_SUMMARY, &sum_buffer->summary); + next_sum_buffer = sum_buffer->next; + xfree (sum_buffer); + sum_buffer = next_sum_buffer; + } + + gcov_write_build_info (gi_ptr); + + /* Write the counters. */ + gcov_write_func_counters (gi_ptr); +} + +/* Helper function for merging summary. + Return -1 on error. Return 0 on success. */ + +static int +gcov_exit_merge_summary (const struct gcov_info *gi_ptr, struct gcov_summary *prg, + struct gcov_summary *this_prg, gcov_unsigned_t crc32, + struct gcov_summary *all_prg __attribute__ ((unused))) +{ + struct gcov_ctr_summary *cs_prg, *cs_tprg; + unsigned t_ix; +#if !GCOV_LOCKED + /* summary for all instances of program. */ + struct gcov_ctr_summary *cs_all; +#endif + + /* Merge the summaries. 
*/ + for (t_ix = 0; t_ix < GCOV_COUNTERS_SUMMABLE; t_ix++) + { + cs_prg = &(prg->ctrs[t_ix]); + cs_tprg = &(this_prg->ctrs[t_ix]); + + if (gi_ptr->merge[t_ix]) + { + int first = !cs_prg->runs; + + if (!run_accounted) + cs_prg->runs++; + if (first) + cs_prg->num = cs_tprg->num; + cs_prg->sum_all += cs_tprg->sum_all; + if (cs_prg->run_max < cs_tprg->run_max) + cs_prg->run_max = cs_tprg->run_max; + cs_prg->sum_max += cs_tprg->run_max; + if (first) + memcpy (cs_prg->histogram, cs_tprg->histogram, + sizeof (gcov_bucket_type) * GCOV_HISTOGRAM_SIZE); + else + gcov_histogram_merge (cs_prg->histogram, cs_tprg->histogram); + } + else if (cs_prg->runs) + { + gcov_error ("profiling:%s:Merge mismatch for summary.\n", + gi_filename); + return -1; + } +#if !GCOV_LOCKED + cs_all = &all_prg->ctrs[t_ix]; + if (!cs_all->runs && cs_prg->runs) + { + cs_all->num = cs_prg->num; + cs_all->runs = cs_prg->runs; + cs_all->sum_all = cs_prg->sum_all; + cs_all->run_max = cs_prg->run_max; + cs_all->sum_max = cs_prg->sum_max; + } + else if (!all_prg->checksum + /* Don't compare the histograms, which may have slight + variations depending on the order they were updated + due to the truncating integer divides used in the + merge. */ + && (cs_all->num != cs_prg->num + || cs_all->runs != cs_prg->runs + || cs_all->sum_all != cs_prg->sum_all + || cs_all->run_max != cs_prg->run_max + || cs_all->sum_max != cs_prg->sum_max)) + { + gcov_error ("profiling:%s:Data file mismatch - some " + "data files may have been concurrently " + "updated without locking support\n", gi_filename); + all_prg->checksum = ~0u; + } +#endif + } + + prg->checksum = crc32; + + return 0; +} + +__attribute__((weak)) gcov_unsigned_t __gcov_lipo_sampling_period; + +/* Sort N entries in VALUE_ARRAY in descending order. + Each entry in VALUE_ARRAY has two values. The sorting + is based on the second value. */ + +GCOV_LINKAGE void +gcov_sort_n_vals (gcov_type *value_array, int n) +{ + int j, k; + for (j = 2; j < n; j += 2) + { + gcov_type cur_ent[2]; + cur_ent[0] = value_array[j]; + cur_ent[1] = value_array[j + 1]; + k = j - 2; + while (k >= 0 && value_array[k + 1] < cur_ent[1]) + { + value_array[k + 2] = value_array[k]; + value_array[k + 3] = value_array[k+1]; + k -= 2; + } + value_array[k + 2] = cur_ent[0]; + value_array[k + 3] = cur_ent[1]; + } +} + +/* Sort the profile counters for all indirect call sites. Counters + for each call site are allocated in array COUNTERS. */ + +static void +gcov_sort_icall_topn_counter (const struct gcov_ctr_info *counters) +{ + int i; + gcov_type *values; + int n = counters->num; + gcc_assert (!(n % GCOV_ICALL_TOPN_NCOUNTS)); + + values = counters->values; + + for (i = 0; i < n; i += GCOV_ICALL_TOPN_NCOUNTS) + { + gcov_type *value_array = &values[i + 1]; + gcov_sort_n_vals (value_array, GCOV_ICALL_TOPN_NCOUNTS - 1); + } +} + +static void +gcov_sort_topn_counter_arrays (const struct gcov_info *gi_ptr) +{ + unsigned int i; + int f_ix; + const struct gcov_fn_info *gfi_ptr; + const struct gcov_ctr_info *ci_ptr; + + for (f_ix = 0; (unsigned)f_ix != gi_ptr->n_functions; f_ix++) + { + gfi_ptr = gi_ptr->functions[f_ix]; + ci_ptr = gfi_ptr->ctrs; + for (i = 0; i < GCOV_COUNTERS; i++) + { + if (!gcov_counter_active (gi_ptr, i)) + continue; + if (i == GCOV_COUNTER_ICALL_TOPNV) + { + gcov_sort_icall_topn_counter (ci_ptr); + break; + } + ci_ptr++; + } + } +} + +/* Scaling LIPO sampled profile counters. 
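gcov_sort_n_vals above treats its argument as (id, count) pairs and insertion-sorts the pairs by count, descending; gcov_sort_icall_topn_counter then applies it to each GCOV_ICALL_TOPN_NCOUNTS block, skipping the block's first slot. A tiny worked example with made-up callee ids and counts:

/* Three (callee id, count) pairs, sorted in place by count.  */
static void
sort_pairs_example (void)
{
  gcov_type vals[6] = { 101, 5, 102, 9, 103, 2 };   /* id, count, ...  */

  gcov_sort_n_vals (vals, 6);
  /* vals is now { 102, 9, 101, 5, 103, 2 }: descending by the second
     value of each pair, with the ids carried along.  */
}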
*/ +static void +gcov_scaling_lipo_counters (const struct gcov_info *gi_ptr) +{ + unsigned int i,j,k; + int f_ix; + const struct gcov_fn_info *gfi_ptr; + const struct gcov_ctr_info *ci_ptr; + + if (__gcov_lipo_sampling_period <= 1) + return; + + for (f_ix = 0; (unsigned)f_ix != gi_ptr->n_functions; f_ix++) + { + gfi_ptr = gi_ptr->functions[f_ix]; + ci_ptr = gfi_ptr->ctrs; + for (i = 0; i < GCOV_COUNTERS; i++) + { + if (!gcov_counter_active (gi_ptr, i)) + continue; + if (i == GCOV_COUNTER_ICALL_TOPNV) + { + for (j = 0; j < ci_ptr->num; j += GCOV_ICALL_TOPN_NCOUNTS) + for (k = 2; k < GCOV_ICALL_TOPN_NCOUNTS; k += 2) + ci_ptr->values[j+k] *= __gcov_lipo_sampling_period; + } + if (i == GCOV_COUNTER_DIRECT_CALL) + { + for (j = 0; j < ci_ptr->num; j += 2) + ci_ptr->values[j+1] *= __gcov_lipo_sampling_period; + } + ci_ptr++; + } + } +} + +/* Open a gcda file specified by GI_FILENAME. + Return -1 on error. Return 0 on success. */ + +static int +gcov_exit_open_gcda_file (struct gcov_info *gi_ptr, struct gcov_filename_aux *gf) +{ + int gcov_prefix_strip; + size_t prefix_length; + char *gi_filename_up; + + gcov_prefix_strip = gf->gcov_prefix_strip; + gi_filename_up = gf->gi_filename_up; + prefix_length = gf->prefix_length; + + gcov_strip_leading_dirs (prefix_length, gcov_prefix_strip, gi_ptr->filename, + gi_filename_up); + + return gcov_open_by_filename (gi_filename); +} + +/* Dump the coverage counts for one gcov_info object. We merge with existing + counts when possible, to avoid growing the .da files ad infinitum. We use + this program's checksum to make sure we only accumulate whole program + statistics to the correct summary. An object file might be embedded + in two separate programs, and we must keep the two program + summaries separate. */ + +static void +gcov_exit_dump_gcov (struct gcov_info *gi_ptr, struct gcov_filename_aux *gf, + gcov_unsigned_t crc32, struct gcov_summary *all_prg, + struct gcov_summary *this_prg) +{ +/* We have to make the decl static as kernel has limited stack size. + If we put prg to stack, we will running into nasty stack overflow. */ +#if defined(__KERNEL__) + static +#endif + struct gcov_summary prg; /* summary for this object over all program. */ + int error; + gcov_unsigned_t tag = 0; + gcov_position_t summary_pos = 0; + gcov_position_t eof_pos = 0; + + fn_buffer = 0; + sum_buffer = 0; + + gcov_sort_topn_counter_arrays (gi_ptr); + gcov_scaling_lipo_counters (gi_ptr); + + error = gcov_exit_open_gcda_file (gi_ptr, gf); + if (error == -1) + return; + +#if !defined(__KERNEL__) + tag = gcov_read_unsigned (); +#endif + if (tag) + { + /* Merge data from file. */ + if (tag != GCOV_DATA_MAGIC) + { + gcov_error ("profiling:%s:Not a gcov data file\n", gi_filename); + goto read_fatal; + } + error = gcov_exit_merge_gcda (gi_ptr, &prg, this_prg, &summary_pos, &eof_pos, + crc32); + if (error == -1) + goto read_fatal; + } + + gcov_rewrite (); + + if (!summary_pos) + { + memset (&prg, 0, sizeof (prg)); + summary_pos = eof_pos; + } + + error = gcov_exit_merge_summary (gi_ptr, &prg, this_prg, crc32, all_prg); + if (error == -1) + goto read_fatal; + + gcov_exit_write_gcda (gi_ptr, &prg, eof_pos, summary_pos); + /* fall through */ + +read_fatal:; +#if !defined(__KERNEL__) + while (fn_buffer) + fn_buffer = free_fn_data (gi_ptr, fn_buffer, GCOV_COUNTERS); +#else + + /* In LIPO mode, dump the primary module info. */ + if (gi_ptr->mod_info && gi_ptr->mod_info->is_primary) + { + /* Overwrite the zero word at the of the file. 
*/ + gcov_seek (gi_ptr->eof_pos); + gcov_write_module_info (gi_ptr, 1); + /* Write the end marker */ + gcov_write_unsigned (0); + } +#endif + + if ((error = gcov_close ())) + gcov_error (error < 0 ? + "profiling:%s:Overflow writing\n" : + "profiling:%s:Error writing\n", + gi_filename); +} + +#if !defined (__KERNEL__) +/* Write imported files (auxiliary modules) for primary module GI_PTR + into file GI_FILENAME. */ + +static void +gcov_write_import_file (char *gi_filename, struct gcov_info *gi_ptr) +{ + char *gi_imports_filename; + const char *gcov_suffix; + FILE *imports_file; + size_t prefix_length, suffix_length; + + gcov_suffix = getenv ("GCOV_IMPORTS_SUFFIX"); + if (!gcov_suffix || !strlen (gcov_suffix)) + gcov_suffix = ".imports"; + suffix_length = strlen (gcov_suffix); + prefix_length = strlen (gi_filename); + gi_imports_filename = (char *) alloca (prefix_length + suffix_length + 1); + memset (gi_imports_filename, 0, prefix_length + suffix_length + 1); + memcpy (gi_imports_filename, gi_filename, prefix_length); + memcpy (gi_imports_filename + prefix_length, gcov_suffix, suffix_length); + imports_file = fopen (gi_imports_filename, "w"); + if (imports_file) + { + const struct dyn_imp_mod **imp_mods; + unsigned i, imp_len; + imp_mods = gcov_get_sorted_import_module_array (gi_ptr, &imp_len); + if (imp_mods) + { + for (i = 0; i < imp_len; i++) + { + fprintf (imports_file, "%s\n", + imp_mods[i]->imp_mod->mod_info->source_filename); + fprintf (imports_file, "%s%s\n", + imp_mods[i]->imp_mod->mod_info->da_filename, GCOV_DATA_SUFFIX); + } + xfree (imp_mods); + } + fclose (imports_file); + } +} + +static void +gcov_dump_module_info (struct gcov_filename_aux *gf) +{ + struct gcov_info *gi_ptr; + + unsigned max_module_id = 0; + for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next) + { + unsigned mod_id = gi_ptr->mod_info->ident; + if (max_module_id < mod_id) + max_module_id = mod_id; + } + char **zero_counts = (char **) xcalloc (max_module_id, sizeof (char *)); + for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next) + { + unsigned mod_id = gi_ptr->mod_info->ident; + zero_counts[mod_id-1] = (char *) xcalloc (gi_ptr->n_functions, + sizeof (char)); + } + + /* Compute the module groups and record whether there were any + counter fixups applied that require rewriting the counters. */ + int changed = __gcov_compute_module_groups (zero_counts); + + /* Now write out module group info. */ + for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next) + { + int error; + + if (gcov_exit_open_gcda_file (gi_ptr, gf) == -1) + continue; + + if (changed) + { + /* Scan file to find the start of the function section, which is + where we will start re-writing the counters. */ + gcov_position_t summary_end_pos; + if (gcov_scan_to_function_data (gi_ptr, &summary_end_pos) == -1) + gcov_error ("profiling:%s:Error scanning summaries\n", + gi_filename); + else + { + gcov_position_t eof_pos = gi_ptr->eof_pos; + gcov_rewrite (); + gcov_seek (summary_end_pos); + + unsigned mod_id = gi_ptr->mod_info->ident; + gcov_write_comdat_zero_fixup (zero_counts[mod_id-1], + gi_ptr->n_functions); + gcov_position_t zero_fixup_eof_pos = gcov_position (); + + gcov_write_func_counters (gi_ptr); + gcc_assert (eof_pos + (zero_fixup_eof_pos - summary_end_pos) + == gi_ptr->eof_pos); + } + } + else + gcov_rewrite (); + + /* Overwrite the zero word at the of the file. 
*/ + gcov_seek (gi_ptr->eof_pos); + + gcov_write_module_infos (gi_ptr); + /* Write the end marker */ + gcov_write_unsigned (0); + gcov_truncate (); + + if ((error = gcov_close ())) + gcov_error (error < 0 ? "profiling:%s:Overflow writing\n" : + "profiling:%s:Error writing\n", + gi_filename); + gcov_write_import_file (gi_filename, gi_ptr); + free (zero_counts[gi_ptr->mod_info->ident-1]); + } + + free (zero_counts); + + __gcov_finalize_dyn_callgraph (); +} + +/* Dump all the coverage counts for the program. It first computes program + summary and then traverses gcov_list list and dumps the gcov_info + objects one by one. */ + +void +gcov_exit (void) +{ + struct gcov_info *gi_ptr; + struct gcov_filename_aux gf; + gcov_unsigned_t crc32; + int dump_module_info = 0; + struct gcov_summary all_prg; + struct gcov_summary this_prg; + + /* Prevent the counters from being dumped a second time on exit when the + application already wrote out the profile using __gcov_dump(). */ + if (gcov_dump_complete) + return; + + crc32 = gcov_exit_compute_summary (&this_prg); + + allocate_filename_struct (&gf); +#if !GCOV_LOCKED + memset (&all_prg, 0, sizeof (all_prg)); +#endif + + /* Now merge each file. */ + for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next) + { + gcov_exit_dump_gcov (gi_ptr, &gf, crc32, &all_prg, &this_prg); + + /* The IS_PRIMARY field is overloaded to indicate if this module + is FDO/LIPO. */ + if (gi_ptr->mod_info) + dump_module_info |= gi_ptr->mod_info->is_primary; + } + run_accounted = 1; + + if (dump_module_info) + gcov_dump_module_info (&gf); + + if (gi_filename) + xfree (gi_filename); +} + +/* Add a new object file onto the bb chain. Invoked automatically + when running an object file's global ctors. */ + +void +__gcov_init (struct gcov_info *info) +{ +#ifndef IN_GCOV_TOOL + if (!gcov_sampling_period_initialized) + { + const char* env_value_str = getenv ("GCOV_SAMPLING_PERIOD"); + if (env_value_str) + { + int env_value_int = atoi(env_value_str); + if (env_value_int >= 1) + __gcov_sampling_period = env_value_int; + } + env_value_str = getenv ("GCOV_LIPO_SAMPLING_PERIOD"); + if (env_value_str) + { + int env_value_int = atoi(env_value_str); + if (env_value_int >= 0) + __gcov_lipo_sampling_period = env_value_int; + } + gcov_sampling_period_initialized = 1; + } +#endif + + if (!info->version || !info->n_functions) + return; + if (gcov_version (info, info->version, 0)) + { + size_t filename_length = strlen(info->filename); + + /* Refresh the longest file name information */ + if (filename_length > gcov_max_filename) + gcov_max_filename = filename_length; + + /* Assign the module ID (starting at 1). */ + info->mod_info->ident = (++gcov_cur_module_id); + gcc_assert (EXTRACT_MODULE_ID_FROM_GLOBAL_ID (GEN_FUNC_GLOBAL_ID ( + info->mod_info->ident, 0)) + == info->mod_info->ident); + + if (!__gcov_list) + atexit (gcov_exit); + + info->next = __gcov_list; + __gcov_list = info; + } + info->version = 0; +} + +#else /* __KERNEL__ */ + +static struct gcov_filename_aux gf; +static gcov_unsigned_t crc32; +static struct gcov_summary all_prg; +static struct gcov_summary this_prg; +void +gcov_kernel_dump_gcov_init (void) +{ + crc32 = gcov_exit_compute_summary (&this_prg); + allocate_filename_struct (&gf); + memset (&all_prg, 0, sizeof (all_prg)); +} + +void +gcov_kernel_dump_one_gcov(struct gcov_info *info) +{ + gcov_exit_dump_gcov (info, &gf, crc32, &all_prg, &this_prg); +} + +#endif /* __KERNEL__ */ + +/* Reset all counters to zero. 
*/ + +void +gcov_clear (void) +{ + const struct gcov_info *gi_ptr; + + for (gi_ptr = __gcov_list; gi_ptr; gi_ptr = gi_ptr->next) + { + unsigned f_ix; + + for (f_ix = 0; f_ix < gi_ptr->n_functions; f_ix++) + { + unsigned t_ix; + const struct gcov_fn_info *gfi_ptr = gi_ptr->functions[f_ix]; + const struct gcov_ctr_info *ci_ptr; + + if (!gfi_ptr || gfi_ptr->key != gi_ptr) + continue; + ci_ptr = gfi_ptr->ctrs; + for (t_ix = 0; t_ix != GCOV_COUNTERS; t_ix++) + { + if (!gi_ptr->merge[t_ix]) + continue; + + memset (ci_ptr->values, 0, sizeof (gcov_type) * ci_ptr->num); + ci_ptr++; + } + } + } +} + +/* Write out MOD_INFO into the gcda file. IS_PRIMARY is a flag + indicating if the module is the primary module in the group. */ + +void +gcov_write_module_info (const struct gcov_info *mod_info, + unsigned is_primary) +{ + gcov_unsigned_t len = 0, filename_len = 0, src_filename_len = 0, i; + gcov_unsigned_t num_strings; + gcov_unsigned_t *aligned_fname; + struct gcov_module_info *module_info = mod_info->mod_info; + filename_len = (strlen (module_info->da_filename) + + sizeof (gcov_unsigned_t)) / sizeof (gcov_unsigned_t); + src_filename_len = (strlen (module_info->source_filename) + + sizeof (gcov_unsigned_t)) / sizeof (gcov_unsigned_t); + len = filename_len + src_filename_len; + len += 2; /* each name string is led by a length. */ + + num_strings = module_info->num_quote_paths + module_info->num_bracket_paths + + module_info->num_system_paths + + module_info->num_cpp_defines + module_info->num_cpp_includes + + module_info->num_cl_args; + len += gcov_compute_string_array_len (module_info->string_array, + num_strings); + + len += 11; /* 11 more fields */ + + gcov_write_tag_length (GCOV_TAG_MODULE_INFO, len); + gcov_write_unsigned (module_info->ident); + gcov_write_unsigned (is_primary); + gcov_write_unsigned (module_info->flags); + gcov_write_unsigned (module_info->lang); + gcov_write_unsigned (module_info->ggc_memory); + gcov_write_unsigned (module_info->num_quote_paths); + gcov_write_unsigned (module_info->num_bracket_paths); + gcov_write_unsigned (module_info->num_system_paths); + gcov_write_unsigned (module_info->num_cpp_defines); + gcov_write_unsigned (module_info->num_cpp_includes); + gcov_write_unsigned (module_info->num_cl_args); + + /* Now write the filenames */ + aligned_fname = (gcov_unsigned_t *) alloca ((filename_len + src_filename_len + 2) * + sizeof (gcov_unsigned_t)); + memset (aligned_fname, 0, + (filename_len + src_filename_len + 2) * sizeof (gcov_unsigned_t)); + aligned_fname[0] = filename_len; + strcpy ((char*) (aligned_fname + 1), module_info->da_filename); + aligned_fname[filename_len + 1] = src_filename_len; + strcpy ((char*) (aligned_fname + filename_len + 2), module_info->source_filename); + + for (i = 0; i < (filename_len + src_filename_len + 2); i++) + gcov_write_unsigned (aligned_fname[i]); + + /* Now write the string array. */ + gcov_write_string_array (module_info->string_array, num_strings); +} + +#endif /* L_gcov */ +#endif /* inhibit_libc */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-kernel.h b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-kernel.h new file mode 100644 index 0000000..b44af53 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-kernel.h @@ -0,0 +1,121 @@ +/* Header file for libgcov-*.c. + Copyright (C) 1996-2014 Free Software Foundation, Inc. + + This file is part of GCC. 
+ + GCC is free software; you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation; either version 3, or (at your option) any later + version. + + GCC is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef GCC_LIBGCOV_KERNEL_H +#define GCC_LIBGCOV_KERNEL_H + +/* work around the poisoned malloc/calloc in system.h. */ +#ifndef xmalloc +#define xmalloc vmalloc +#endif +#ifndef xcalloc +#define xcalloc vcalloc +#endif +#ifndef xrealloc +#define xrealloc vrealloc +#endif +#ifndef xfree +#define xfree vfree +#endif +#ifndef alloca +#define alloca __builtin_alloca +#endif + +#ifndef SEEK_SET +#define SEEK_SET 0 +#endif + + /* Define MACROs to be used by kernel compilation. */ +# define L_gcov +# define L_gcov_interval_profiler +# define L_gcov_pow2_profiler +# define L_gcov_one_value_profiler +# define L_gcov_indirect_call_profiler_v2 +# define L_gcov_direct_call_profiler +# define L_gcov_indirect_call_profiler +# define L_gcov_indirect_call_topn_profiler +# define L_gcov_time_profiler +# define L_gcov_average_profiler +# define L_gcov_ior_profiler +# define L_gcov_merge_add +# define L_gcov_merge_single +# define L_gcov_merge_delta +# define L_gcov_merge_ior +# define L_gcov_merge_time_profile +# define L_gcov_merge_icall_topn +# define L_gcov_merge_dc + +# define IN_LIBGCOV 1 +# define IN_GCOV 0 +#define THREAD_PREFIX +#define GCOV_LINKAGE /* nothing */ +#define BITS_PER_UNIT 8 +#define LONG_LONG_TYPE_SIZE 64 +#define MEMMODEL_RELAXED 0 + +#define ENABLE_ASSERT_CHECKING 1 + +/* gcc_assert() prints out a warning if the check fails. It + will not abort. */ +#if ENABLE_ASSERT_CHECKING +# define gcc_assert(EXPR) \ + ((void)(!(EXPR) ? printk (KERN_WARNING \ + "GCOV assertion fails: func=%s line=%d\n", \ + __FUNCTION__, __LINE__), 0 : 0)) +#else +# define gcc_assert(EXPR) ((void)(0 && (EXPR))) +#endif + +/* In Linux kernel mode, a virtual file is used for file operations. */ +struct gcov_info; +typedef struct { + long size; /* size of buf */ + long count; /* element written into buf */ + struct gcov_info *info; + char *buf; +} gcov_kernel_vfile; + +#define _GCOV_FILE gcov_kernel_vfile + +/* Wrappers to the file operations. */ +#define _GCOV_fclose kernel_file_fclose +#define _GCOV_ftell kernel_file_ftell +#define _GCOV_fseek kernel_file_fseek +#define _GCOV_ftruncate kernel_file_ftruncate +#define _GCOV_fread kernel_file_fread +#define _GCOV_fwrite kernel_file_fwrite +#define _GCOV_fileno kernel_file_fileno + +/* Declarations for virtual files operations. 
*/ +extern int kernel_file_fclose (gcov_kernel_vfile *); +extern long kernel_file_ftell (gcov_kernel_vfile *); +extern int kernel_file_fseek (gcov_kernel_vfile *, long, int); +extern int kernel_file_ftruncate (gcov_kernel_vfile *, off_t); +extern int kernel_file_fread (void *, size_t, size_t, + gcov_kernel_vfile *); +extern int kernel_file_fwrite (const void *, size_t, size_t, + gcov_kernel_vfile *); +extern int kernel_file_fileno (gcov_kernel_vfile *); + +#endif /* GCC_LIBGCOV_KERNEL_H */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-merge.c b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-merge.c new file mode 100644 index 0000000..997dab3 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-merge.c @@ -0,0 +1,299 @@ +/* Routines required for instrumenting a program. */ +/* Compile this one with gcc. */ +/* Copyright (C) 1989-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +#include "libgcov.h" + +#if defined(inhibit_libc) +/* If libc and its header files are not available, provide dummy functions. */ + +#ifdef L_gcov_merge_add +void __gcov_merge_add (gcov_type *counters __attribute__ ((unused)), + unsigned n_counters __attribute__ ((unused))) {} +#endif + +#ifdef L_gcov_merge_single +void __gcov_merge_single (gcov_type *counters __attribute__ ((unused)), + unsigned n_counters __attribute__ ((unused))) {} +#endif + +#ifdef L_gcov_merge_delta +void __gcov_merge_delta (gcov_type *counters __attribute__ ((unused)), + unsigned n_counters __attribute__ ((unused))) {} +#endif + +#else + +#ifdef L_gcov_merge_add +/* The profile merging function that just adds the counters. It is given + an array COUNTERS of N_COUNTERS old counters and it reads the same number + of counters from the gcov file. */ +void +__gcov_merge_add (gcov_type *counters, unsigned n_counters) +{ + for (; n_counters; counters++, n_counters--) + *counters += gcov_get_counter (); +} +#endif /* L_gcov_merge_add */ + +#ifdef L_gcov_merge_ior +/* The profile merging function that just adds the counters. It is given + an array COUNTERS of N_COUNTERS old counters and it reads the same number + of counters from the gcov file. */ +void +__gcov_merge_ior (gcov_type *counters, unsigned n_counters) +{ + for (; n_counters; counters++, n_counters--) + *counters |= gcov_get_counter_target (); +} +#endif + + +#ifdef L_gcov_merge_dc + +/* Returns 1 if the function global id GID is not valid. 
*/ + +static int +__gcov_is_gid_insane (gcov_type gid) +{ + if (EXTRACT_MODULE_ID_FROM_GLOBAL_ID (gid) == 0 + || EXTRACT_FUNC_ID_FROM_GLOBAL_ID (gid) == 0) + return 1; + return 0; +} + +/* The profile merging function used for merging direct call counts + This function is given array COUNTERS of N_COUNTERS old counters and it + reads the same number of counters from the gcov file. */ + +void +__gcov_merge_dc (gcov_type *counters, unsigned n_counters) +{ + unsigned i; + + gcc_assert (!(n_counters % 2)); + for (i = 0; i < n_counters; i += 2) + { + gcov_type global_id = gcov_get_counter_target (); + gcov_type call_count = gcov_get_counter (); + + /* Note that global id counter may never have been set if no calls were + made from this call-site. */ + if (counters[i] && global_id) + { + /* TODO race condition requires us do the following correction. */ + if (__gcov_is_gid_insane (counters[i])) + counters[i] = global_id; + else if (__gcov_is_gid_insane (global_id)) + global_id = counters[i]; + +#if !defined(__KERNEL__) + /* In the case of inconsistency, use the src's target. */ + if (counters[i] != global_id) + fprintf (stderr, "Warning: Inconsistent call targets in" + " direct-call profile.\n"); +#endif + } + else if (global_id) + counters[i] = global_id; + + counters[i + 1] += call_count; + + /* Reset. */ + if (__gcov_is_gid_insane (counters[i])) + counters[i] = counters[i + 1] = 0; + + /* Assert that the invariant (global_id == 0) <==> (call_count == 0) + holds true after merging. */ + if (counters[i] == 0) + counters[i+1] = 0; + if (counters[i + 1] == 0) + counters[i] = 0; + } +} +#endif + + +#ifdef L_gcov_merge_icall_topn +/* The profile merging function used for merging indirect call counts + This function is given array COUNTERS of N_COUNTERS old counters and it + reads the same number of counters from the gcov file. */ + +void +__gcov_merge_icall_topn (gcov_type *counters, unsigned n_counters) +{ + unsigned i, j, k, m; + + gcc_assert (!(n_counters % GCOV_ICALL_TOPN_NCOUNTS)); + for (i = 0; i < n_counters; i += GCOV_ICALL_TOPN_NCOUNTS) + { + gcov_type *value_array = &counters[i + 1]; + unsigned tmp_size = 2 * (GCOV_ICALL_TOPN_NCOUNTS - 1); + gcov_type *tmp_array + = (gcov_type *) alloca (tmp_size * sizeof (gcov_type)); + + for (j = 0; j < tmp_size; j++) + tmp_array[j] = 0; + + for (j = 0; j < GCOV_ICALL_TOPN_NCOUNTS - 1; j += 2) + { + tmp_array[j] = value_array[j]; + tmp_array[j + 1] = value_array [j + 1]; + } + + /* Skip the number_of_eviction entry. */ + gcov_get_counter (); + for (k = 0; k < GCOV_ICALL_TOPN_NCOUNTS - 1; k += 2) + { + int found = 0; + gcov_type global_id = gcov_get_counter_target (); + gcov_type call_count = gcov_get_counter (); + for (m = 0; m < j; m += 2) + { + if (tmp_array[m] == global_id) + { + found = 1; + tmp_array[m + 1] += call_count; + break; + } + } + if (!found) + { + tmp_array[j] = global_id; + tmp_array[j + 1] = call_count; + j += 2; + } + } + /* Now sort the temp array */ + gcov_sort_n_vals (tmp_array, j); + + /* Now copy back the top half of the temp array */ + for (k = 0; k < GCOV_ICALL_TOPN_NCOUNTS - 1; k += 2) + { + value_array[k] = tmp_array[k]; + value_array[k + 1] = tmp_array[k + 1]; + } + } +} +#endif + + +#ifdef L_gcov_merge_time_profile +/* Time profiles are merged so that minimum from all valid (greater than zero) + is stored. There could be a fork that creates new counters. To have + the profile stable, we chosen to pick the smallest function visit time. 
*/ +void +__gcov_merge_time_profile (gcov_type *counters, unsigned n_counters) +{ + unsigned int i; + gcov_type value; + + for (i = 0; i < n_counters; i++) + { + value = gcov_get_counter_target (); + + if (value && (!counters[i] || value < counters[i])) + counters[i] = value; + } +} +#endif /* L_gcov_merge_time_profile */ + +#ifdef L_gcov_merge_single +/* The profile merging function for choosing the most common value. + It is given an array COUNTERS of N_COUNTERS old counters and it + reads the same number of counters from the gcov file. The counters + are split into 3-tuples where the members of the tuple have + meanings: + + -- the stored candidate on the most common value of the measured entity + -- counter + -- total number of evaluations of the value */ +void +__gcov_merge_single (gcov_type *counters, unsigned n_counters) +{ + unsigned i, n_measures; + gcov_type value, counter, all; + + gcc_assert (!(n_counters % 3)); + n_measures = n_counters / 3; + for (i = 0; i < n_measures; i++, counters += 3) + { + value = gcov_get_counter_target (); + counter = gcov_get_counter (); + all = gcov_get_counter (); + + if (counters[0] == value) + counters[1] += counter; + else if (counter > counters[1]) + { + counters[0] = value; + counters[1] = counter - counters[1]; + } + else + counters[1] -= counter; + counters[2] += all; + } +} +#endif /* L_gcov_merge_single */ + +#ifdef L_gcov_merge_delta +/* The profile merging function for choosing the most common + difference between two consecutive evaluations of the value. It is + given an array COUNTERS of N_COUNTERS old counters and it reads the + same number of counters from the gcov file. The counters are split + into 4-tuples where the members of the tuple have meanings: + + -- the last value of the measured entity + -- the stored candidate on the most common difference + -- counter + -- total number of evaluations of the value */ +void +__gcov_merge_delta (gcov_type *counters, unsigned n_counters) +{ + unsigned i, n_measures; + gcov_type value, counter, all; + + gcc_assert (!(n_counters % 4)); + n_measures = n_counters / 4; + for (i = 0; i < n_measures; i++, counters += 4) + { + /* last = */ gcov_get_counter (); + value = gcov_get_counter_target (); + counter = gcov_get_counter (); + all = gcov_get_counter (); + + if (counters[1] == value) + counters[2] += counter; + else if (counter > counters[2]) + { + counters[1] = value; + counters[2] = counter - counters[2]; + } + else + counters[2] -= counter; + counters[3] += all; + } +} +#endif /* L_gcov_merge_delta */ +#endif /* inhibit_libc */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-profiler.c b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-profiler.c new file mode 100644 index 0000000..7552ada --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov-profiler.c @@ -0,0 +1,477 @@ +/* Routines required for instrumenting a program. */ +/* Compile this one with gcc. */ +/* Copyright (C) 1989-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. 
+ +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +#include "libgcov.h" +#if !defined(inhibit_libc) + +#ifdef L_gcov_interval_profiler +/* If VALUE is in interval , then increases the + corresponding counter in COUNTERS. If the VALUE is above or below + the interval, COUNTERS[STEPS] or COUNTERS[STEPS + 1] is increased + instead. */ + +void +__gcov_interval_profiler (gcov_type *counters, gcov_type value, + int start, unsigned steps) +{ + gcov_type delta = value - start; + if (delta < 0) + counters[steps + 1]++; + else if (delta >= steps) + counters[steps]++; + else + counters[delta]++; +} +#endif + +#ifdef L_gcov_pow2_profiler +/* If VALUE is a power of two, COUNTERS[1] is incremented. Otherwise + COUNTERS[0] is incremented. */ + +void +__gcov_pow2_profiler (gcov_type *counters, gcov_type value) +{ + if (value & (value - 1)) + counters[0]++; + else + counters[1]++; +} +#endif + +/* Tries to determine the most common value among its inputs. Checks if the + value stored in COUNTERS[0] matches VALUE. If this is the case, COUNTERS[1] + is incremented. If this is not the case and COUNTERS[1] is not zero, + COUNTERS[1] is decremented. Otherwise COUNTERS[1] is set to one and + VALUE is stored to COUNTERS[0]. This algorithm guarantees that if this + function is called more than 50% of the time with one value, this value + will be in COUNTERS[0] in the end. + + In any case, COUNTERS[2] is incremented. */ + +static inline void +__gcov_one_value_profiler_body (gcov_type *counters, gcov_type value) +{ + if (value == counters[0]) + counters[1]++; + else if (counters[1] == 0) + { + counters[1] = 1; + counters[0] = value; + } + else + counters[1]--; + counters[2]++; +} + +/* Atomic update version of __gcov_one_value_profile_body(). */ +static inline void +__gcov_one_value_profiler_body_atomic (gcov_type *counters, gcov_type value) +{ + if (value == counters[0]) + GCOV_TYPE_ATOMIC_FETCH_ADD_FN (&counters[1], 1, MEMMODEL_RELAXED); + else if (counters[1] == 0) + { + counters[1] = 1; + counters[0] = value; + } + else + GCOV_TYPE_ATOMIC_FETCH_ADD_FN (&counters[1], -1, MEMMODEL_RELAXED); + GCOV_TYPE_ATOMIC_FETCH_ADD_FN (&counters[2], 1, MEMMODEL_RELAXED); +} + + +#ifdef L_gcov_one_value_profiler +void +__gcov_one_value_profiler (gcov_type *counters, gcov_type value) +{ + __gcov_one_value_profiler_body (counters, value); +} + +void +__gcov_one_value_profiler_atomic (gcov_type *counters, gcov_type value) +{ + __gcov_one_value_profiler_body_atomic (counters, value); +} + + +#endif + +#ifdef L_gcov_indirect_call_profiler +/* This function exist only for workaround of binutils bug 14342. + Once this compatibility hack is obsolette, it can be removed. */ + +/* By default, the C++ compiler will use function addresses in the + vtable entries. Setting TARGET_VTABLE_USES_DESCRIPTORS to nonzero + tells the compiler to use function descriptors instead. The value + of this macro says how many words wide the descriptor is (normally 2), + but it may be dependent on target flags. Since we do not have access + to the target flags here we just check to see if it is set and use + that to set VTABLE_USES_DESCRIPTORS to 0 or 1. 
+ + It is assumed that the address of a function descriptor may be treated + as a pointer to a function. */ + +#ifdef TARGET_VTABLE_USES_DESCRIPTORS +#define VTABLE_USES_DESCRIPTORS 1 +#else +#define VTABLE_USES_DESCRIPTORS 0 +#endif + +/* Tries to determine the most common value among its inputs. */ +void +__gcov_indirect_call_profiler (gcov_type* counter, gcov_type value, + void* cur_func, void* callee_func) +{ + /* If the C++ virtual tables contain function descriptors then one + function may have multiple descriptors and we need to dereference + the descriptors to see if they point to the same function. */ + if (cur_func == callee_func + || (VTABLE_USES_DESCRIPTORS && callee_func + && *(void **) cur_func == *(void **) callee_func)) + __gcov_one_value_profiler_body (counter, value); +} + + +/* Atomic update version of __gcov_indirect_call_profiler(). */ +void +__gcov_indirect_call_profiler_atomic (gcov_type* counter, gcov_type value, + void* cur_func, void* callee_func) +{ + if (cur_func == callee_func + || (VTABLE_USES_DESCRIPTORS && callee_func + && *(void **) cur_func == *(void **) callee_func)) + __gcov_one_value_profiler_body_atomic (counter, value); +} + + +#endif +#ifdef L_gcov_indirect_call_profiler_v2 + +/* These two variables are used to actually track caller and callee. Keep + them in TLS memory so races are not common (they are written to often). + The variables are set directly by GCC instrumented code, so declaration + here must match one in tree-profile.c */ + +#if defined(HAVE_CC_TLS) && !defined (USE_EMUTLS) +__thread +#endif +void * __gcov_indirect_call_callee; +#if defined(HAVE_CC_TLS) && !defined (USE_EMUTLS) +__thread +#endif +gcov_type * __gcov_indirect_call_counters; + +/* By default, the C++ compiler will use function addresses in the + vtable entries. Setting TARGET_VTABLE_USES_DESCRIPTORS to nonzero + tells the compiler to use function descriptors instead. The value + of this macro says how many words wide the descriptor is (normally 2), + but it may be dependent on target flags. Since we do not have access + to the target flags here we just check to see if it is set and use + that to set VTABLE_USES_DESCRIPTORS to 0 or 1. + + It is assumed that the address of a function descriptor may be treated + as a pointer to a function. */ + +#ifdef TARGET_VTABLE_USES_DESCRIPTORS +#define VTABLE_USES_DESCRIPTORS 1 +#else +#define VTABLE_USES_DESCRIPTORS 0 +#endif + +/* Tries to determine the most common value among its inputs. */ +void +__gcov_indirect_call_profiler_v2 (gcov_type value, void* cur_func) +{ + /* If the C++ virtual tables contain function descriptors then one + function may have multiple descriptors and we need to dereference + the descriptors to see if they point to the same function. */ + if (cur_func == __gcov_indirect_call_callee + || (VTABLE_USES_DESCRIPTORS && __gcov_indirect_call_callee + && *(void **) cur_func == *(void **) __gcov_indirect_call_callee)) + __gcov_one_value_profiler_body (__gcov_indirect_call_counters, value); +} + +void +__gcov_indirect_call_profiler_atomic_v2 (gcov_type value, void* cur_func) +{ + /* If the C++ virtual tables contain function descriptors then one + function may have multiple descriptors and we need to dereference + the descriptors to see if they point to the same function. 
*/ + if (cur_func == __gcov_indirect_call_callee + || (VTABLE_USES_DESCRIPTORS && __gcov_indirect_call_callee + && *(void **) cur_func == *(void **) __gcov_indirect_call_callee)) + __gcov_one_value_profiler_body_atomic (__gcov_indirect_call_counters, value); +} + +#endif + +/* +#if defined(L_gcov_direct_call_profiler) || defined(L_gcov_indirect_call_topn_profiler) +__attribute__ ((weak)) gcov_unsigned_t __gcov_lipo_sampling_period; +#endif +*/ + +extern gcov_unsigned_t __gcov_lipo_sampling_period; + +#ifdef L_gcov_indirect_call_topn_profiler + +#include "gthr.h" + +#ifdef __GTHREAD_MUTEX_INIT +__thread int in_profiler; +ATTRIBUTE_HIDDEN __gthread_mutex_t __indir_topn_val_mx = __GTHREAD_MUTEX_INIT; +#endif + +/* Tries to keep track the most frequent N values in the counters where + N is specified by parameter TOPN_VAL. To track top N values, 2*N counter + entries are used. + counter[0] --- the accumative count of the number of times one entry in + in the counters gets evicted/replaced due to limited capacity. + When this value reaches a threshold, the bottom N values are + cleared. + counter[1] through counter[2*N] records the top 2*N values collected so far. + Each value is represented by two entries: count[2*i+1] is the ith value, and + count[2*i+2] is the number of times the value is seen. */ + +static void +__gcov_topn_value_profiler_body (gcov_type *counters, gcov_type value, + gcov_unsigned_t topn_val) +{ + unsigned i, found = 0, have_zero_count = 0; + + gcov_type *entry; + gcov_type *lfu_entry = &counters[1]; + gcov_type *value_array = &counters[1]; + gcov_type *num_eviction = &counters[0]; + + /* There are 2*topn_val values tracked, each value takes two slots in the + counter array */ +#ifdef __GTHREAD_MUTEX_INIT + /* If this is reentry, return. */ + if (in_profiler == 1) + return; + + in_profiler = 1; + __gthread_mutex_lock (&__indir_topn_val_mx); +#endif + for (i = 0; i < topn_val << 2; i += 2) + { + entry = &value_array[i]; + if (entry[0] == value) + { + entry[1]++ ; + found = 1; + break; + } + else if (entry[1] == 0) + { + lfu_entry = entry; + have_zero_count = 1; + } + else if (entry[1] < lfu_entry[1]) + lfu_entry = entry; + } + + if (found) + { + in_profiler = 0; +#ifdef __GTHREAD_MUTEX_INIT + __gthread_mutex_unlock (&__indir_topn_val_mx); +#endif + return; + } + + /* lfu_entry is either an empty entry or an entry + with lowest count, which will be evicted. */ + lfu_entry[0] = value; + lfu_entry[1] = 1; + +#define GCOV_ICALL_COUNTER_CLEAR_THRESHOLD 3000 + + /* Too many evictions -- time to clear bottom entries to + avoid hot values bumping each other out. */ + if (!have_zero_count + && ++*num_eviction >= GCOV_ICALL_COUNTER_CLEAR_THRESHOLD) + { + unsigned i, j; + gcov_type **p; + gcov_type **tmp_cnts + = (gcov_type **)alloca (topn_val * sizeof(gcov_type *)); + + *num_eviction = 0; + + /* Find the largest topn_val values from the group of + 2*topn_val values and put the addresses into tmp_cnts. */ + for (i = 0; i < topn_val; i++) + tmp_cnts[i] = &value_array[i * 2 + 1]; + + for (i = topn_val * 2; i < topn_val << 2; i += 2) + { + p = &tmp_cnts[0]; + for (j = 1; j < topn_val; j++) + if (*tmp_cnts[j] > **p) + p = &tmp_cnts[j]; + if (value_array[i + 1] < **p) + *p = &value_array[i + 1]; + } + + /* Zero out low value entries. 
*/ + for (i = 0; i < topn_val; i++) + { + *tmp_cnts[i] = 0; + *(tmp_cnts[i] - 1) = 0; + } + } + +#ifdef __GTHREAD_MUTEX_INIT + in_profiler = 0; + __gthread_mutex_unlock (&__indir_topn_val_mx); +#endif +} + +#if defined(HAVE_CC_TLS) && !defined (USE_EMUTLS) +__thread +#endif +gcov_type *__gcov_indirect_call_topn_counters ATTRIBUTE_HIDDEN; + +#if defined(HAVE_CC_TLS) && !defined (USE_EMUTLS) +__thread +#endif +void *__gcov_indirect_call_topn_callee ATTRIBUTE_HIDDEN; + +#if defined(HAVE_CC_TLS) && !defined (USE_EMUTLS) +__thread +#endif +gcov_unsigned_t __gcov_indirect_call_sampling_counter ATTRIBUTE_HIDDEN; + +#ifdef TARGET_VTABLE_USES_DESCRIPTORS +#define VTABLE_USES_DESCRIPTORS 1 +#else +#define VTABLE_USES_DESCRIPTORS 0 +#endif +void +__gcov_indirect_call_topn_profiler (void *cur_func, + void *cur_module_gcov_info, + gcov_unsigned_t cur_func_id) +{ + void *callee_func = __gcov_indirect_call_topn_callee; + gcov_type *counter = __gcov_indirect_call_topn_counters; + /* If the C++ virtual tables contain function descriptors then one + function may have multiple descriptors and we need to dereference + the descriptors to see if they point to the same function. */ + if (cur_func == callee_func + || (VTABLE_USES_DESCRIPTORS && callee_func + && *(void **) cur_func == *(void **) callee_func)) + { + if (++__gcov_indirect_call_sampling_counter >= __gcov_lipo_sampling_period) + { + __gcov_indirect_call_sampling_counter = 0; + gcov_type global_id + = ((struct gcov_info *) cur_module_gcov_info)->mod_info->ident; + global_id = GEN_FUNC_GLOBAL_ID (global_id, cur_func_id); + __gcov_topn_value_profiler_body (counter, global_id, GCOV_ICALL_TOPN_VAL); + } + __gcov_indirect_call_topn_callee = 0; + } +} + +#endif + +#ifdef L_gcov_direct_call_profiler +#if defined(HAVE_CC_TLS) && !defined (USE_EMUTLS) +__thread +#endif +gcov_type *__gcov_direct_call_counters ATTRIBUTE_HIDDEN; +#if defined(HAVE_CC_TLS) && !defined (USE_EMUTLS) +__thread +#endif +void *__gcov_direct_call_callee ATTRIBUTE_HIDDEN; +#if defined(HAVE_CC_TLS) && !defined (USE_EMUTLS) +__thread +#endif +gcov_unsigned_t __gcov_direct_call_sampling_counter ATTRIBUTE_HIDDEN; + +/* Direct call profiler. */ + +void +__gcov_direct_call_profiler (void *cur_func, + void *cur_module_gcov_info, + gcov_unsigned_t cur_func_id) +{ + if (cur_func == __gcov_direct_call_callee) + { + if (++__gcov_direct_call_sampling_counter >= __gcov_lipo_sampling_period) + { + __gcov_direct_call_sampling_counter = 0; + gcov_type global_id + = ((struct gcov_info *) cur_module_gcov_info)->mod_info->ident; + global_id = GEN_FUNC_GLOBAL_ID (global_id, cur_func_id); + __gcov_direct_call_counters[0] = global_id; + __gcov_direct_call_counters[1]++; + } + __gcov_direct_call_callee = 0; + } +} +#endif + + +#ifdef L_gcov_time_profiler + +/* Counter for first visit of each function. */ +static gcov_type function_counter; + +/* Sets corresponding COUNTERS if there is no value. */ + +void +__gcov_time_profiler (gcov_type* counters) +{ + if (!counters[0]) + counters[0] = ++function_counter; +} +#endif + +#ifdef L_gcov_average_profiler +/* Increase corresponding COUNTER by VALUE. FIXME: Perhaps we want + to saturate up. */ + +void +__gcov_average_profiler (gcov_type *counters, gcov_type value) +{ + counters[0] += value; + counters[1] ++; +} +#endif + +#ifdef L_gcov_ior_profiler +/* Bitwise-OR VALUE into COUNTER. 
*/ + +void +__gcov_ior_profiler (gcov_type *counters, gcov_type value) +{ + *counters |= value; +} +#endif + +#endif /* inhibit_libc */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov.h b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov.h new file mode 100644 index 0000000..c1ebe6e --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/gcov-src/libgcov.h @@ -0,0 +1,421 @@ +/* Header file for libgcov-*.c. + Copyright (C) 1996-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it under + the terms of the GNU General Public License as published by the Free + Software Foundation; either version 3, or (at your option) any later + version. + + GCC is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or + FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License + for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef GCC_LIBGCOV_H +#define GCC_LIBGCOV_H + +#ifndef __KERNEL__ +/* work around the poisoned malloc/calloc in system.h. */ +#ifndef xmalloc +#define xmalloc malloc +#endif +#ifndef xcalloc +#define xcalloc calloc +#endif +#ifndef xrealloc +#define xrealloc realloc +#endif +#ifndef xfree +#define xfree free +#endif +#else /* __KERNEL__ */ +#include "libgcov-kernel.h" +#endif /* __KERNEL__ */ + +#ifndef IN_GCOV_TOOL +/* About the target. */ +/* This path will be used by libgcov runtime. 
*/ + +#ifndef __KERNEL__ +#include "tconfig.h" +#include "tsystem.h" +#include "coretypes.h" +#include "tm.h" +#include "libgcc_tm.h" +#endif /* __KERNEL__ */ + +#undef FUNC_ID_WIDTH +#undef FUNC_ID_MASK + +#if BITS_PER_UNIT == 8 +typedef unsigned gcov_unsigned_t __attribute__ ((mode (SI))); +typedef unsigned gcov_position_t __attribute__ ((mode (SI))); +#if LONG_LONG_TYPE_SIZE > 32 +typedef signed gcov_type __attribute__ ((mode (DI))); +typedef unsigned gcov_type_unsigned __attribute__ ((mode (DI))); +#define FUNC_ID_WIDTH 32 +#define FUNC_ID_MASK ((1ll << FUNC_ID_WIDTH) - 1) +#else +typedef signed gcov_type __attribute__ ((mode (SI))); +typedef unsigned gcov_type_unsigned __attribute__ ((mode (SI))); +#define FUNC_ID_WIDTH 16 +#define FUNC_ID_MASK ((1 << FUNC_ID_WIDTH) - 1) +#endif +#else /* BITS_PER_UNIT != 8 */ +#if BITS_PER_UNIT == 16 +typedef unsigned gcov_unsigned_t __attribute__ ((mode (HI))); +typedef unsigned gcov_position_t __attribute__ ((mode (HI))); +#if LONG_LONG_TYPE_SIZE > 32 +typedef signed gcov_type __attribute__ ((mode (SI))); +typedef unsigned gcov_type_unsigned __attribute__ ((mode (SI))); +#define FUNC_ID_WIDTH 32 +#define FUNC_ID_MASK ((1ll << FUNC_ID_WIDTH) - 1) +#else +typedef signed gcov_type __attribute__ ((mode (HI))); +typedef unsigned gcov_type_unsigned __attribute__ ((mode (HI))); +#define FUNC_ID_WIDTH 16 +#define FUNC_ID_MASK ((1 << FUNC_ID_WIDTH) - 1) +#endif +#else /* BITS_PER_UNIT != 16 */ +typedef unsigned gcov_unsigned_t __attribute__ ((mode (QI))); +typedef unsigned gcov_position_t __attribute__ ((mode (QI))); +#if LONG_LONG_TYPE_SIZE > 32 +typedef signed gcov_type __attribute__ ((mode (HI))); +typedef unsigned gcov_type_unsigned __attribute__ ((mode (HI))); +#define FUNC_ID_WIDTH 32 +#define FUNC_ID_MASK ((1ll << FUNC_ID_WIDTH) - 1) +#else +typedef signed gcov_type __attribute__ ((mode (QI))); +typedef unsigned gcov_type_unsigned __attribute__ ((mode (QI))); +#define FUNC_ID_WIDTH 16 +#define FUNC_ID_MASK ((1 << FUNC_ID_WIDTH) - 1) +#endif +#endif /* BITS_PER_UNIT == 16 */ +#endif /* BITS_PER_UNIT == 8 */ + +#if LONG_LONG_TYPE_SIZE > 32 +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_8 +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_8 +#else +#define GCOV_TYPE_ATOMIC_FETCH_ADD_FN __atomic_fetch_add_4 +#define GCOV_TYPE_ATOMIC_FETCH_ADD BUILT_IN_ATOMIC_FETCH_ADD_4 +#endif + +#if defined (TARGET_POSIX_IO) +#define GCOV_LOCKED 1 +#else +#define GCOV_LOCKED 0 +#endif + +#else /* IN_GCOV_TOOL */ +/* About the host. */ +/* This path will be compiled for the host and linked into + gcov-tool binary. */ + +#include "config.h" +#include "system.h" +#include "coretypes.h" +#include "tm.h" + +typedef unsigned gcov_unsigned_t; +typedef unsigned gcov_position_t; +/* gcov_type is typedef'd elsewhere for the compiler */ +#if defined (HOST_HAS_F_SETLKW) +#define GCOV_LOCKED 1 +#else +#define GCOV_LOCKED 0 +#endif + +#define FUNC_ID_WIDTH 32 +#define FUNC_ID_MASK ((1ll << FUNC_ID_WIDTH) - 1) + +/* Some Macros specific to gcov-tool. */ + +#define L_gcov 1 +#define L_gcov_merge_add 1 +#define L_gcov_merge_single 1 +#define L_gcov_merge_delta 1 +#define L_gcov_merge_ior 1 +#define L_gcov_merge_time_profile 1 +#define L_gcov_merge_icall_topn 1 +#define L_gcov_merge_dc 1 + +/* Make certian internal functions/variables in libgcov available for + gcov-tool access. 
*/ +#define GCOV_TOOL_LINKAGE + +extern gcov_type gcov_read_counter_mem (); +extern unsigned gcov_get_merge_weight (); + +#endif /* !IN_GCOV_TOOL */ + +#undef EXTRACT_MODULE_ID_FROM_GLOBAL_ID +#undef EXTRACT_FUNC_ID_FROM_GLOBAL_ID +#undef GEN_FUNC_GLOBAL_ID +#define EXTRACT_MODULE_ID_FROM_GLOBAL_ID(gid) \ + (gcov_unsigned_t)(((gid) >> FUNC_ID_WIDTH) & FUNC_ID_MASK) +#define EXTRACT_FUNC_ID_FROM_GLOBAL_ID(gid) \ + (gcov_unsigned_t)((gid) & FUNC_ID_MASK) +#define GEN_FUNC_GLOBAL_ID(m,f) ((((gcov_type) (m)) << FUNC_ID_WIDTH) | (f)) + +#if defined(inhibit_libc) +#define IN_LIBGCOV (-1) +#else +#define IN_LIBGCOV 1 +#if defined(L_gcov) +#define GCOV_LINKAGE /* nothing */ +#endif +#endif + +/* In libgcov we need these functions to be extern, so prefix them with + __gcov. In libgcov they must also be hidden so that the instance in + the executable is not also used in a DSO. */ +#define gcov_var __gcov_var +#define gcov_open __gcov_open +#define gcov_close __gcov_close +#define gcov_write_tag_length __gcov_write_tag_length +#define gcov_position __gcov_position +#define gcov_seek __gcov_seek +#define gcov_rewrite __gcov_rewrite +#define gcov_truncate __gcov_truncate +#define gcov_is_error __gcov_is_error +#define gcov_write_unsigned __gcov_write_unsigned +#define gcov_write_counter __gcov_write_counter +#define gcov_write_summary __gcov_write_summary +#define gcov_write_module_info __gcov_write_module_info +#define gcov_read_unsigned __gcov_read_unsigned +#define gcov_read_counter __gcov_read_counter +#define gcov_read_summary __gcov_read_summary +#define gcov_read_buildinfo __gcov_read_buildinfo +#define gcov_read_module_info __gcov_read_module_info +#define gcov_sort_n_vals __gcov_sort_n_vals + +/* Poison these, so they don't accidentally slip in. */ +#pragma GCC poison gcov_write_string gcov_write_tag gcov_write_length +#pragma GCC poison gcov_time gcov_magic + +#ifdef HAVE_GAS_HIDDEN +#define ATTRIBUTE_HIDDEN __attribute__ ((__visibility__ ("hidden"))) +#else +#define ATTRIBUTE_HIDDEN +#endif + +#include "gcov-io.h" + +/* Structures embedded in coveraged program. The structures generated + by write_profile must match these. */ +/* Information about counters for a single function. */ +struct gcov_ctr_info +{ + gcov_unsigned_t num; /* number of counters. */ + gcov_type *values; /* their values. */ +}; + +/* Information about a single function. This uses the trailing array + idiom. The number of counters is determined from the merge pointer + array in gcov_info. The key is used to detect which of a set of + comdat functions was selected -- it points to the gcov_info object + of the object file containing the selected comdat function. */ + +struct gcov_fn_info +{ + const struct gcov_info *key; /* comdat key */ + gcov_unsigned_t ident; /* unique ident of function */ + gcov_unsigned_t lineno_checksum; /* function lineo_checksum */ + gcov_unsigned_t cfg_checksum; /* function cfg checksum */ + struct gcov_ctr_info ctrs[1]; /* instrumented counters */ +}; + +/* Type of function used to merge counters. */ +typedef void (*gcov_merge_fn) (gcov_type *, gcov_unsigned_t); + +/* Information about a single object file. */ +struct gcov_info +{ + gcov_unsigned_t version; /* expected version number */ + struct gcov_module_info *mod_info; /* addtional module info. 
*/ + struct gcov_info *next; /* link to next, used by libgcov */ + + gcov_unsigned_t stamp; /* uniquifying time stamp */ + const char *filename; /* output file name */ + gcov_unsigned_t eof_pos; /* end position of profile data */ + gcov_merge_fn merge[GCOV_COUNTERS]; /* merge functions (null for + unused) */ + + unsigned n_functions; /* number of functions */ + +#if !defined (IN_GCOV_TOOL) && !defined (__KERNEL__) + const struct gcov_fn_info *const *functions; /* pointer to pointers + to function information */ +#elif defined (IN_GCOV_TOOL) + const struct gcov_fn_info **functions; +#else + struct gcov_fn_info **functions; +#endif /* !IN_GCOV_TOOL */ + char **build_info; /* strings to include in BUILD_INFO + section of gcda file. */ +}; + +/* Information about a single imported module. */ +struct dyn_imp_mod +{ + const struct gcov_info *imp_mod; + double weight; +}; + +/* Register a new object file module. */ +extern void __gcov_init (struct gcov_info *) ATTRIBUTE_HIDDEN; + +/* Set sampling rate to RATE. */ +extern void __gcov_set_sampling_rate (unsigned int rate); + +/* Called before fork, to avoid double counting. */ +extern void __gcov_flush (void) ATTRIBUTE_HIDDEN; + +/* Function to reset all counters to 0. */ +extern void __gcov_reset (void); +/* Function to enable early write of profile information so far. + __gcov_dump is also used by __gcov_dump_all. The latter + depends on __GCOV_DUMP to have hidden or protected visibility + so that each library has its own copy of the registered dumper. */ +extern void __gcov_dump (void) ATTRIBUTE_HIDDEN; + +/* Call __gcov_dump registered from each shared library. + This function must have default visibility. */ +void __gcov_dump_all (void); + +/* The merge function that just sums the counters. */ +extern void __gcov_merge_add (gcov_type *, unsigned) ATTRIBUTE_HIDDEN; + +/* The merge function to choose the most common value. */ +extern void __gcov_merge_single (gcov_type *, unsigned) ATTRIBUTE_HIDDEN; + +/* The merge function to choose the most common difference between + consecutive values. */ +extern void __gcov_merge_delta (gcov_type *, unsigned) ATTRIBUTE_HIDDEN; + +/* The merge function that just ors the counters together. */ +extern void __gcov_merge_ior (gcov_type *, unsigned) ATTRIBUTE_HIDDEN; + +/* The merge function used for direct call counters. */ +extern void __gcov_merge_dc (gcov_type *, unsigned) ATTRIBUTE_HIDDEN; + +/* The merge function used for indirect call counters. */ +extern void __gcov_merge_icall_topn (gcov_type *, unsigned) ATTRIBUTE_HIDDEN; + +extern void __gcov_merge_time_profile (gcov_type *, unsigned) ATTRIBUTE_HIDDEN; + +/* The profiler functions. */ +extern void __gcov_interval_profiler (gcov_type *, gcov_type, int, unsigned); +extern void __gcov_pow2_profiler (gcov_type *, gcov_type); +extern void __gcov_one_value_profiler (gcov_type *, gcov_type); +extern void __gcov_indirect_call_profiler (gcov_type*, gcov_type, + void*, void*); +extern void __gcov_indirect_call_profiler_v2 (gcov_type, void *); +extern void __gcov_indirect_call_topn_profiler (void *, void *, gcov_unsigned_t) ATTRIBUTE_HIDDEN; +extern void __gcov_direct_call_profiler (void *, void *, gcov_unsigned_t) ATTRIBUTE_HIDDEN; +extern void __gcov_average_profiler (gcov_type *, gcov_type); +extern void __gcov_ior_profiler (gcov_type *, gcov_type); +extern void __gcov_sort_n_vals (gcov_type *value_array, int n); +extern void __gcov_time_profiler (gcov_type *); + +#ifndef inhibit_libc +/* The wrappers around some library functions.. 
*/ +extern pid_t __gcov_fork (void) ATTRIBUTE_HIDDEN; +extern int __gcov_execl (const char *, char *, ...) ATTRIBUTE_HIDDEN; +extern int __gcov_execlp (const char *, char *, ...) ATTRIBUTE_HIDDEN; +extern int __gcov_execle (const char *, char *, ...) ATTRIBUTE_HIDDEN; +extern int __gcov_execv (const char *, char *const []) ATTRIBUTE_HIDDEN; +extern int __gcov_execvp (const char *, char *const []) ATTRIBUTE_HIDDEN; +extern int __gcov_execve (const char *, char *const [], char *const []) + ATTRIBUTE_HIDDEN; + + +/* Functions that only available in libgcov. */ +GCOV_LINKAGE int gcov_open (const char */*name*/) ATTRIBUTE_HIDDEN; +GCOV_LINKAGE void gcov_write_counter (gcov_type) ATTRIBUTE_HIDDEN; +GCOV_LINKAGE void gcov_write_tag_length (gcov_unsigned_t, gcov_unsigned_t) + ATTRIBUTE_HIDDEN; +GCOV_LINKAGE void gcov_write_summary (gcov_unsigned_t /*tag*/, + const struct gcov_summary *) + ATTRIBUTE_HIDDEN; +GCOV_LINKAGE void gcov_seek (gcov_position_t /*position*/) ATTRIBUTE_HIDDEN; +GCOV_LINKAGE void gcov_truncate (void) ATTRIBUTE_HIDDEN; +void gcov_write_module_info (const struct gcov_info *, unsigned) + ATTRIBUTE_HIDDEN; +GCOV_LINKAGE void gcov_write_module_infos (struct gcov_info *mod_info) + ATTRIBUTE_HIDDEN; +GCOV_LINKAGE const struct dyn_imp_mod ** +gcov_get_sorted_import_module_array (struct gcov_info *mod_info, unsigned *len) + ATTRIBUTE_HIDDEN; +GCOV_LINKAGE inline void gcov_rewrite (void); + +extern void set_gcov_fn_fixed_up (int fixed_up); +extern int get_gcov_fn_fixed_up (void); + +/* "Counts" stored in gcda files can be a real counter value, or + an target address. When differentiate these two types because + when manipulating counts, we should only change real counter values, + rather target addresses. */ + +static inline gcov_type +gcov_get_counter (void) +{ +#ifndef IN_GCOV_TOOL + /* This version is for reading count values in libgcov runtime: + we read from gcda files. */ + + if (get_gcov_fn_fixed_up ()) + { + gcov_read_counter (); + return 0; + } + else + return gcov_read_counter (); +#else + /* This version is for gcov-tool. We read the value from memory and + multiply it by the merge weight. */ + + return gcov_read_counter_mem () * gcov_get_merge_weight (); +#endif +} + +/* Similar function as gcov_get_counter(), but handles target address + counters. */ + +static inline gcov_type +gcov_get_counter_target (void) +{ +#ifndef IN_GCOV_TOOL + /* This version is for reading count target values in libgcov runtime: + we read from gcda files. */ + + if (get_gcov_fn_fixed_up ()) + { + gcov_read_counter (); + return 0; + } + else + return gcov_read_counter (); +#else + /* This version is for gcov-tool. We read the value from memory and we do NOT + multiply it by the merge weight. */ + + return gcov_read_counter_mem (); +#endif +} + +#endif /* !inhibit_libc */ + +#endif /* GCC_LIBGCOV_H */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include-fixed/README b/lib/gcc/x86_64-linux-android/4.9.x/include-fixed/README new file mode 100644 index 0000000..7086a77 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include-fixed/README @@ -0,0 +1,14 @@ +This README file is copied into the directory for GCC-only header files +when fixincludes is run by the makefile for GCC. + +Many of the files in this directory were automatically edited from the +standard system header files by the fixincludes process. They are +system-specific, and will not work on any other kind of system. They +are also not part of GCC. 
The reason we have to do this is because +GCC requires ANSI C headers and many vendors supply ANSI-incompatible +headers. + +Because this is an automated process, sometimes headers get "fixed" +that do not, strictly speaking, need a fix. As long as nothing is broken +by the process, it is just an unfortunate collateral inconvenience. +We would like to rectify it, if it is not "too inconvenient". diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include-fixed/limits.h b/lib/gcc/x86_64-linux-android/4.9.x/include-fixed/limits.h new file mode 100644 index 0000000..8c6a4d3 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include-fixed/limits.h @@ -0,0 +1,171 @@ +/* Copyright (C) 1992-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +/* This administrivia gets added to the beginning of limits.h + if the system has its own version of limits.h. */ + +/* We use _GCC_LIMITS_H_ because we want this not to match + any macros that the system's limits.h uses for its own purposes. */ +#ifndef _GCC_LIMITS_H_ /* Terminated in limity.h. */ +#define _GCC_LIMITS_H_ + +#ifndef _LIBC_LIMITS_H_ +/* Use "..." so that we find syslimits.h only in this same directory. */ +#include "syslimits.h" +#endif +/* Copyright (C) 1991-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +#ifndef _LIMITS_H___ +#define _LIMITS_H___ + +/* Number of bits in a `char'. */ +#undef CHAR_BIT +#define CHAR_BIT __CHAR_BIT__ + +/* Maximum length of a multibyte character. */ +#ifndef MB_LEN_MAX +#define MB_LEN_MAX 1 +#endif + +/* Minimum and maximum values a `signed char' can hold. */ +#undef SCHAR_MIN +#define SCHAR_MIN (-SCHAR_MAX - 1) +#undef SCHAR_MAX +#define SCHAR_MAX __SCHAR_MAX__ + +/* Maximum value an `unsigned char' can hold. (Minimum is 0). 
*/ +#undef UCHAR_MAX +#if __SCHAR_MAX__ == __INT_MAX__ +# define UCHAR_MAX (SCHAR_MAX * 2U + 1U) +#else +# define UCHAR_MAX (SCHAR_MAX * 2 + 1) +#endif + +/* Minimum and maximum values a `char' can hold. */ +#ifdef __CHAR_UNSIGNED__ +# undef CHAR_MIN +# if __SCHAR_MAX__ == __INT_MAX__ +# define CHAR_MIN 0U +# else +# define CHAR_MIN 0 +# endif +# undef CHAR_MAX +# define CHAR_MAX UCHAR_MAX +#else +# undef CHAR_MIN +# define CHAR_MIN SCHAR_MIN +# undef CHAR_MAX +# define CHAR_MAX SCHAR_MAX +#endif + +/* Minimum and maximum values a `signed short int' can hold. */ +#undef SHRT_MIN +#define SHRT_MIN (-SHRT_MAX - 1) +#undef SHRT_MAX +#define SHRT_MAX __SHRT_MAX__ + +/* Maximum value an `unsigned short int' can hold. (Minimum is 0). */ +#undef USHRT_MAX +#if __SHRT_MAX__ == __INT_MAX__ +# define USHRT_MAX (SHRT_MAX * 2U + 1U) +#else +# define USHRT_MAX (SHRT_MAX * 2 + 1) +#endif + +/* Minimum and maximum values a `signed int' can hold. */ +#undef INT_MIN +#define INT_MIN (-INT_MAX - 1) +#undef INT_MAX +#define INT_MAX __INT_MAX__ + +/* Maximum value an `unsigned int' can hold. (Minimum is 0). */ +#undef UINT_MAX +#define UINT_MAX (INT_MAX * 2U + 1U) + +/* Minimum and maximum values a `signed long int' can hold. + (Same as `int'). */ +#undef LONG_MIN +#define LONG_MIN (-LONG_MAX - 1L) +#undef LONG_MAX +#define LONG_MAX __LONG_MAX__ + +/* Maximum value an `unsigned long int' can hold. (Minimum is 0). */ +#undef ULONG_MAX +#define ULONG_MAX (LONG_MAX * 2UL + 1UL) + +#if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L +/* Minimum and maximum values a `signed long long int' can hold. */ +# undef LLONG_MIN +# define LLONG_MIN (-LLONG_MAX - 1LL) +# undef LLONG_MAX +# define LLONG_MAX __LONG_LONG_MAX__ + +/* Maximum value an `unsigned long long int' can hold. (Minimum is 0). */ +# undef ULLONG_MAX +# define ULLONG_MAX (LLONG_MAX * 2ULL + 1ULL) +#endif + +#if defined (__GNU_LIBRARY__) ? defined (__USE_GNU) : !defined (__STRICT_ANSI__) +/* Minimum and maximum values a `signed long long int' can hold. */ +# undef LONG_LONG_MIN +# define LONG_LONG_MIN (-LONG_LONG_MAX - 1LL) +# undef LONG_LONG_MAX +# define LONG_LONG_MAX __LONG_LONG_MAX__ + +/* Maximum value an `unsigned long long int' can hold. (Minimum is 0). */ +# undef ULONG_LONG_MAX +# define ULONG_LONG_MAX (LONG_LONG_MAX * 2ULL + 1ULL) +#endif + +#endif /* _LIMITS_H___ */ +/* This administrivia gets added to the end of limits.h + if the system has its own version of limits.h. */ + +#else /* not _GCC_LIMITS_H_ */ + +#ifdef _GCC_NEXT_LIMITS_H +#include_next /* recurse down to the real one */ +#endif + +#endif /* not _GCC_LIMITS_H_ */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include-fixed/linux/a.out.h b/lib/gcc/x86_64-linux-android/4.9.x/include-fixed/linux/a.out.h new file mode 100644 index 0000000..4040bf4 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include-fixed/linux/a.out.h @@ -0,0 +1,235 @@ +/* DO NOT EDIT THIS FILE. + + It has been auto-edited by fixincludes from: + + "/tmp/b71c5d2bcd13ef366b305ad71071ded6/sysroot/usr/include/linux/a.out.h" + + This had to be done to correct non-standard usages in the + original, manufacturer supplied header file. */ + +/**************************************************************************** + **************************************************************************** + *** + *** This header was automatically generated from a Linux kernel header + *** of the same name, to make information necessary for userspace to + *** call into the kernel available to libc. 
It contains only constants, + *** structures, and macros generated from the original header, and thus, + *** contains no copyrightable information. + *** + *** To edit the content of this header, modify the corresponding + *** source file (e.g. under external/kernel-headers/original/) then + *** run bionic/libc/kernel/tools/update_all.py + *** + *** Any manual change here will be lost the next time this script will + *** be run. You've been warned! + *** + **************************************************************************** + ****************************************************************************/ +#ifndef _UAPI__A_OUT_GNU_H__ +#define _UAPI__A_OUT_GNU_H__ +#define __GNU_EXEC_MACROS__ +#ifndef __STRUCT_EXEC_OVERRIDE__ +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#include +#endif +#ifndef __ASSEMBLY__ +enum machine_type { +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#ifdef M_OLDSUN2 + M__OLDSUN2 = M_OLDSUN2, +#else + M_OLDSUN2 = 0, +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#endif +#ifdef M_68010 + M__68010 = M_68010, +#else +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ + M_68010 = 1, +#endif +#ifdef M_68020 + M__68020 = M_68020, +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#else + M_68020 = 2, +#endif +#ifdef M_SPARC +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ + M__SPARC = M_SPARC, +#else + M_SPARC = 3, +#endif +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ + M_386 = 100, + M_MIPS1 = 151, + M_MIPS2 = 152 +}; +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#ifndef N_MAGIC +#define N_MAGIC(exec) ((exec).a_info & 0xffff) +#endif +#define N_MACHTYPE(exec) ((enum machine_type)(((exec).a_info >> 16) & 0xff)) +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#define N_FLAGS(exec) (((exec).a_info >> 24) & 0xff) +#define N_SET_INFO(exec, magic, type, flags) ((exec).a_info = ((magic) & 0xffff) | (((int)(type) & 0xff) << 16) | (((flags) & 0xff) << 24)) +#define N_SET_MAGIC(exec, magic) ((exec).a_info = (((exec).a_info & 0xffff0000) | ((magic) & 0xffff))) +#define N_SET_MACHTYPE(exec, machtype) ((exec).a_info = ((exec).a_info&0xff00ffff) | ((((int)(machtype))&0xff) << 16)) +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#define N_SET_FLAGS(exec, flags) ((exec).a_info = ((exec).a_info&0x00ffffff) | (((flags) & 0xff) << 24)) +#define OMAGIC 0407 +#define NMAGIC 0410 +#define ZMAGIC 0413 +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#define QMAGIC 0314 +#define CMAGIC 0421 +#ifndef N_BADMAG +#define N_BADMAG(x) (N_MAGIC(x) != OMAGIC && N_MAGIC(x) != NMAGIC && N_MAGIC(x) != ZMAGIC && N_MAGIC(x) != QMAGIC) +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#endif +#define _N_HDROFF(x) (1024 - sizeof (struct exec)) +#ifndef N_TXTOFF +#define N_TXTOFF(x) (N_MAGIC(x) == ZMAGIC ? _N_HDROFF((x)) + sizeof (struct exec) : (N_MAGIC(x) == QMAGIC ? 
0 : sizeof (struct exec))) +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#endif +#ifndef N_DATOFF +#define N_DATOFF(x) (N_TXTOFF(x) + (x).a_text) +#endif +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#ifndef N_TRELOFF +#define N_TRELOFF(x) (N_DATOFF(x) + (x).a_data) +#endif +#ifndef N_DRELOFF +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#define N_DRELOFF(x) (N_TRELOFF(x) + N_TRSIZE(x)) +#endif +#ifndef N_SYMOFF +#define N_SYMOFF(x) (N_DRELOFF(x) + N_DRSIZE(x)) +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#endif +#ifndef N_STROFF +#define N_STROFF(x) (N_SYMOFF(x) + N_SYMSIZE(x)) +#endif +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#ifndef N_TXTADDR +#define N_TXTADDR(x) (N_MAGIC(x) == QMAGIC ? PAGE_SIZE : 0) +#endif +#if defined(vax) || defined(hp300) || defined(pyr) +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#define SEGMENT_SIZE page_size +#endif +#ifdef sony +#define SEGMENT_SIZE 0x2000 +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#endif +#ifdef is68k +#define SEGMENT_SIZE 0x20000 +#endif +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#if defined(m68k) && defined(PORTAR) +#define PAGE_SIZE 0x400 +#define SEGMENT_SIZE PAGE_SIZE +#endif +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#ifdef __linux__ +#include +#if defined(__i386__) || defined(__mc68000__) +#define SEGMENT_SIZE 1024 +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#else +#ifndef SEGMENT_SIZE +#define SEGMENT_SIZE getpagesize() +#endif +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#endif +#endif +#define _N_SEGMENT_ROUND(x) ALIGN(x, SEGMENT_SIZE) +#define _N_TXTENDADDR(x) (N_TXTADDR(x)+(x).a_text) +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#ifndef N_DATADDR +#define N_DATADDR(x) (N_MAGIC(x)==OMAGIC? 
(_N_TXTENDADDR(x)) : (_N_SEGMENT_ROUND (_N_TXTENDADDR(x)))) +#endif +#ifndef N_BSSADDR +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#define N_BSSADDR(x) (N_DATADDR(x) + (x).a_data) +#endif +#ifndef N_NLIST_DECLARED +struct nlist { +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ + union { + char *n_name; + struct nlist *n_next; + long n_strx; +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ + } n_un; + unsigned char n_type; + char n_other; + short n_desc; +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ + unsigned long n_value; +}; +#endif +#ifndef N_UNDF +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#define N_UNDF 0 +#endif +#ifndef N_ABS +#define N_ABS 2 +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#endif +#ifndef N_TEXT +#define N_TEXT 4 +#endif +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#ifndef N_DATA +#define N_DATA 6 +#endif +#ifndef N_BSS +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#define N_BSS 8 +#endif +#ifndef N_FN +#define N_FN 15 +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#endif +#ifndef N_EXT +#define N_EXT 1 +#endif +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#ifndef N_TYPE +#define N_TYPE 036 +#endif +#ifndef N_STAB +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#define N_STAB 0340 +#endif +#define N_INDR 0xa +#define N_SETA 0x14 +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#define N_SETT 0x16 +#define N_SETD 0x18 +#define N_SETB 0x1A +#define N_SETV 0x1C +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#ifndef N_RELOCATION_INFO_DECLARED +struct relocation_info +{ + int r_address; +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ + unsigned int r_symbolnum:24; + unsigned int r_pcrel:1; + unsigned int r_length:2; + unsigned int r_extern:1; +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#ifdef NS32K + unsigned r_bsr:1; + unsigned r_disp:1; + unsigned r_pad:2; +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#else + unsigned int r_pad:4; +#endif +}; +/* WARNING: DO NOT EDIT, AUTO-GENERATED CODE - SEE TOP FOR INSTRUCTIONS */ +#endif +#endif +#endif diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include-fixed/syslimits.h b/lib/gcc/x86_64-linux-android/4.9.x/include-fixed/syslimits.h new file mode 100644 index 0000000..a362802 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include-fixed/syslimits.h @@ -0,0 +1,8 @@ +/* syslimits.h stands for the system's own limits.h file. + If we can use it ok unmodified, then we install this text. + If fixincludes fixes it, then the fixed version is installed + instead of this text. */ + +#define _GCC_NEXT_LIMITS_H /* tell gcc's limits.h to recurse */ +#include_next +#undef _GCC_NEXT_LIMITS_H diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/adxintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/adxintrin.h new file mode 100644 index 0000000..6118900 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/adxintrin.h @@ -0,0 +1,49 @@ +/* Copyright (C) 2012-2014 Free Software Foundation, Inc. + + This file is part of GCC. 
+ + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#if !defined _X86INTRIN_H_INCLUDED && !defined _IMMINTRIN_H_INCLUDED +# error "Never use directly; include instead." +#endif + +#ifndef _ADXINTRIN_H_INCLUDED +#define _ADXINTRIN_H_INCLUDED + +extern __inline unsigned char +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_addcarryx_u32 (unsigned char __CF, unsigned int __X, + unsigned int __Y, unsigned int *__P) +{ + return __builtin_ia32_addcarryx_u32 (__CF, __X, __Y, __P); +} + +#ifdef __x86_64__ +extern __inline unsigned char +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_addcarryx_u64 (unsigned char __CF, unsigned long __X, + unsigned long __Y, unsigned long long *__P) +{ + return __builtin_ia32_addcarryx_u64 (__CF, __X, __Y, __P); +} +#endif + +#endif /* _ADXINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/ammintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/ammintrin.h new file mode 100644 index 0000000..a89b204 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/ammintrin.h @@ -0,0 +1,93 @@ +/* Copyright (C) 2007-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . 
*/ + +/* Implemented from the specification included in the AMD Programmers + Manual Update, version 2.x */ + +#ifndef _AMMINTRIN_H_INCLUDED +#define _AMMINTRIN_H_INCLUDED + +/* We need definitions from the SSE3, SSE2 and SSE header files*/ +#include + +#ifndef __SSE4A__ +#pragma GCC push_options +#pragma GCC target("sse4a") +#define __DISABLE_SSE4A__ +#endif /* __SSE4A__ */ + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_stream_sd (double * __P, __m128d __Y) +{ + __builtin_ia32_movntsd (__P, (__v2df) __Y); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_stream_ss (float * __P, __m128 __Y) +{ + __builtin_ia32_movntss (__P, (__v4sf) __Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_extract_si64 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_extrq ((__v2di) __X, (__v16qi) __Y); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_extracti_si64 (__m128i __X, unsigned const int __I, unsigned const int __L) +{ + return (__m128i) __builtin_ia32_extrqi ((__v2di) __X, __I, __L); +} +#else +#define _mm_extracti_si64(X, I, L) \ + ((__m128i) __builtin_ia32_extrqi ((__v2di)(__m128i)(X), \ + (unsigned int)(I), (unsigned int)(L))) +#endif + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_insert_si64 (__m128i __X,__m128i __Y) +{ + return (__m128i) __builtin_ia32_insertq ((__v2di)__X, (__v2di)__Y); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_inserti_si64(__m128i __X, __m128i __Y, unsigned const int __I, unsigned const int __L) +{ + return (__m128i) __builtin_ia32_insertqi ((__v2di)__X, (__v2di)__Y, __I, __L); +} +#else +#define _mm_inserti_si64(X, Y, I, L) \ + ((__m128i) __builtin_ia32_insertqi ((__v2di)(__m128i)(X), \ + (__v2di)(__m128i)(Y), \ + (unsigned int)(I), (unsigned int)(L))) +#endif + +#ifdef __DISABLE_SSE4A__ +#undef __DISABLE_SSE4A__ +#pragma GCC pop_options +#endif /* __DISABLE_SSE4A__ */ + +#endif /* _AMMINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/arm_neon.h b/lib/gcc/x86_64-linux-android/4.9.x/include/arm_neon.h new file mode 100644 index 0000000..8692151 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/arm_neon.h @@ -0,0 +1,16620 @@ +//created by Victoria Zhislina, the Senior Application Engineer, Intel Corporation, victoria.zhislina@intel.com + +//*** Copyright (C) 2012-2014 Intel Corporation. All rights reserved. + +//IMPORTANT: READ BEFORE DOWNLOADING, COPYING, INSTALLING OR USING. + +//By downloading, copying, installing or using the software you agree to this license. +//If you do not agree to this license, do not download, install, copy or use the software. + +// License Agreement + +//Permission to use, copy, modify, and/or distribute this software for any +//purpose with or without fee is hereby granted, provided that the above +//copyright notice and this permission notice appear in all copies. + +//THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH +//REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY +//AND FITNESS. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, +//INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM +//LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR +//OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR +//PERFORMANCE OF THIS SOFTWARE. + +//***************************************************************************************** +// This file is intended to simplify ARM->IA32 porting +// It makes the correspondence between ARM NEON intrinsics (as defined in "arm_neon.h") +// and x86 SSE(up to SSE4.2) intrinsic functions as defined in headers files below +// MMX instruction set is not used due to performance overhead and the necessity to use the +// EMMS instruction (_mm_empty())for mmx-x87 floating point switching +//***************************************************************************************** + +//!!!!!!! To use this file in your project that uses ARM NEON intinsics just keep arm_neon.h included and complile it as usual. +//!!!!!!! Please pay attention at USE_SSE4 below - you need to define it for newest Intel platforms for +//!!!!!!! greater performance. It can be done by -msse4.2 compiler switch. + +#ifndef NEON2SSE_H +#define NEON2SSE_H + +#ifndef USE_SSE4 +#if defined(__SSE4_2__) + #define USE_SSE4 +#endif +#endif + +#include //SSE +#include //SSE2 +#include //SSE3 +#include //SSSE3 +#ifdef USE_SSE4 +#include //SSE4.1 +#include //SSE4.2 +#endif + + +//*************** functions and data attributes, compiler dependent ********************************* +//*********************************************************************************** +#ifdef __GNUC__ +#define _GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__) +#define _NEON2SSE_ALIGN_16 __attribute__((aligned(16))) +#define _NEON2SSE_INLINE extern inline __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +#if _GCC_VERSION < 40500 + #define _NEON2SSE_PERFORMANCE_WARNING(function, explanation) __attribute__((deprecated)) function +#else + #define _NEON2SSE_PERFORMANCE_WARNING(function, explanation) __attribute__((deprecated(explanation))) function +#endif +#if defined(__x86_64__) + #define _NEON2SSE_64BIT __x86_64__ +#endif +#else +#define _NEON2SSE_ALIGN_16 __declspec(align(16)) +#define _NEON2SSE_INLINE __inline +#if defined(_MSC_VER)|| defined (__INTEL_COMPILER) + #define _NEON2SSE_PERFORMANCE_WARNING(function, EXPLANATION) __declspec(deprecated(EXPLANATION)) function +#if defined(_M_X64) + #define _NEON2SSE_64BIT _M_X64 +#endif +#else + #define _NEON2SSE_PERFORMANCE_WARNING(function, explanation) function +#endif +#endif + +#if defined (_NEON2SSE_64BIT) && defined (USE_SSE4) + #define _NEON2SSE_64BIT_SSE4 +#endif + +/*********************************************************************************************************************/ +// data types conversion +/*********************************************************************************************************************/ +#if defined(_MSC_VER) && (_MSC_VER < 1300) + typedef signed char int8_t; + typedef unsigned char uint8_t; + typedef signed short int16_t; + typedef unsigned short uint16_t; + typedef signed int int32_t; + typedef unsigned int uint32_t; + typedef signed long long int64_t; + typedef unsigned long long uint64_t; +#elif defined(_MSC_VER) + typedef signed __int8 int8_t; + typedef unsigned __int8 uint8_t; + typedef signed __int16 int16_t; + typedef unsigned __int16 uint16_t; + typedef signed __int32 int32_t; + 
typedef unsigned __int32 uint32_t; + + typedef signed long long int64_t; + typedef unsigned long long uint64_t; +#else +#include +#include +#endif + +typedef union __m64_128 { + uint64_t m64_u64[1]; + float m64_f32[2]; + int8_t m64_i8[8]; + int16_t m64_i16[4]; + int32_t m64_i32[2]; + int64_t m64_i64[1]; + uint8_t m64_u8[8]; + uint16_t m64_u16[4]; + uint32_t m64_u32[2]; +} __m64_128; + +typedef __m64_128 int8x8_t; +typedef __m64_128 uint8x8_t; +typedef __m64_128 int16x4_t; +typedef __m64_128 uint16x4_t; +typedef __m64_128 int32x2_t; +typedef __m64_128 uint32x2_t; +typedef __m64_128 int64x1_t; +typedef __m64_128 uint64x1_t; +typedef __m64_128 poly8x8_t; +typedef __m64_128 poly16x4_t; + +typedef __m64_128 float32x2_t; +typedef __m128 float32x4_t; + +typedef __m128 float16x4_t; //not supported by IA, for compatibility +typedef __m128 float16x8_t; //not supported by IA, for compatibility + +typedef __m128i int8x16_t; +typedef __m128i int16x8_t; +typedef __m128i int32x4_t; +typedef __m128i int64x2_t; +typedef __m128i uint8x16_t; +typedef __m128i uint16x8_t; +typedef __m128i uint32x4_t; +typedef __m128i uint64x2_t; +typedef __m128i poly8x16_t; +typedef __m128i poly16x8_t; + +#if defined(_MSC_VER) + #define SINT_MIN (-2147483647 - 1) /* min signed int value */ + #define SINT_MAX 2147483647 /* max signed int value */ +#else + #define SINT_MIN INT_MIN /* min signed int value */ + #define SINT_MAX INT_MAX /* max signed int value */ +#endif + +typedef float float32_t; +typedef float __fp16; + +typedef uint8_t poly8_t; +typedef uint16_t poly16_t; + + +//MSVC compilers (tested up to 2012 VS version) doesn't allow using structures or arrays of __m128x type as functions arguments resulting in +//error C2719: 'src': formal parameter with __declspec(align('16')) won't be aligned. To avoid it we need the special trick for functions that use these types +struct int8x16x2_t { + int8x16_t val[2]; +}; +struct int16x8x2_t { + int16x8_t val[2]; +}; +struct int32x4x2_t { + int32x4_t val[2]; +}; +struct int64x2x2_t { + int64x2_t val[2]; +}; +//Unfortunately we are unable to merge two 64-bits in on 128 bit register because user should be able to access val[n] members explicitly!!! 
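+/* Editor's sketch, not part of the original header: the 64-bit "D register"
+   types above are all the __m64_128 union, so individual lanes are reached
+   through the matching union member; that is also why the multi-vector
+   structures here must expose separate val[n] elements rather than packing
+   two 64-bit halves into one __m128i.  The helper name set_lanes_s32 is
+   hypothetical and relies only on the types defined above. */
+static inline int32x2_t set_lanes_s32(int32_t lane0, int32_t lane1)
+{
+    int32x2_t v;           /* emulated 64-bit vector: a __m64_128 union */
+    v.m64_i32[0] = lane0;  /* lane 0 */
+    v.m64_i32[1] = lane1;  /* lane 1 */
+    return v;
+}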
+struct int8x8x2_t { + int8x8_t val[2]; +}; +struct int16x4x2_t { + int16x4_t val[2]; +}; +struct int32x2x2_t { + int32x2_t val[2]; +}; +struct int64x1x2_t { + int64x1_t val[2]; +}; + +typedef struct int8x16x2_t int8x16x2_t; //for C compilers to make them happy +typedef struct int16x8x2_t int16x8x2_t; //for C compilers to make them happy +typedef struct int32x4x2_t int32x4x2_t; //for C compilers to make them happy +typedef struct int64x2x2_t int64x2x2_t; //for C compilers to make them happy + +typedef struct int8x8x2_t int8x8x2_t; //for C compilers to make them happy +typedef struct int16x4x2_t int16x4x2_t; //for C compilers to make them happy +typedef struct int32x2x2_t int32x2x2_t; //for C compilers to make them happy +typedef struct int64x1x2_t int64x1x2_t; //for C compilers to make them happy + +/* to avoid pointer conversions the following unsigned integers structures are defined via the corresponding signed integers structures above */ +typedef struct int8x16x2_t uint8x16x2_t; +typedef struct int16x8x2_t uint16x8x2_t; +typedef struct int32x4x2_t uint32x4x2_t; +typedef struct int64x2x2_t uint64x2x2_t; +typedef struct int8x16x2_t poly8x16x2_t; +typedef struct int16x8x2_t poly16x8x2_t; + +typedef struct int8x8x2_t uint8x8x2_t; +typedef struct int16x4x2_t uint16x4x2_t; +typedef struct int32x2x2_t uint32x2x2_t; +typedef struct int64x1x2_t uint64x1x2_t; +typedef struct int8x8x2_t poly8x8x2_t; +typedef struct int16x4x2_t poly16x4x2_t; + +//float +struct float32x4x2_t { + float32x4_t val[2]; +}; +struct float16x8x2_t { + float16x8_t val[2]; +}; +struct float32x2x2_t { + float32x2_t val[2]; +}; + +typedef struct float32x4x2_t float32x4x2_t; //for C compilers to make them happy +typedef struct float16x8x2_t float16x8x2_t; //for C compilers to make them happy +typedef struct float32x2x2_t float32x2x2_t; //for C compilers to make them happy +typedef float16x8x2_t float16x4x2_t; + +//4 +struct int8x16x4_t { + int8x16_t val[4]; +}; +struct int16x8x4_t { + int16x8_t val[4]; +}; +struct int32x4x4_t { + int32x4_t val[4]; +}; +struct int64x2x4_t { + int64x2_t val[4]; +}; + +struct int8x8x4_t { + int8x8_t val[4]; +}; +struct int16x4x4_t { + int16x4_t val[4]; +}; +struct int32x2x4_t { + int32x2_t val[4]; +}; +struct int64x1x4_t { + int64x1_t val[4]; +}; + +typedef struct int8x16x4_t int8x16x4_t; //for C compilers to make them happy +typedef struct int16x8x4_t int16x8x4_t; //for C compilers to make them happy +typedef struct int32x4x4_t int32x4x4_t; //for C compilers to make them happy +typedef struct int64x2x4_t int64x2x4_t; //for C compilers to make them happy + +typedef struct int8x8x4_t int8x8x4_t; //for C compilers to make them happy +typedef struct int16x4x4_t int16x4x4_t; //for C compilers to make them happy +typedef struct int32x2x4_t int32x2x4_t; //for C compilers to make them happy +typedef struct int64x1x4_t int64x1x4_t; //for C compilers to make them happy + +/* to avoid pointer conversions the following unsigned integers structures are defined via the corresponding signed integers dealing structures above:*/ +typedef struct int8x8x4_t uint8x8x4_t; +typedef struct int16x4x4_t uint16x4x4_t; +typedef struct int32x2x4_t uint32x2x4_t; +typedef struct int64x1x4_t uint64x1x4_t; +typedef struct int8x8x4_t poly8x8x4_t; +typedef struct int16x4x4_t poly16x4x4_t; + +typedef struct int8x16x4_t uint8x16x4_t; +typedef struct int16x8x4_t uint16x8x4_t; +typedef struct int32x4x4_t uint32x4x4_t; +typedef struct int64x2x4_t uint64x2x4_t; +typedef struct int8x16x4_t poly8x16x4_t; +typedef struct int16x8x4_t 
poly16x8x4_t; + +struct float32x4x4_t { + float32x4_t val[4]; +}; +struct float16x8x4_t { + float16x8_t val[4]; +}; +struct float32x2x4_t { + float32x2_t val[4]; +}; + +typedef struct float32x4x4_t float32x4x4_t; //for C compilers to make them happy +typedef struct float16x8x4_t float16x8x4_t; //for C compilers to make them happy +typedef struct float32x2x4_t float32x2x4_t; //for C compilers to make them happy +typedef float16x8x4_t float16x4x4_t; + +//3 +struct int16x8x3_t { + int16x8_t val[3]; +}; +struct int32x4x3_t { + int32x4_t val[3]; +}; +struct int64x2x3_t { + int64x2_t val[3]; +}; +struct int8x16x3_t { + int8x16_t val[3]; +}; + +struct int16x4x3_t { + int16x4_t val[3]; +}; +struct int32x2x3_t { + int32x2_t val[3]; +}; +struct int64x1x3_t { + int64x1_t val[3]; +}; +struct int8x8x3_t { + int8x8_t val[3]; +}; +typedef struct int16x8x3_t int16x8x3_t; //for C compilers to make them happy +typedef struct int32x4x3_t int32x4x3_t; //for C compilers to make them happy +typedef struct int64x2x3_t int64x2x3_t; //for C compilers to make them happy +typedef struct int8x16x3_t int8x16x3_t; //for C compilers to make them happy + +typedef struct int8x8x3_t int8x8x3_t; //for C compilers to make them happy +typedef struct int16x4x3_t int16x4x3_t; //for C compilers to make them happy +typedef struct int32x2x3_t int32x2x3_t; //for C compilers to make them happy +typedef struct int64x1x3_t int64x1x3_t; //for C compilers to make them happy + + +/* to avoid pointer conversions the following unsigned integers structures are defined via the corresponding signed integers dealing structures above:*/ +typedef struct int8x16x3_t uint8x16x3_t; +typedef struct int16x8x3_t uint16x8x3_t; +typedef struct int32x4x3_t uint32x4x3_t; +typedef struct int64x2x3_t uint64x2x3_t; +typedef struct int8x16x3_t poly8x16x3_t; +typedef struct int16x8x3_t poly16x8x3_t; +typedef struct int8x8x3_t uint8x8x3_t; +typedef struct int16x4x3_t uint16x4x3_t; +typedef struct int32x2x3_t uint32x2x3_t; +typedef struct int64x1x3_t uint64x1x3_t; +typedef struct int8x8x3_t poly8x8x3_t; +typedef struct int16x4x3_t poly16x4x3_t; + +//float +struct float32x4x3_t { + float32x4_t val[3]; +}; +struct float32x2x3_t { + float32x2_t val[3]; +}; +struct float16x8x3_t { + float16x8_t val[3]; +}; + +typedef struct float32x4x3_t float32x4x3_t; //for C compilers to make them happy +typedef struct float16x8x3_t float16x8x3_t; //for C compilers to make them happy +typedef struct float32x2x3_t float32x2x3_t; //for C compilers to make them happy +typedef float16x8x3_t float16x4x3_t; + + +//**************************************************************************** +//****** Porting auxiliary macros ******************************************** + +//** floating point related macros ** +#define _M128i(a) _mm_castps_si128(a) +#define _M128(a) _mm_castsi128_ps(a) +//here the most performance effective implementation is compiler and 32/64 bits build dependent +#if defined (_NEON2SSE_64BIT) || (defined (__INTEL_COMPILER) && (__INTEL_COMPILER >= 1500) ) + + #define _pM128i(a) _mm_cvtsi64_si128(*(int64_t*)(&(a))) + #define _M64(out, inp) out.m64_i64[0] = _mm_cvtsi128_si64 (inp); + #define _M64f(out, inp) out.m64_i64[0] = _mm_cvtsi128_si64 (_M128i(inp)); +#else + //for 32bit gcc and Microsoft compilers builds + #define _pM128i(a) _mm_loadl_epi64((__m128i*)&(a)) + #define _M64(out, inp) _mm_storel_epi64 ((__m128i*)&(out), inp) + #define _M64f(out, inp) _mm_storel_epi64 ((__m128i*)&(out), _M128i(inp)) +#endif +#define _pM128(a) _mm_castsi128_ps(_pM128i(a)) + +#define 
return64(a) _M64(res64,a); return res64; +#define return64f(a) _M64f(res64,a); return res64; + +#define _Ui64(a) (*(uint64_t*)&(a)) +#define _UNSIGNED_T(a) u ## a + +#define _SIGNBIT64 ((uint64_t)1 << 63) +#define _SWAP_HI_LOW32 (2 | (3 << 2) | (0 << 4) | (1 << 6)) +#define _INSERTPS_NDX(srcField, dstField) (((srcField) << 6) | ((dstField) << 4) ) + +#define _NEON2SSE_REASON_SLOW_SERIAL "The function may be very slow due to the serial implementation, please try to avoid it" +#define _NEON2SSE_REASON_SLOW_UNEFFECTIVE "The function may be slow due to inefficient x86 SIMD implementation, please try to avoid it" + +//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +#define __constrange(min,max) const +#define __transfersize(size) +//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + +//************************************************************************* +//************************************************************************* +//********* Functions declarations as declared in original arm_neon.h ***** +//************************************************************************* +//Vector add: vadd -> Vr[i]:=Va[i]+Vb[i], Vr, Va, Vb have equal lane sizes. +int8x8_t vadd_s8(int8x8_t a, int8x8_t b); // VADD.I8 d0,d0,d0 +int16x4_t vadd_s16(int16x4_t a, int16x4_t b); // VADD.I16 d0,d0,d0 +int32x2_t vadd_s32(int32x2_t a, int32x2_t b); // VADD.I32 d0,d0,d0 +int64x1_t vadd_s64(int64x1_t a, int64x1_t b); // VADD.I64 d0,d0,d0 +float32x2_t vadd_f32(float32x2_t a, float32x2_t b); // VADD.F32 d0,d0,d0 +uint8x8_t vadd_u8(uint8x8_t a, uint8x8_t b); // VADD.I8 d0,d0,d0 +uint16x4_t vadd_u16(uint16x4_t a, uint16x4_t b); // VADD.I16 d0,d0,d0 +uint32x2_t vadd_u32(uint32x2_t a, uint32x2_t b); // VADD.I32 d0,d0,d0 +uint64x1_t vadd_u64(uint64x1_t a, uint64x1_t b); // VADD.I64 d0,d0,d0 +int8x16_t vaddq_s8(int8x16_t a, int8x16_t b); // VADD.I8 q0,q0,q0 +int16x8_t vaddq_s16(int16x8_t a, int16x8_t b); // VADD.I16 q0,q0,q0 +int32x4_t vaddq_s32(int32x4_t a, int32x4_t b); // VADD.I32 q0,q0,q0 +int64x2_t vaddq_s64(int64x2_t a, int64x2_t b); // VADD.I64 q0,q0,q0 +float32x4_t vaddq_f32(float32x4_t a, float32x4_t b); // VADD.F32 q0,q0,q0 +uint8x16_t vaddq_u8(uint8x16_t a, uint8x16_t b); // VADD.I8 q0,q0,q0 +uint16x8_t vaddq_u16(uint16x8_t a, uint16x8_t b); // VADD.I16 q0,q0,q0 +uint32x4_t vaddq_u32(uint32x4_t a, uint32x4_t b); // VADD.I32 q0,q0,q0 +uint64x2_t vaddq_u64(uint64x2_t a, uint64x2_t b); // VADD.I64 q0,q0,q0 +//Vector long add: vaddl -> Vr[i]:=Va[i]+Vb[i], Va, Vb have equal lane sizes, result is a 128 bit vector of lanes that are twice the width. 
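+/* Editor's sketch, not part of the original header: how the vadd declarations
+   above are called from portable code; the same source compiles against the
+   real arm_neon.h on ARM or against this SSE-backed version on x86.  The
+   widening vaddl/vaddw variants declared below follow the same calling
+   pattern but return lanes twice as wide, so the sums cannot overflow.  The
+   helper name accumulate_q is hypothetical and uses only vaddq_s32, which is
+   declared above. */
+static inline int32x4_t accumulate_q(int32x4_t acc, int32x4_t x, int32x4_t y)
+{
+    acc = vaddq_s32(acc, x);   /* Vr[i] := Va[i] + Vb[i], per 32-bit lane */
+    return vaddq_s32(acc, y);
+}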
+int16x8_t vaddl_s8(int8x8_t a, int8x8_t b); // VADDL.S8 q0,d0,d0 +int32x4_t vaddl_s16(int16x4_t a, int16x4_t b); // VADDL.S16 q0,d0,d0 +int64x2_t vaddl_s32(int32x2_t a, int32x2_t b); // VADDL.S32 q0,d0,d0 +uint16x8_t vaddl_u8(uint8x8_t a, uint8x8_t b); // VADDL.U8 q0,d0,d0 +uint32x4_t vaddl_u16(uint16x4_t a, uint16x4_t b); // VADDL.U16 q0,d0,d0 +uint64x2_t vaddl_u32(uint32x2_t a, uint32x2_t b); // VADDL.U32 q0,d0,d0 +//Vector wide addw: vadd -> Vr[i]:=Va[i]+Vb[i] +int16x8_t vaddw_s8(int16x8_t a, int8x8_t b); // VADDW.S8 q0,q0,d0 +int32x4_t vaddw_s16(int32x4_t a, int16x4_t b); // VADDW.S16 q0,q0,d0 +int64x2_t vaddw_s32(int64x2_t a, int32x2_t b); // VADDW.S32 q0,q0,d0 +uint16x8_t vaddw_u8(uint16x8_t a, uint8x8_t b); // VADDW.U8 q0,q0,d0 +uint32x4_t vaddw_u16(uint32x4_t a, uint16x4_t b); // VADDW.U16 q0,q0,d0 +uint64x2_t vaddw_u32(uint64x2_t a, uint32x2_t b); // VADDW.U32 q0,q0,d0 +//Vector halving add: vhadd -> Vr[i]:=(Va[i]+Vb[i])>>1 +int8x8_t vhadd_s8(int8x8_t a, int8x8_t b); // VHADD.S8 d0,d0,d0 +int16x4_t vhadd_s16(int16x4_t a, int16x4_t b); // VHADD.S16 d0,d0,d0 +int32x2_t vhadd_s32(int32x2_t a, int32x2_t b); // VHADD.S32 d0,d0,d0 +uint8x8_t vhadd_u8(uint8x8_t a, uint8x8_t b); // VHADD.U8 d0,d0,d0 +uint16x4_t vhadd_u16(uint16x4_t a, uint16x4_t b); // VHADD.U16 d0,d0,d0 +uint32x2_t vhadd_u32(uint32x2_t a, uint32x2_t b); // VHADD.U32 d0,d0,d0 +int8x16_t vhaddq_s8(int8x16_t a, int8x16_t b); // VHADD.S8 q0,q0,q0 +int16x8_t vhaddq_s16(int16x8_t a, int16x8_t b); // VHADD.S16 q0,q0,q0 +int32x4_t vhaddq_s32(int32x4_t a, int32x4_t b); // VHADD.S32 q0,q0,q0 +uint8x16_t vhaddq_u8(uint8x16_t a, uint8x16_t b); // VHADD.U8 q0,q0,q0 +uint16x8_t vhaddq_u16(uint16x8_t a, uint16x8_t b); // VHADD.U16 q0,q0,q0 +uint32x4_t vhaddq_u32(uint32x4_t a, uint32x4_t b); // VHADD.U32 q0,q0,q0 +//Vector rounding halving add: vrhadd -> Vr[i]:=(Va[i]+Vb[i]+1)>>1 +int8x8_t vrhadd_s8(int8x8_t a, int8x8_t b); // VRHADD.S8 d0,d0,d0 +int16x4_t vrhadd_s16(int16x4_t a, int16x4_t b); // VRHADD.S16 d0,d0,d0 +int32x2_t vrhadd_s32(int32x2_t a, int32x2_t b); // VRHADD.S32 d0,d0,d0 +uint8x8_t vrhadd_u8(uint8x8_t a, uint8x8_t b); // VRHADD.U8 d0,d0,d0 +uint16x4_t vrhadd_u16(uint16x4_t a, uint16x4_t b); // VRHADD.U16 d0,d0,d0 +uint32x2_t vrhadd_u32(uint32x2_t a, uint32x2_t b); // VRHADD.U32 d0,d0,d0 +int8x16_t vrhaddq_s8(int8x16_t a, int8x16_t b); // VRHADD.S8 q0,q0,q0 +int16x8_t vrhaddq_s16(int16x8_t a, int16x8_t b); // VRHADD.S16 q0,q0,q0 +int32x4_t vrhaddq_s32(int32x4_t a, int32x4_t b); // VRHADD.S32 q0,q0,q0 +uint8x16_t vrhaddq_u8(uint8x16_t a, uint8x16_t b); // VRHADD.U8 q0,q0,q0 +uint16x8_t vrhaddq_u16(uint16x8_t a, uint16x8_t b); // VRHADD.U16 q0,q0,q0 +uint32x4_t vrhaddq_u32(uint32x4_t a, uint32x4_t b); // VRHADD.U32 q0,q0,q0 +//Vector saturating add: vqadd -> Vr[i]:=sat(Va[i]+Vb[i]) +int8x8_t vqadd_s8(int8x8_t a, int8x8_t b); // VQADD.S8 d0,d0,d0 +int16x4_t vqadd_s16(int16x4_t a, int16x4_t b); // VQADD.S16 d0,d0,d0 +int32x2_t vqadd_s32(int32x2_t a, int32x2_t b); // VQADD.S32 d0,d0,d0 +int64x1_t vqadd_s64(int64x1_t a, int64x1_t b); // VQADD.S64 d0,d0,d0 +uint8x8_t vqadd_u8(uint8x8_t a, uint8x8_t b); // VQADD.U8 d0,d0,d0 +uint16x4_t vqadd_u16(uint16x4_t a, uint16x4_t b); // VQADD.U16 d0,d0,d0 +uint32x2_t vqadd_u32(uint32x2_t a, uint32x2_t b); // VQADD.U32 d0,d0,d0 +uint64x1_t vqadd_u64(uint64x1_t a, uint64x1_t b); // VQADD.U64 d0,d0,d0 +int8x16_t vqaddq_s8(int8x16_t a, int8x16_t b); // VQADD.S8 q0,q0,q0 +int16x8_t vqaddq_s16(int16x8_t a, int16x8_t b); // VQADD.S16 q0,q0,q0 +int32x4_t vqaddq_s32(int32x4_t a, int32x4_t b); // 
VQADD.S32 q0,q0,q0 +int64x2_t vqaddq_s64(int64x2_t a, int64x2_t b); // VQADD.S64 q0,q0,q0 +uint8x16_t vqaddq_u8(uint8x16_t a, uint8x16_t b); // VQADD.U8 q0,q0,q0 +uint16x8_t vqaddq_u16(uint16x8_t a, uint16x8_t b); // VQADD.U16 q0,q0,q0 +uint32x4_t vqaddq_u32(uint32x4_t a, uint32x4_t b); // VQADD.U32 q0,q0,q0 +uint64x2_t vqaddq_u64(uint64x2_t a, uint64x2_t b); // VQADD.U64 q0,q0,q0 +//Vector add high half: vaddhn-> Vr[i]:=Va[i]+Vb[i] +int8x8_t vaddhn_s16(int16x8_t a, int16x8_t b); // VADDHN.I16 d0,q0,q0 +int16x4_t vaddhn_s32(int32x4_t a, int32x4_t b); // VADDHN.I32 d0,q0,q0 +int32x2_t vaddhn_s64(int64x2_t a, int64x2_t b); // VADDHN.I64 d0,q0,q0 +uint8x8_t vaddhn_u16(uint16x8_t a, uint16x8_t b); // VADDHN.I16 d0,q0,q0 +uint16x4_t vaddhn_u32(uint32x4_t a, uint32x4_t b); // VADDHN.I32 d0,q0,q0 +uint32x2_t vaddhn_u64(uint64x2_t a, uint64x2_t b); // VADDHN.I64 d0,q0,q0 +//Vector rounding add high half: vraddhn +int8x8_t vraddhn_s16(int16x8_t a, int16x8_t b); // VRADDHN.I16 d0,q0,q0 +int16x4_t vraddhn_s32(int32x4_t a, int32x4_t b); // VRADDHN.I32 d0,q0,q0 +int32x2_t vraddhn_s64(int64x2_t a, int64x2_t b); // VRADDHN.I64 d0,q0,q0 +uint8x8_t vraddhn_u16(uint16x8_t a, uint16x8_t b); // VRADDHN.I16 d0,q0,q0 +uint16x4_t vraddhn_u32(uint32x4_t a, uint32x4_t b); // VRADDHN.I32 d0,q0,q0 +uint32x2_t vraddhn_u64(uint64x2_t a, uint64x2_t b); // VRADDHN.I64 d0,q0,q0 +//Multiplication +//Vector multiply: vmul -> Vr[i] := Va[i] * Vb[i] +int8x8_t vmul_s8(int8x8_t a, int8x8_t b); // VMUL.I8 d0,d0,d0 +int16x4_t vmul_s16(int16x4_t a, int16x4_t b); // VMUL.I16 d0,d0,d0 +int32x2_t vmul_s32(int32x2_t a, int32x2_t b); // VMUL.I32 d0,d0,d0 +float32x2_t vmul_f32(float32x2_t a, float32x2_t b); // VMUL.F32 d0,d0,d0 +uint8x8_t vmul_u8(uint8x8_t a, uint8x8_t b); // VMUL.I8 d0,d0,d0 +uint16x4_t vmul_u16(uint16x4_t a, uint16x4_t b); // VMUL.I16 d0,d0,d0 +uint32x2_t vmul_u32(uint32x2_t a, uint32x2_t b); // VMUL.I32 d0,d0,d0 +poly8x8_t vmul_p8(poly8x8_t a, poly8x8_t b); // VMUL.P8 d0,d0,d0 +int8x16_t vmulq_s8(int8x16_t a, int8x16_t b); // VMUL.I8 q0,q0,q0 +int16x8_t vmulq_s16(int16x8_t a, int16x8_t b); // VMUL.I16 q0,q0,q0 +int32x4_t vmulq_s32(int32x4_t a, int32x4_t b); // VMUL.I32 q0,q0,q0 +float32x4_t vmulq_f32(float32x4_t a, float32x4_t b); // VMUL.F32 q0,q0,q0 +uint8x16_t vmulq_u8(uint8x16_t a, uint8x16_t b); // VMUL.I8 q0,q0,q0 +uint16x8_t vmulq_u16(uint16x8_t a, uint16x8_t b); // VMUL.I16 q0,q0,q0 +uint32x4_t vmulq_u32(uint32x4_t a, uint32x4_t b); // VMUL.I32 q0,q0,q0 +poly8x16_t vmulq_p8(poly8x16_t a, poly8x16_t b); // VMUL.P8 q0,q0,q0 +//multiply lane +int16x4_t vmul_lane_s16 (int16x4_t a, int16x4_t b, __constrange(0,3) int c); +int32x2_t vmul_lane_s32 (int32x2_t a, int32x2_t b, __constrange(0,1) int c); +float32x2_t vmul_lane_f32 (float32x2_t a, float32x2_t b, __constrange(0,1) int c); +uint16x4_t vmul_lane_u16 (uint16x4_t a, uint16x4_t b, __constrange(0,3) int c); +uint32x2_t vmul_lane_u32 (uint32x2_t a, uint32x2_t b, __constrange(0,1) int c); +int16x8_t vmulq_lane_s16 (int16x8_t a, int16x4_t b, __constrange(0,3) int c); +int32x4_t vmulq_lane_s32 (int32x4_t a, int32x2_t b, __constrange(0,1) int c); +float32x4_t vmulq_lane_f32 (float32x4_t a, float32x2_t b, __constrange(0,1) int c); +uint16x8_t vmulq_lane_u16 (uint16x8_t a, uint16x4_t b, __constrange(0,3) int c); +uint32x4_t vmulq_lane_u32 (uint32x4_t a, uint32x2_t b, __constrange(0,1) int c); +//Vector multiply accumulate: vmla -> Vr[i] := Va[i] + Vb[i] * Vc[i] +int8x8_t vmla_s8(int8x8_t a, int8x8_t b, int8x8_t c); // VMLA.I8 d0,d0,d0 +int16x4_t vmla_s16(int16x4_t a, 
int16x4_t b, int16x4_t c); // VMLA.I16 d0,d0,d0 +int32x2_t vmla_s32(int32x2_t a, int32x2_t b, int32x2_t c); // VMLA.I32 d0,d0,d0 +float32x2_t vmla_f32(float32x2_t a, float32x2_t b, float32x2_t c); // VMLA.F32 d0,d0,d0 +uint8x8_t vmla_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c); // VMLA.I8 d0,d0,d0 +uint16x4_t vmla_u16(uint16x4_t a, uint16x4_t b, uint16x4_t c); // VMLA.I16 d0,d0,d0 +uint32x2_t vmla_u32(uint32x2_t a, uint32x2_t b, uint32x2_t c); // VMLA.I32 d0,d0,d0 +int8x16_t vmlaq_s8(int8x16_t a, int8x16_t b, int8x16_t c); // VMLA.I8 q0,q0,q0 +int16x8_t vmlaq_s16(int16x8_t a, int16x8_t b, int16x8_t c); // VMLA.I16 q0,q0,q0 +int32x4_t vmlaq_s32(int32x4_t a, int32x4_t b, int32x4_t c); // VMLA.I32 q0,q0,q0 +float32x4_t vmlaq_f32(float32x4_t a, float32x4_t b, float32x4_t c); // VMLA.F32 q0,q0,q0 +uint8x16_t vmlaq_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c); // VMLA.I8 q0,q0,q0 +uint16x8_t vmlaq_u16(uint16x8_t a, uint16x8_t b, uint16x8_t c); // VMLA.I16 q0,q0,q0 +uint32x4_t vmlaq_u32(uint32x4_t a, uint32x4_t b, uint32x4_t c); // VMLA.I32 q0,q0,q0 +//Vector multiply accumulate long: vmlal -> Vr[i] := Va[i] + Vb[i] * Vc[i] +int16x8_t vmlal_s8(int16x8_t a, int8x8_t b, int8x8_t c); // VMLAL.S8 q0,d0,d0 +int32x4_t vmlal_s16(int32x4_t a, int16x4_t b, int16x4_t c); // VMLAL.S16 q0,d0,d0 +int64x2_t vmlal_s32(int64x2_t a, int32x2_t b, int32x2_t c); // VMLAL.S32 q0,d0,d0 +uint16x8_t vmlal_u8(uint16x8_t a, uint8x8_t b, uint8x8_t c); // VMLAL.U8 q0,d0,d0 +uint32x4_t vmlal_u16(uint32x4_t a, uint16x4_t b, uint16x4_t c); // VMLAL.U16 q0,d0,d0 +uint64x2_t vmlal_u32(uint64x2_t a, uint32x2_t b, uint32x2_t c); // VMLAL.U32 q0,d0,d0 +//Vector multiply subtract: vmls -> Vr[i] := Va[i] - Vb[i] * Vc[i] +int8x8_t vmls_s8(int8x8_t a, int8x8_t b, int8x8_t c); // VMLS.I8 d0,d0,d0 +int16x4_t vmls_s16(int16x4_t a, int16x4_t b, int16x4_t c); // VMLS.I16 d0,d0,d0 +int32x2_t vmls_s32(int32x2_t a, int32x2_t b, int32x2_t c); // VMLS.I32 d0,d0,d0 +float32x2_t vmls_f32(float32x2_t a, float32x2_t b, float32x2_t c); // VMLS.F32 d0,d0,d0 +uint8x8_t vmls_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c); // VMLS.I8 d0,d0,d0 +uint16x4_t vmls_u16(uint16x4_t a, uint16x4_t b, uint16x4_t c); // VMLS.I16 d0,d0,d0 +uint32x2_t vmls_u32(uint32x2_t a, uint32x2_t b, uint32x2_t c); // VMLS.I32 d0,d0,d0 +int8x16_t vmlsq_s8(int8x16_t a, int8x16_t b, int8x16_t c); // VMLS.I8 q0,q0,q0 +int16x8_t vmlsq_s16(int16x8_t a, int16x8_t b, int16x8_t c); // VMLS.I16 q0,q0,q0 +int32x4_t vmlsq_s32(int32x4_t a, int32x4_t b, int32x4_t c); // VMLS.I32 q0,q0,q0 +float32x4_t vmlsq_f32(float32x4_t a, float32x4_t b, float32x4_t c); // VMLS.F32 q0,q0,q0 +uint8x16_t vmlsq_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c); // VMLS.I8 q0,q0,q0 +uint16x8_t vmlsq_u16(uint16x8_t a, uint16x8_t b, uint16x8_t c); // VMLS.I16 q0,q0,q0 +uint32x4_t vmlsq_u32(uint32x4_t a, uint32x4_t b, uint32x4_t c); // VMLS.I32 q0,q0,q0 +//Vector multiply subtract long +int16x8_t vmlsl_s8(int16x8_t a, int8x8_t b, int8x8_t c); // VMLSL.S8 q0,d0,d0 +int32x4_t vmlsl_s16(int32x4_t a, int16x4_t b, int16x4_t c); // VMLSL.S16 q0,d0,d0 +int64x2_t vmlsl_s32(int64x2_t a, int32x2_t b, int32x2_t c); // VMLSL.S32 q0,d0,d0 +uint16x8_t vmlsl_u8(uint16x8_t a, uint8x8_t b, uint8x8_t c); // VMLSL.U8 q0,d0,d0 +uint32x4_t vmlsl_u16(uint32x4_t a, uint16x4_t b, uint16x4_t c); // VMLSL.U16 q0,d0,d0 +uint64x2_t vmlsl_u32(uint64x2_t a, uint32x2_t b, uint32x2_t c); // VMLSL.U32 q0,d0,d0 +//Vector saturating doubling multiply high +int16x4_t vqdmulh_s16(int16x4_t a, int16x4_t b); // VQDMULH.S16 d0,d0,d0 +int32x2_t vqdmulh_s32(int32x2_t 
a, int32x2_t b); // VQDMULH.S32 d0,d0,d0 +int16x8_t vqdmulhq_s16(int16x8_t a, int16x8_t b); // VQDMULH.S16 q0,q0,q0 +int32x4_t vqdmulhq_s32(int32x4_t a, int32x4_t b); // VQDMULH.S32 q0,q0,q0 +//Vector saturating rounding doubling multiply high +int16x4_t vqrdmulh_s16(int16x4_t a, int16x4_t b); // VQRDMULH.S16 d0,d0,d0 +int32x2_t vqrdmulh_s32(int32x2_t a, int32x2_t b); // VQRDMULH.S32 d0,d0,d0 +int16x8_t vqrdmulhq_s16(int16x8_t a, int16x8_t b); // VQRDMULH.S16 q0,q0,q0 +int32x4_t vqrdmulhq_s32(int32x4_t a, int32x4_t b); // VQRDMULH.S32 q0,q0,q0 +//Vector saturating doubling multiply accumulate long +int32x4_t vqdmlal_s16(int32x4_t a, int16x4_t b, int16x4_t c); // VQDMLAL.S16 q0,d0,d0 +int64x2_t vqdmlal_s32(int64x2_t a, int32x2_t b, int32x2_t c); // VQDMLAL.S32 q0,d0,d0 +//Vector saturating doubling multiply subtract long +int32x4_t vqdmlsl_s16(int32x4_t a, int16x4_t b, int16x4_t c); // VQDMLSL.S16 q0,d0,d0 +int64x2_t vqdmlsl_s32(int64x2_t a, int32x2_t b, int32x2_t c); // VQDMLSL.S32 q0,d0,d0 +//Vector long multiply +int16x8_t vmull_s8(int8x8_t a, int8x8_t b); // VMULL.S8 q0,d0,d0 +int32x4_t vmull_s16(int16x4_t a, int16x4_t b); // VMULL.S16 q0,d0,d0 +int64x2_t vmull_s32(int32x2_t a, int32x2_t b); // VMULL.S32 q0,d0,d0 +uint16x8_t vmull_u8(uint8x8_t a, uint8x8_t b); // VMULL.U8 q0,d0,d0 +uint32x4_t vmull_u16(uint16x4_t a, uint16x4_t b); // VMULL.U16 q0,d0,d0 +uint64x2_t vmull_u32(uint32x2_t a, uint32x2_t b); // VMULL.U32 q0,d0,d0 +poly16x8_t vmull_p8(poly8x8_t a, poly8x8_t b); // VMULL.P8 q0,d0,d0 +//Vector saturating doubling long multiply +int32x4_t vqdmull_s16(int16x4_t a, int16x4_t b); // VQDMULL.S16 q0,d0,d0 +int64x2_t vqdmull_s32(int32x2_t a, int32x2_t b); // VQDMULL.S32 q0,d0,d0 +//Subtraction +//Vector subtract +int8x8_t vsub_s8(int8x8_t a, int8x8_t b); // VSUB.I8 d0,d0,d0 +int16x4_t vsub_s16(int16x4_t a, int16x4_t b); // VSUB.I16 d0,d0,d0 +int32x2_t vsub_s32(int32x2_t a, int32x2_t b); // VSUB.I32 d0,d0,d0 +int64x1_t vsub_s64(int64x1_t a, int64x1_t b); // VSUB.I64 d0,d0,d0 +float32x2_t vsub_f32(float32x2_t a, float32x2_t b); // VSUB.F32 d0,d0,d0 +uint8x8_t vsub_u8(uint8x8_t a, uint8x8_t b); // VSUB.I8 d0,d0,d0 +uint16x4_t vsub_u16(uint16x4_t a, uint16x4_t b); // VSUB.I16 d0,d0,d0 +uint32x2_t vsub_u32(uint32x2_t a, uint32x2_t b); // VSUB.I32 d0,d0,d0 +uint64x1_t vsub_u64(uint64x1_t a, uint64x1_t b); // VSUB.I64 d0,d0,d0 +int8x16_t vsubq_s8(int8x16_t a, int8x16_t b); // VSUB.I8 q0,q0,q0 +int16x8_t vsubq_s16(int16x8_t a, int16x8_t b); // VSUB.I16 q0,q0,q0 +int32x4_t vsubq_s32(int32x4_t a, int32x4_t b); // VSUB.I32 q0,q0,q0 +int64x2_t vsubq_s64(int64x2_t a, int64x2_t b); // VSUB.I64 q0,q0,q0 +float32x4_t vsubq_f32(float32x4_t a, float32x4_t b); // VSUB.F32 q0,q0,q0 +uint8x16_t vsubq_u8(uint8x16_t a, uint8x16_t b); // VSUB.I8 q0,q0,q0 +uint16x8_t vsubq_u16(uint16x8_t a, uint16x8_t b); // VSUB.I16 q0,q0,q0 +uint32x4_t vsubq_u32(uint32x4_t a, uint32x4_t b); // VSUB.I32 q0,q0,q0 +uint64x2_t vsubq_u64(uint64x2_t a, uint64x2_t b); // VSUB.I64 q0,q0,q0 +//Vector long subtract: vsub -> Vr[i]:=Va[i]+Vb[i] +int16x8_t vsubl_s8(int8x8_t a, int8x8_t b); // VSUBL.S8 q0,d0,d0 +int32x4_t vsubl_s16(int16x4_t a, int16x4_t b); // VSUBL.S16 q0,d0,d0 +int64x2_t vsubl_s32(int32x2_t a, int32x2_t b); // VSUBL.S32 q0,d0,d0 +uint16x8_t vsubl_u8(uint8x8_t a, uint8x8_t b); // VSUBL.U8 q0,d0,d0 +uint32x4_t vsubl_u16(uint16x4_t a, uint16x4_t b); // VSUBL.U16 q0,d0,d0 +uint64x2_t vsubl_u32(uint32x2_t a, uint32x2_t b); // VSUBL.U32 q0,d0,d0 +//Vector wide subtract: vsub -> Vr[i]:=Va[i]+Vb[i] +int16x8_t 
vsubw_s8(int16x8_t a, int8x8_t b); // VSUBW.S8 q0,q0,d0 +int32x4_t vsubw_s16(int32x4_t a, int16x4_t b); // VSUBW.S16 q0,q0,d0 +int64x2_t vsubw_s32(int64x2_t a, int32x2_t b); // VSUBW.S32 q0,q0,d0 +uint16x8_t vsubw_u8(uint16x8_t a, uint8x8_t b); // VSUBW.U8 q0,q0,d0 +uint32x4_t vsubw_u16(uint32x4_t a, uint16x4_t b); // VSUBW.U16 q0,q0,d0 +uint64x2_t vsubw_u32(uint64x2_t a, uint32x2_t b); // VSUBW.U32 q0,q0,d0 +//Vector saturating subtract +int8x8_t vqsub_s8(int8x8_t a, int8x8_t b); // VQSUB.S8 d0,d0,d0 +int16x4_t vqsub_s16(int16x4_t a, int16x4_t b); // VQSUB.S16 d0,d0,d0 +int32x2_t vqsub_s32(int32x2_t a, int32x2_t b); // VQSUB.S32 d0,d0,d0 +int64x1_t vqsub_s64(int64x1_t a, int64x1_t b); // VQSUB.S64 d0,d0,d0 +uint8x8_t vqsub_u8(uint8x8_t a, uint8x8_t b); // VQSUB.U8 d0,d0,d0 +uint16x4_t vqsub_u16(uint16x4_t a, uint16x4_t b); // VQSUB.U16 d0,d0,d0 +uint32x2_t vqsub_u32(uint32x2_t a, uint32x2_t b); // VQSUB.U32 d0,d0,d0 +uint64x1_t vqsub_u64(uint64x1_t a, uint64x1_t b); // VQSUB.U64 d0,d0,d0 +int8x16_t vqsubq_s8(int8x16_t a, int8x16_t b); // VQSUB.S8 q0,q0,q0 +int16x8_t vqsubq_s16(int16x8_t a, int16x8_t b); // VQSUB.S16 q0,q0,q0 +int32x4_t vqsubq_s32(int32x4_t a, int32x4_t b); // VQSUB.S32 q0,q0,q0 +int64x2_t vqsubq_s64(int64x2_t a, int64x2_t b); // VQSUB.S64 q0,q0,q0 +uint8x16_t vqsubq_u8(uint8x16_t a, uint8x16_t b); // VQSUB.U8 q0,q0,q0 +uint16x8_t vqsubq_u16(uint16x8_t a, uint16x8_t b); // VQSUB.U16 q0,q0,q0 +uint32x4_t vqsubq_u32(uint32x4_t a, uint32x4_t b); // VQSUB.U32 q0,q0,q0 +uint64x2_t vqsubq_u64(uint64x2_t a, uint64x2_t b); // VQSUB.U64 q0,q0,q0 +//Vector halving subtract +int8x8_t vhsub_s8(int8x8_t a, int8x8_t b); // VHSUB.S8 d0,d0,d0 +int16x4_t vhsub_s16(int16x4_t a, int16x4_t b); // VHSUB.S16 d0,d0,d0 +int32x2_t vhsub_s32(int32x2_t a, int32x2_t b); // VHSUB.S32 d0,d0,d0 +uint8x8_t vhsub_u8(uint8x8_t a, uint8x8_t b); // VHSUB.U8 d0,d0,d0 +uint16x4_t vhsub_u16(uint16x4_t a, uint16x4_t b); // VHSUB.U16 d0,d0,d0 +uint32x2_t vhsub_u32(uint32x2_t a, uint32x2_t b); // VHSUB.U32 d0,d0,d0 +int8x16_t vhsubq_s8(int8x16_t a, int8x16_t b); // VHSUB.S8 q0,q0,q0 +int16x8_t vhsubq_s16(int16x8_t a, int16x8_t b); // VHSUB.S16 q0,q0,q0 +int32x4_t vhsubq_s32(int32x4_t a, int32x4_t b); // VHSUB.S32 q0,q0,q0 +uint8x16_t vhsubq_u8(uint8x16_t a, uint8x16_t b); // VHSUB.U8 q0,q0,q0 +uint16x8_t vhsubq_u16(uint16x8_t a, uint16x8_t b); // VHSUB.U16 q0,q0,q0 +uint32x4_t vhsubq_u32(uint32x4_t a, uint32x4_t b); // VHSUB.U32 q0,q0,q0 +//Vector subtract high half +int8x8_t vsubhn_s16(int16x8_t a, int16x8_t b); // VSUBHN.I16 d0,q0,q0 +int16x4_t vsubhn_s32(int32x4_t a, int32x4_t b); // VSUBHN.I32 d0,q0,q0 +int32x2_t vsubhn_s64(int64x2_t a, int64x2_t b); // VSUBHN.I64 d0,q0,q0 +uint8x8_t vsubhn_u16(uint16x8_t a, uint16x8_t b); // VSUBHN.I16 d0,q0,q0 +uint16x4_t vsubhn_u32(uint32x4_t a, uint32x4_t b); // VSUBHN.I32 d0,q0,q0 +uint32x2_t vsubhn_u64(uint64x2_t a, uint64x2_t b); // VSUBHN.I64 d0,q0,q0 +//Vector rounding subtract high half +int8x8_t vrsubhn_s16(int16x8_t a, int16x8_t b); // VRSUBHN.I16 d0,q0,q0 +int16x4_t vrsubhn_s32(int32x4_t a, int32x4_t b); // VRSUBHN.I32 d0,q0,q0 +int32x2_t vrsubhn_s64(int64x2_t a, int64x2_t b); // VRSUBHN.I64 d0,q0,q0 +uint8x8_t vrsubhn_u16(uint16x8_t a, uint16x8_t b); // VRSUBHN.I16 d0,q0,q0 +uint16x4_t vrsubhn_u32(uint32x4_t a, uint32x4_t b); // VRSUBHN.I32 d0,q0,q0 +uint32x2_t vrsubhn_u64(uint64x2_t a, uint64x2_t b); // VRSUBHN.I64 d0,q0,q0 +//Comparison +//Vector compare equal +uint8x8_t vceq_s8(int8x8_t a, int8x8_t b); // VCEQ.I8 d0, d0, d0 +uint16x4_t vceq_s16(int16x4_t a, 
int16x4_t b); // VCEQ.I16 d0, d0, d0 +uint32x2_t vceq_s32(int32x2_t a, int32x2_t b); // VCEQ.I32 d0, d0, d0 +uint32x2_t vceq_f32(float32x2_t a, float32x2_t b); // VCEQ.F32 d0, d0, d0 +uint8x8_t vceq_u8(uint8x8_t a, uint8x8_t b); // VCEQ.I8 d0, d0, d0 +uint16x4_t vceq_u16(uint16x4_t a, uint16x4_t b); // VCEQ.I16 d0, d0, d0 +uint32x2_t vceq_u32(uint32x2_t a, uint32x2_t b); // VCEQ.I32 d0, d0, d0 +uint8x8_t vceq_p8(poly8x8_t a, poly8x8_t b); // VCEQ.I8 d0, d0, d0 +uint8x16_t vceqq_s8(int8x16_t a, int8x16_t b); // VCEQ.I8 q0, q0, q0 +uint16x8_t vceqq_s16(int16x8_t a, int16x8_t b); // VCEQ.I16 q0, q0, q0 +uint32x4_t vceqq_s32(int32x4_t a, int32x4_t b); // VCEQ.I32 q0, q0, q0 +uint32x4_t vceqq_f32(float32x4_t a, float32x4_t b); // VCEQ.F32 q0, q0, q0 +uint8x16_t vceqq_u8(uint8x16_t a, uint8x16_t b); // VCEQ.I8 q0, q0, q0 +uint16x8_t vceqq_u16(uint16x8_t a, uint16x8_t b); // VCEQ.I16 q0, q0, q0 +uint32x4_t vceqq_u32(uint32x4_t a, uint32x4_t b); // VCEQ.I32 q0, q0, q0 +uint8x16_t vceqq_p8(poly8x16_t a, poly8x16_t b); // VCEQ.I8 q0, q0, q0 +//Vector compare greater-than or equal +uint8x8_t vcge_s8(int8x8_t a, int8x8_t b); // VCGE.S8 d0, d0, d0 +uint16x4_t vcge_s16(int16x4_t a, int16x4_t b); // VCGE.S16 d0, d0, d0 +uint32x2_t vcge_s32(int32x2_t a, int32x2_t b); // VCGE.S32 d0, d0, d0 +uint32x2_t vcge_f32(float32x2_t a, float32x2_t b); // VCGE.F32 d0, d0, d0 +uint8x8_t vcge_u8(uint8x8_t a, uint8x8_t b); // VCGE.U8 d0, d0, d0 +uint16x4_t vcge_u16(uint16x4_t a, uint16x4_t b); // VCGE.U16 d0, d0, d0 +uint32x2_t vcge_u32(uint32x2_t a, uint32x2_t b); // VCGE.U32 d0, d0, d0 +uint8x16_t vcgeq_s8(int8x16_t a, int8x16_t b); // VCGE.S8 q0, q0, q0 +uint16x8_t vcgeq_s16(int16x8_t a, int16x8_t b); // VCGE.S16 q0, q0, q0 +uint32x4_t vcgeq_s32(int32x4_t a, int32x4_t b); // VCGE.S32 q0, q0, q0 +uint32x4_t vcgeq_f32(float32x4_t a, float32x4_t b); // VCGE.F32 q0, q0, q0 +uint8x16_t vcgeq_u8(uint8x16_t a, uint8x16_t b); // VCGE.U8 q0, q0, q0 +uint16x8_t vcgeq_u16(uint16x8_t a, uint16x8_t b); // VCGE.U16 q0, q0, q0 +uint32x4_t vcgeq_u32(uint32x4_t a, uint32x4_t b); // VCGE.U32 q0, q0, q0 +//Vector compare less-than or equal +uint8x8_t vcle_s8(int8x8_t a, int8x8_t b); // VCGE.S8 d0, d0, d0 +uint16x4_t vcle_s16(int16x4_t a, int16x4_t b); // VCGE.S16 d0, d0, d0 +uint32x2_t vcle_s32(int32x2_t a, int32x2_t b); // VCGE.S32 d0, d0, d0 +uint32x2_t vcle_f32(float32x2_t a, float32x2_t b); // VCGE.F32 d0, d0, d0 +uint8x8_t vcle_u8(uint8x8_t a, uint8x8_t b); // VCGE.U8 d0, d0, d0 +uint16x4_t vcle_u16(uint16x4_t a, uint16x4_t b); // VCGE.U16 d0, d0, d0 +uint32x2_t vcle_u32(uint32x2_t a, uint32x2_t b); // VCGE.U32 d0, d0, d0 +uint8x16_t vcleq_s8(int8x16_t a, int8x16_t b); // VCGE.S8 q0, q0, q0 +uint16x8_t vcleq_s16(int16x8_t a, int16x8_t b); // VCGE.S16 q0, q0, q0 +uint32x4_t vcleq_s32(int32x4_t a, int32x4_t b); // VCGE.S32 q0, q0, q0 +uint32x4_t vcleq_f32(float32x4_t a, float32x4_t b); // VCGE.F32 q0, q0, q0 +uint8x16_t vcleq_u8(uint8x16_t a, uint8x16_t b); // VCGE.U8 q0, q0, q0 +uint16x8_t vcleq_u16(uint16x8_t a, uint16x8_t b); // VCGE.U16 q0, q0, q0 +uint32x4_t vcleq_u32(uint32x4_t a, uint32x4_t b); // VCGE.U32 q0, q0, q0 +//Vector compare greater-than +uint8x8_t vcgt_s8(int8x8_t a, int8x8_t b); // VCGT.S8 d0, d0, d0 +uint16x4_t vcgt_s16(int16x4_t a, int16x4_t b); // VCGT.S16 d0, d0, d0 +uint32x2_t vcgt_s32(int32x2_t a, int32x2_t b); // VCGT.S32 d0, d0, d0 +uint32x2_t vcgt_f32(float32x2_t a, float32x2_t b); // VCGT.F32 d0, d0, d0 +uint8x8_t vcgt_u8(uint8x8_t a, uint8x8_t b); // VCGT.U8 d0, d0, d0 +uint16x4_t vcgt_u16(uint16x4_t a, 
uint16x4_t b); // VCGT.U16 d0, d0, d0 +uint32x2_t vcgt_u32(uint32x2_t a, uint32x2_t b); // VCGT.U32 d0, d0, d0 +uint8x16_t vcgtq_s8(int8x16_t a, int8x16_t b); // VCGT.S8 q0, q0, q0 +uint16x8_t vcgtq_s16(int16x8_t a, int16x8_t b); // VCGT.S16 q0, q0, q0 +uint32x4_t vcgtq_s32(int32x4_t a, int32x4_t b); // VCGT.S32 q0, q0, q0 +uint32x4_t vcgtq_f32(float32x4_t a, float32x4_t b); // VCGT.F32 q0, q0, q0 +uint8x16_t vcgtq_u8(uint8x16_t a, uint8x16_t b); // VCGT.U8 q0, q0, q0 +uint16x8_t vcgtq_u16(uint16x8_t a, uint16x8_t b); // VCGT.U16 q0, q0, q0 +uint32x4_t vcgtq_u32(uint32x4_t a, uint32x4_t b); // VCGT.U32 q0, q0, q0 +//Vector compare less-than +uint8x8_t vclt_s8(int8x8_t a, int8x8_t b); // VCGT.S8 d0, d0, d0 +uint16x4_t vclt_s16(int16x4_t a, int16x4_t b); // VCGT.S16 d0, d0, d0 +uint32x2_t vclt_s32(int32x2_t a, int32x2_t b); // VCGT.S32 d0, d0, d0 +uint32x2_t vclt_f32(float32x2_t a, float32x2_t b); // VCGT.F32 d0, d0, d0 +uint8x8_t vclt_u8(uint8x8_t a, uint8x8_t b); // VCGT.U8 d0, d0, d0 +uint16x4_t vclt_u16(uint16x4_t a, uint16x4_t b); // VCGT.U16 d0, d0, d0 +uint32x2_t vclt_u32(uint32x2_t a, uint32x2_t b); // VCGT.U32 d0, d0, d0 +uint8x16_t vcltq_s8(int8x16_t a, int8x16_t b); // VCGT.S8 q0, q0, q0 +uint16x8_t vcltq_s16(int16x8_t a, int16x8_t b); // VCGT.S16 q0, q0, q0 +uint32x4_t vcltq_s32(int32x4_t a, int32x4_t b); // VCGT.S32 q0, q0, q0 +uint32x4_t vcltq_f32(float32x4_t a, float32x4_t b); // VCGT.F32 q0, q0, q0 +uint8x16_t vcltq_u8(uint8x16_t a, uint8x16_t b); // VCGT.U8 q0, q0, q0 +uint16x8_t vcltq_u16(uint16x8_t a, uint16x8_t b); // VCGT.U16 q0, q0, q0 +uint32x4_t vcltq_u32(uint32x4_t a, uint32x4_t b); // VCGT.U32 q0, q0, q0 +//Vector compare absolute greater-than or equal +uint32x2_t vcage_f32(float32x2_t a, float32x2_t b); // VACGE.F32 d0, d0, d0 +uint32x4_t vcageq_f32(float32x4_t a, float32x4_t b); // VACGE.F32 q0, q0, q0 +//Vector compare absolute less-than or equal +uint32x2_t vcale_f32(float32x2_t a, float32x2_t b); // VACGE.F32 d0, d0, d0 +uint32x4_t vcaleq_f32(float32x4_t a, float32x4_t b); // VACGE.F32 q0, q0, q0 +//Vector compare absolute greater-than +uint32x2_t vcagt_f32(float32x2_t a, float32x2_t b); // VACGT.F32 d0, d0, d0 +uint32x4_t vcagtq_f32(float32x4_t a, float32x4_t b); // VACGT.F32 q0, q0, q0 +//Vector compare absolute less-than +uint32x2_t vcalt_f32(float32x2_t a, float32x2_t b); // VACGT.F32 d0, d0, d0 +uint32x4_t vcaltq_f32(float32x4_t a, float32x4_t b); // VACGT.F32 q0, q0, q0 +//Vector test bits +uint8x8_t vtst_s8(int8x8_t a, int8x8_t b); // VTST.8 d0, d0, d0 +uint16x4_t vtst_s16(int16x4_t a, int16x4_t b); // VTST.16 d0, d0, d0 +uint32x2_t vtst_s32(int32x2_t a, int32x2_t b); // VTST.32 d0, d0, d0 +uint8x8_t vtst_u8(uint8x8_t a, uint8x8_t b); // VTST.8 d0, d0, d0 +uint16x4_t vtst_u16(uint16x4_t a, uint16x4_t b); // VTST.16 d0, d0, d0 +uint32x2_t vtst_u32(uint32x2_t a, uint32x2_t b); // VTST.32 d0, d0, d0 +uint8x8_t vtst_p8(poly8x8_t a, poly8x8_t b); // VTST.8 d0, d0, d0 +uint8x16_t vtstq_s8(int8x16_t a, int8x16_t b); // VTST.8 q0, q0, q0 +uint16x8_t vtstq_s16(int16x8_t a, int16x8_t b); // VTST.16 q0, q0, q0 +uint32x4_t vtstq_s32(int32x4_t a, int32x4_t b); // VTST.32 q0, q0, q0 +uint8x16_t vtstq_u8(uint8x16_t a, uint8x16_t b); // VTST.8 q0, q0, q0 +uint16x8_t vtstq_u16(uint16x8_t a, uint16x8_t b); // VTST.16 q0, q0, q0 +uint32x4_t vtstq_u32(uint32x4_t a, uint32x4_t b); // VTST.32 q0, q0, q0 +uint8x16_t vtstq_p8(poly8x16_t a, poly8x16_t b); // VTST.8 q0, q0, q0 +//Absolute difference +//Absolute difference between the arguments: Vr[i] = | Va[i] - Vb[i] | +int8x8_t 
vabd_s8(int8x8_t a, int8x8_t b); // VABD.S8 d0,d0,d0 +int16x4_t vabd_s16(int16x4_t a, int16x4_t b); // VABD.S16 d0,d0,d0 +int32x2_t vabd_s32(int32x2_t a, int32x2_t b); // VABD.S32 d0,d0,d0 +uint8x8_t vabd_u8(uint8x8_t a, uint8x8_t b); // VABD.U8 d0,d0,d0 +uint16x4_t vabd_u16(uint16x4_t a, uint16x4_t b); // VABD.U16 d0,d0,d0 +uint32x2_t vabd_u32(uint32x2_t a, uint32x2_t b); // VABD.U32 d0,d0,d0 +float32x2_t vabd_f32(float32x2_t a, float32x2_t b); // VABD.F32 d0,d0,d0 +int8x16_t vabdq_s8(int8x16_t a, int8x16_t b); // VABD.S8 q0,q0,q0 +int16x8_t vabdq_s16(int16x8_t a, int16x8_t b); // VABD.S16 q0,q0,q0 +int32x4_t vabdq_s32(int32x4_t a, int32x4_t b); // VABD.S32 q0,q0,q0 +uint8x16_t vabdq_u8(uint8x16_t a, uint8x16_t b); // VABD.U8 q0,q0,q0 +uint16x8_t vabdq_u16(uint16x8_t a, uint16x8_t b); // VABD.U16 q0,q0,q0 +uint32x4_t vabdq_u32(uint32x4_t a, uint32x4_t b); // VABD.U32 q0,q0,q0 +float32x4_t vabdq_f32(float32x4_t a, float32x4_t b); // VABD.F32 q0,q0,q0 +//Absolute difference - long +int16x8_t vabdl_s8(int8x8_t a, int8x8_t b); // VABDL.S8 q0,d0,d0 +int32x4_t vabdl_s16(int16x4_t a, int16x4_t b); // VABDL.S16 q0,d0,d0 +int64x2_t vabdl_s32(int32x2_t a, int32x2_t b); // VABDL.S32 q0,d0,d0 +uint16x8_t vabdl_u8(uint8x8_t a, uint8x8_t b); // VABDL.U8 q0,d0,d0 +uint32x4_t vabdl_u16(uint16x4_t a, uint16x4_t b); // VABDL.U16 q0,d0,d0 +uint64x2_t vabdl_u32(uint32x2_t a, uint32x2_t b); // VABDL.U32 q0,d0,d0 +//Absolute difference and accumulate: Vr[i] = Va[i] + | Vb[i] - Vc[i] | +int8x8_t vaba_s8(int8x8_t a, int8x8_t b, int8x8_t c); // VABA.S8 d0,d0,d0 +int16x4_t vaba_s16(int16x4_t a, int16x4_t b, int16x4_t c); // VABA.S16 d0,d0,d0 +int32x2_t vaba_s32(int32x2_t a, int32x2_t b, int32x2_t c); // VABA.S32 d0,d0,d0 +uint8x8_t vaba_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c); // VABA.U8 d0,d0,d0 +uint16x4_t vaba_u16(uint16x4_t a, uint16x4_t b, uint16x4_t c); // VABA.U16 d0,d0,d0 +uint32x2_t vaba_u32(uint32x2_t a, uint32x2_t b, uint32x2_t c); // VABA.U32 d0,d0,d0 +int8x16_t vabaq_s8(int8x16_t a, int8x16_t b, int8x16_t c); // VABA.S8 q0,q0,q0 +int16x8_t vabaq_s16(int16x8_t a, int16x8_t b, int16x8_t c); // VABA.S16 q0,q0,q0 +int32x4_t vabaq_s32(int32x4_t a, int32x4_t b, int32x4_t c); // VABA.S32 q0,q0,q0 +uint8x16_t vabaq_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c); // VABA.U8 q0,q0,q0 +uint16x8_t vabaq_u16(uint16x8_t a, uint16x8_t b, uint16x8_t c); // VABA.U16 q0,q0,q0 +uint32x4_t vabaq_u32(uint32x4_t a, uint32x4_t b, uint32x4_t c); // VABA.U32 q0,q0,q0 +//Absolute difference and accumulate - long +int16x8_t vabal_s8(int16x8_t a, int8x8_t b, int8x8_t c); // VABAL.S8 q0,d0,d0 +int32x4_t vabal_s16(int32x4_t a, int16x4_t b, int16x4_t c); // VABAL.S16 q0,d0,d0 +int64x2_t vabal_s32(int64x2_t a, int32x2_t b, int32x2_t c); // VABAL.S32 q0,d0,d0 +uint16x8_t vabal_u8(uint16x8_t a, uint8x8_t b, uint8x8_t c); // VABAL.U8 q0,d0,d0 +uint32x4_t vabal_u16(uint32x4_t a, uint16x4_t b, uint16x4_t c); // VABAL.U16 q0,d0,d0 +uint64x2_t vabal_u32(uint64x2_t a, uint32x2_t b, uint32x2_t c); // VABAL.U32 q0,d0,d0 +//Max/Min +//vmax -> Vr[i] := (Va[i] >= Vb[i]) ? 
Va[i] : Vb[i] +int8x8_t vmax_s8(int8x8_t a, int8x8_t b); // VMAX.S8 d0,d0,d0 +int16x4_t vmax_s16(int16x4_t a, int16x4_t b); // VMAX.S16 d0,d0,d0 +int32x2_t vmax_s32(int32x2_t a, int32x2_t b); // VMAX.S32 d0,d0,d0 +uint8x8_t vmax_u8(uint8x8_t a, uint8x8_t b); // VMAX.U8 d0,d0,d0 +uint16x4_t vmax_u16(uint16x4_t a, uint16x4_t b); // VMAX.U16 d0,d0,d0 +uint32x2_t vmax_u32(uint32x2_t a, uint32x2_t b); // VMAX.U32 d0,d0,d0 +float32x2_t vmax_f32(float32x2_t a, float32x2_t b); // VMAX.F32 d0,d0,d0 +int8x16_t vmaxq_s8(int8x16_t a, int8x16_t b); // VMAX.S8 q0,q0,q0 +int16x8_t vmaxq_s16(int16x8_t a, int16x8_t b); // VMAX.S16 q0,q0,q0 +int32x4_t vmaxq_s32(int32x4_t a, int32x4_t b); // VMAX.S32 q0,q0,q0 +uint8x16_t vmaxq_u8(uint8x16_t a, uint8x16_t b); // VMAX.U8 q0,q0,q0 +uint16x8_t vmaxq_u16(uint16x8_t a, uint16x8_t b); // VMAX.U16 q0,q0,q0 +uint32x4_t vmaxq_u32(uint32x4_t a, uint32x4_t b); // VMAX.U32 q0,q0,q0 +float32x4_t vmaxq_f32(float32x4_t a, float32x4_t b); // VMAX.F32 q0,q0,q0 +//vmin -> Vr[i] := (Va[i] >= Vb[i]) ? Vb[i] : Va[i] +int8x8_t vmin_s8(int8x8_t a, int8x8_t b); // VMIN.S8 d0,d0,d0 +int16x4_t vmin_s16(int16x4_t a, int16x4_t b); // VMIN.S16 d0,d0,d0 +int32x2_t vmin_s32(int32x2_t a, int32x2_t b); // VMIN.S32 d0,d0,d0 +uint8x8_t vmin_u8(uint8x8_t a, uint8x8_t b); // VMIN.U8 d0,d0,d0 +uint16x4_t vmin_u16(uint16x4_t a, uint16x4_t b); // VMIN.U16 d0,d0,d0 +uint32x2_t vmin_u32(uint32x2_t a, uint32x2_t b); // VMIN.U32 d0,d0,d0 +float32x2_t vmin_f32(float32x2_t a, float32x2_t b); // VMIN.F32 d0,d0,d0 +int8x16_t vminq_s8(int8x16_t a, int8x16_t b); // VMIN.S8 q0,q0,q0 +int16x8_t vminq_s16(int16x8_t a, int16x8_t b); // VMIN.S16 q0,q0,q0 +int32x4_t vminq_s32(int32x4_t a, int32x4_t b); // VMIN.S32 q0,q0,q0 +uint8x16_t vminq_u8(uint8x16_t a, uint8x16_t b); // VMIN.U8 q0,q0,q0 +uint16x8_t vminq_u16(uint16x8_t a, uint16x8_t b); // VMIN.U16 q0,q0,q0 +uint32x4_t vminq_u32(uint32x4_t a, uint32x4_t b); // VMIN.U32 q0,q0,q0 +float32x4_t vminq_f32(float32x4_t a, float32x4_t b); // VMIN.F32 q0,q0,q0 +//Pairwise addition +//Pairwise add +int8x8_t vpadd_s8(int8x8_t a, int8x8_t b); // VPADD.I8 d0,d0,d0 +int16x4_t vpadd_s16(int16x4_t a, int16x4_t b); // VPADD.I16 d0,d0,d0 +int32x2_t vpadd_s32(int32x2_t a, int32x2_t b); // VPADD.I32 d0,d0,d0 +uint8x8_t vpadd_u8(uint8x8_t a, uint8x8_t b); // VPADD.I8 d0,d0,d0 +uint16x4_t vpadd_u16(uint16x4_t a, uint16x4_t b); // VPADD.I16 d0,d0,d0 +uint32x2_t vpadd_u32(uint32x2_t a, uint32x2_t b); // VPADD.I32 d0,d0,d0 +float32x2_t vpadd_f32(float32x2_t a, float32x2_t b); // VPADD.F32 d0,d0,d0 +//Long pairwise add +int16x4_t vpaddl_s8(int8x8_t a); // VPADDL.S8 d0,d0 +int32x2_t vpaddl_s16(int16x4_t a); // VPADDL.S16 d0,d0 +int64x1_t vpaddl_s32(int32x2_t a); // VPADDL.S32 d0,d0 +uint16x4_t vpaddl_u8(uint8x8_t a); // VPADDL.U8 d0,d0 +uint32x2_t vpaddl_u16(uint16x4_t a); // VPADDL.U16 d0,d0 +uint64x1_t vpaddl_u32(uint32x2_t a); // VPADDL.U32 d0,d0 +int16x8_t vpaddlq_s8(int8x16_t a); // VPADDL.S8 q0,q0 +int32x4_t vpaddlq_s16(int16x8_t a); // VPADDL.S16 q0,q0 +int64x2_t vpaddlq_s32(int32x4_t a); // VPADDL.S32 q0,q0 +uint16x8_t vpaddlq_u8(uint8x16_t a); // VPADDL.U8 q0,q0 +uint32x4_t vpaddlq_u16(uint16x8_t a); // VPADDL.U16 q0,q0 +uint64x2_t vpaddlq_u32(uint32x4_t a); // VPADDL.U32 q0,q0 +//Long pairwise add and accumulate +int16x4_t vpadal_s8(int16x4_t a, int8x8_t b); // VPADAL.S8 d0,d0 +int32x2_t vpadal_s16(int32x2_t a, int16x4_t b); // VPADAL.S16 d0,d0 +int64x1_t vpadal_s32(int64x1_t a, int32x2_t b); // VPADAL.S32 d0,d0 +uint16x4_t vpadal_u8(uint16x4_t a, uint8x8_t b); // VPADAL.U8 
d0,d0 +uint32x2_t vpadal_u16(uint32x2_t a, uint16x4_t b); // VPADAL.U16 d0,d0 +uint64x1_t vpadal_u32(uint64x1_t a, uint32x2_t b); // VPADAL.U32 d0,d0 +int16x8_t vpadalq_s8(int16x8_t a, int8x16_t b); // VPADAL.S8 q0,q0 +int32x4_t vpadalq_s16(int32x4_t a, int16x8_t b); // VPADAL.S16 q0,q0 +int64x2_t vpadalq_s32(int64x2_t a, int32x4_t b); // VPADAL.S32 q0,q0 +uint16x8_t vpadalq_u8(uint16x8_t a, uint8x16_t b); // VPADAL.U8 q0,q0 +uint32x4_t vpadalq_u16(uint32x4_t a, uint16x8_t b); // VPADAL.U16 q0,q0 +uint64x2_t vpadalq_u32(uint64x2_t a, uint32x4_t b); // VPADAL.U32 q0,q0 +//Folding maximum vpmax -> takes maximum of adjacent pairs +int8x8_t vpmax_s8(int8x8_t a, int8x8_t b); // VPMAX.S8 d0,d0,d0 +int16x4_t vpmax_s16(int16x4_t a, int16x4_t b); // VPMAX.S16 d0,d0,d0 +int32x2_t vpmax_s32(int32x2_t a, int32x2_t b); // VPMAX.S32 d0,d0,d0 +uint8x8_t vpmax_u8(uint8x8_t a, uint8x8_t b); // VPMAX.U8 d0,d0,d0 +uint16x4_t vpmax_u16(uint16x4_t a, uint16x4_t b); // VPMAX.U16 d0,d0,d0 +uint32x2_t vpmax_u32(uint32x2_t a, uint32x2_t b); // VPMAX.U32 d0,d0,d0 +float32x2_t vpmax_f32(float32x2_t a, float32x2_t b); // VPMAX.F32 d0,d0,d0 +//Folding minimum vpmin -> takes minimum of adjacent pairs +int8x8_t vpmin_s8(int8x8_t a, int8x8_t b); // VPMIN.S8 d0,d0,d0 +int16x4_t vpmin_s16(int16x4_t a, int16x4_t b); // VPMIN.S16 d0,d0,d0 +int32x2_t vpmin_s32(int32x2_t a, int32x2_t b); // VPMIN.S32 d0,d0,d0 +uint8x8_t vpmin_u8(uint8x8_t a, uint8x8_t b); // VPMIN.U8 d0,d0,d0 +uint16x4_t vpmin_u16(uint16x4_t a, uint16x4_t b); // VPMIN.U16 d0,d0,d0 +uint32x2_t vpmin_u32(uint32x2_t a, uint32x2_t b); // VPMIN.U32 d0,d0,d0 +float32x2_t vpmin_f32(float32x2_t a, float32x2_t b); // VPMIN.F32 d0,d0,d0 +//Reciprocal/Sqrt +float32x2_t vrecps_f32(float32x2_t a, float32x2_t b); // VRECPS.F32 d0, d0, d0 +float32x4_t vrecpsq_f32(float32x4_t a, float32x4_t b); // VRECPS.F32 q0, q0, q0 +float32x2_t vrsqrts_f32(float32x2_t a, float32x2_t b); // VRSQRTS.F32 d0, d0, d0 +float32x4_t vrsqrtsq_f32(float32x4_t a, float32x4_t b); // VRSQRTS.F32 q0, q0, q0 +//Shifts by signed variable +//Vector shift left: Vr[i] := Va[i] << Vb[i] (negative values shift right) +int8x8_t vshl_s8(int8x8_t a, int8x8_t b); // VSHL.S8 d0,d0,d0 +int16x4_t vshl_s16(int16x4_t a, int16x4_t b); // VSHL.S16 d0,d0,d0 +int32x2_t vshl_s32(int32x2_t a, int32x2_t b); // VSHL.S32 d0,d0,d0 +int64x1_t vshl_s64(int64x1_t a, int64x1_t b); // VSHL.S64 d0,d0,d0 +uint8x8_t vshl_u8(uint8x8_t a, int8x8_t b); // VSHL.U8 d0,d0,d0 +uint16x4_t vshl_u16(uint16x4_t a, int16x4_t b); // VSHL.U16 d0,d0,d0 +uint32x2_t vshl_u32(uint32x2_t a, int32x2_t b); // VSHL.U32 d0,d0,d0 +uint64x1_t vshl_u64(uint64x1_t a, int64x1_t b); // VSHL.U64 d0,d0,d0 +int8x16_t vshlq_s8(int8x16_t a, int8x16_t b); // VSHL.S8 q0,q0,q0 +int16x8_t vshlq_s16(int16x8_t a, int16x8_t b); // VSHL.S16 q0,q0,q0 +int32x4_t vshlq_s32(int32x4_t a, int32x4_t b); // VSHL.S32 q0,q0,q0 +int64x2_t vshlq_s64(int64x2_t a, int64x2_t b); // VSHL.S64 q0,q0,q0 +uint8x16_t vshlq_u8(uint8x16_t a, int8x16_t b); // VSHL.U8 q0,q0,q0 +uint16x8_t vshlq_u16(uint16x8_t a, int16x8_t b); // VSHL.U16 q0,q0,q0 +uint32x4_t vshlq_u32(uint32x4_t a, int32x4_t b); // VSHL.U32 q0,q0,q0 +uint64x2_t vshlq_u64(uint64x2_t a, int64x2_t b); // VSHL.U64 q0,q0,q0 +//Vector saturating shift left: (negative values shift right) +int8x8_t vqshl_s8(int8x8_t a, int8x8_t b); // VQSHL.S8 d0,d0,d0 +int16x4_t vqshl_s16(int16x4_t a, int16x4_t b); // VQSHL.S16 d0,d0,d0 +int32x2_t vqshl_s32(int32x2_t a, int32x2_t b); // VQSHL.S32 d0,d0,d0 +int64x1_t vqshl_s64(int64x1_t a, int64x1_t b); // 
VQSHL.S64 d0,d0,d0 +uint8x8_t vqshl_u8(uint8x8_t a, int8x8_t b); // VQSHL.U8 d0,d0,d0 +uint16x4_t vqshl_u16(uint16x4_t a, int16x4_t b); // VQSHL.U16 d0,d0,d0 +uint32x2_t vqshl_u32(uint32x2_t a, int32x2_t b); // VQSHL.U32 d0,d0,d0 +uint64x1_t vqshl_u64(uint64x1_t a, int64x1_t b); // VQSHL.U64 d0,d0,d0 +int8x16_t vqshlq_s8(int8x16_t a, int8x16_t b); // VQSHL.S8 q0,q0,q0 +int16x8_t vqshlq_s16(int16x8_t a, int16x8_t b); // VQSHL.S16 q0,q0,q0 +int32x4_t vqshlq_s32(int32x4_t a, int32x4_t b); // VQSHL.S32 q0,q0,q0 +int64x2_t vqshlq_s64(int64x2_t a, int64x2_t b); // VQSHL.S64 q0,q0,q0 +uint8x16_t vqshlq_u8(uint8x16_t a, int8x16_t b); // VQSHL.U8 q0,q0,q0 +uint16x8_t vqshlq_u16(uint16x8_t a, int16x8_t b); // VQSHL.U16 q0,q0,q0 +uint32x4_t vqshlq_u32(uint32x4_t a, int32x4_t b); // VQSHL.U32 q0,q0,q0 +uint64x2_t vqshlq_u64(uint64x2_t a, int64x2_t b); // VQSHL.U64 q0,q0,q0 +//Vector rounding shift left: (negative values shift right) +int8x8_t vrshl_s8(int8x8_t a, int8x8_t b); // VRSHL.S8 d0,d0,d0 +int16x4_t vrshl_s16(int16x4_t a, int16x4_t b); // VRSHL.S16 d0,d0,d0 +int32x2_t vrshl_s32(int32x2_t a, int32x2_t b); // VRSHL.S32 d0,d0,d0 +int64x1_t vrshl_s64(int64x1_t a, int64x1_t b); // VRSHL.S64 d0,d0,d0 +uint8x8_t vrshl_u8(uint8x8_t a, int8x8_t b); // VRSHL.U8 d0,d0,d0 +uint16x4_t vrshl_u16(uint16x4_t a, int16x4_t b); // VRSHL.U16 d0,d0,d0 +uint32x2_t vrshl_u32(uint32x2_t a, int32x2_t b); // VRSHL.U32 d0,d0,d0 +uint64x1_t vrshl_u64(uint64x1_t a, int64x1_t b); // VRSHL.U64 d0,d0,d0 +int8x16_t vrshlq_s8(int8x16_t a, int8x16_t b); // VRSHL.S8 q0,q0,q0 +int16x8_t vrshlq_s16(int16x8_t a, int16x8_t b); // VRSHL.S16 q0,q0,q0 +int32x4_t vrshlq_s32(int32x4_t a, int32x4_t b); // VRSHL.S32 q0,q0,q0 +int64x2_t vrshlq_s64(int64x2_t a, int64x2_t b); // VRSHL.S64 q0,q0,q0 +uint8x16_t vrshlq_u8(uint8x16_t a, int8x16_t b); // VRSHL.U8 q0,q0,q0 +uint16x8_t vrshlq_u16(uint16x8_t a, int16x8_t b); // VRSHL.U16 q0,q0,q0 +uint32x4_t vrshlq_u32(uint32x4_t a, int32x4_t b); // VRSHL.U32 q0,q0,q0 +uint64x2_t vrshlq_u64(uint64x2_t a, int64x2_t b); // VRSHL.U64 q0,q0,q0 +//Vector saturating rounding shift left: (negative values shift right) +int8x8_t vqrshl_s8(int8x8_t a, int8x8_t b); // VQRSHL.S8 d0,d0,d0 +int16x4_t vqrshl_s16(int16x4_t a, int16x4_t b); // VQRSHL.S16 d0,d0,d0 +int32x2_t vqrshl_s32(int32x2_t a, int32x2_t b); // VQRSHL.S32 d0,d0,d0 +int64x1_t vqrshl_s64(int64x1_t a, int64x1_t b); // VQRSHL.S64 d0,d0,d0 +uint8x8_t vqrshl_u8(uint8x8_t a, int8x8_t b); // VQRSHL.U8 d0,d0,d0 +uint16x4_t vqrshl_u16(uint16x4_t a, int16x4_t b); // VQRSHL.U16 d0,d0,d0 +uint32x2_t vqrshl_u32(uint32x2_t a, int32x2_t b); // VQRSHL.U32 d0,d0,d0 +uint64x1_t vqrshl_u64(uint64x1_t a, int64x1_t b); // VQRSHL.U64 d0,d0,d0 +int8x16_t vqrshlq_s8(int8x16_t a, int8x16_t b); // VQRSHL.S8 q0,q0,q0 +int16x8_t vqrshlq_s16(int16x8_t a, int16x8_t b); // VQRSHL.S16 q0,q0,q0 +int32x4_t vqrshlq_s32(int32x4_t a, int32x4_t b); // VQRSHL.S32 q0,q0,q0 +int64x2_t vqrshlq_s64(int64x2_t a, int64x2_t b); // VQRSHL.S64 q0,q0,q0 +uint8x16_t vqrshlq_u8(uint8x16_t a, int8x16_t b); // VQRSHL.U8 q0,q0,q0 +uint16x8_t vqrshlq_u16(uint16x8_t a, int16x8_t b); // VQRSHL.U16 q0,q0,q0 +uint32x4_t vqrshlq_u32(uint32x4_t a, int32x4_t b); // VQRSHL.U32 q0,q0,q0 +uint64x2_t vqrshlq_u64(uint64x2_t a, int64x2_t b); // VQRSHL.U64 q0,q0,q0 +//Shifts by a constant +//Vector shift right by constant +int8x8_t vshr_n_s8(int8x8_t a, __constrange(1,8) int b); // VSHR.S8 d0,d0,#8 +int16x4_t vshr_n_s16(int16x4_t a, __constrange(1,16) int b); // VSHR.S16 d0,d0,#16 +int32x2_t vshr_n_s32(int32x2_t a, 
__constrange(1,32) int b); // VSHR.S32 d0,d0,#32 +int64x1_t vshr_n_s64(int64x1_t a, __constrange(1,64) int b); // VSHR.S64 d0,d0,#64 +uint8x8_t vshr_n_u8(uint8x8_t a, __constrange(1,8) int b); // VSHR.U8 d0,d0,#8 +uint16x4_t vshr_n_u16(uint16x4_t a, __constrange(1,16) int b); // VSHR.U16 d0,d0,#16 +uint32x2_t vshr_n_u32(uint32x2_t a, __constrange(1,32) int b); // VSHR.U32 d0,d0,#32 +uint64x1_t vshr_n_u64(uint64x1_t a, __constrange(1,64) int b); // VSHR.U64 d0,d0,#64 +int8x16_t vshrq_n_s8(int8x16_t a, __constrange(1,8) int b); // VSHR.S8 q0,q0,#8 +int16x8_t vshrq_n_s16(int16x8_t a, __constrange(1,16) int b); // VSHR.S16 q0,q0,#16 +int32x4_t vshrq_n_s32(int32x4_t a, __constrange(1,32) int b); // VSHR.S32 q0,q0,#32 +int64x2_t vshrq_n_s64(int64x2_t a, __constrange(1,64) int b); // VSHR.S64 q0,q0,#64 +uint8x16_t vshrq_n_u8(uint8x16_t a, __constrange(1,8) int b); // VSHR.U8 q0,q0,#8 +uint16x8_t vshrq_n_u16(uint16x8_t a, __constrange(1,16) int b); // VSHR.U16 q0,q0,#16 +uint32x4_t vshrq_n_u32(uint32x4_t a, __constrange(1,32) int b); // VSHR.U32 q0,q0,#32 +uint64x2_t vshrq_n_u64(uint64x2_t a, __constrange(1,64) int b); // VSHR.U64 q0,q0,#64 +//Vector shift left by constant +int8x8_t vshl_n_s8(int8x8_t a, __constrange(0,7) int b); // VSHL.I8 d0,d0,#0 +int16x4_t vshl_n_s16(int16x4_t a, __constrange(0,15) int b); // VSHL.I16 d0,d0,#0 +int32x2_t vshl_n_s32(int32x2_t a, __constrange(0,31) int b); // VSHL.I32 d0,d0,#0 +int64x1_t vshl_n_s64(int64x1_t a, __constrange(0,63) int b); // VSHL.I64 d0,d0,#0 +uint8x8_t vshl_n_u8(uint8x8_t a, __constrange(0,7) int b); // VSHL.I8 d0,d0,#0 +uint16x4_t vshl_n_u16(uint16x4_t a, __constrange(0,15) int b); // VSHL.I16 d0,d0,#0 +uint32x2_t vshl_n_u32(uint32x2_t a, __constrange(0,31) int b); // VSHL.I32 d0,d0,#0 +uint64x1_t vshl_n_u64(uint64x1_t a, __constrange(0,63) int b); // VSHL.I64 d0,d0,#0 +int8x16_t vshlq_n_s8(int8x16_t a, __constrange(0,7) int b); // VSHL.I8 q0,q0,#0 +int16x8_t vshlq_n_s16(int16x8_t a, __constrange(0,15) int b); // VSHL.I16 q0,q0,#0 +int32x4_t vshlq_n_s32(int32x4_t a, __constrange(0,31) int b); // VSHL.I32 q0,q0,#0 +int64x2_t vshlq_n_s64(int64x2_t a, __constrange(0,63) int b); // VSHL.I64 q0,q0,#0 +uint8x16_t vshlq_n_u8(uint8x16_t a, __constrange(0,7) int b); // VSHL.I8 q0,q0,#0 +uint16x8_t vshlq_n_u16(uint16x8_t a, __constrange(0,15) int b); // VSHL.I16 q0,q0,#0 +uint32x4_t vshlq_n_u32(uint32x4_t a, __constrange(0,31) int b); // VSHL.I32 q0,q0,#0 +uint64x2_t vshlq_n_u64(uint64x2_t a, __constrange(0,63) int b); // VSHL.I64 q0,q0,#0 +//Vector rounding shift right by constant +int8x8_t vrshr_n_s8(int8x8_t a, __constrange(1,8) int b); // VRSHR.S8 d0,d0,#8 +int16x4_t vrshr_n_s16(int16x4_t a, __constrange(1,16) int b); // VRSHR.S16 d0,d0,#16 +int32x2_t vrshr_n_s32(int32x2_t a, __constrange(1,32) int b); // VRSHR.S32 d0,d0,#32 +int64x1_t vrshr_n_s64(int64x1_t a, __constrange(1,64) int b); // VRSHR.S64 d0,d0,#64 +uint8x8_t vrshr_n_u8(uint8x8_t a, __constrange(1,8) int b); // VRSHR.U8 d0,d0,#8 +uint16x4_t vrshr_n_u16(uint16x4_t a, __constrange(1,16) int b); // VRSHR.U16 d0,d0,#16 +uint32x2_t vrshr_n_u32(uint32x2_t a, __constrange(1,32) int b); // VRSHR.U32 d0,d0,#32 +uint64x1_t vrshr_n_u64(uint64x1_t a, __constrange(1,64) int b); // VRSHR.U64 d0,d0,#64 +int8x16_t vrshrq_n_s8(int8x16_t a, __constrange(1,8) int b); // VRSHR.S8 q0,q0,#8 +int16x8_t vrshrq_n_s16(int16x8_t a, __constrange(1,16) int b); // VRSHR.S16 q0,q0,#16 +int32x4_t vrshrq_n_s32(int32x4_t a, __constrange(1,32) int b); // VRSHR.S32 q0,q0,#32 +int64x2_t vrshrq_n_s64(int64x2_t a, 
__constrange(1,64) int b); // VRSHR.S64 q0,q0,#64 +uint8x16_t vrshrq_n_u8(uint8x16_t a, __constrange(1,8) int b); // VRSHR.U8 q0,q0,#8 +uint16x8_t vrshrq_n_u16(uint16x8_t a, __constrange(1,16) int b); // VRSHR.U16 q0,q0,#16 +uint32x4_t vrshrq_n_u32(uint32x4_t a, __constrange(1,32) int b); // VRSHR.U32 q0,q0,#32 +uint64x2_t vrshrq_n_u64(uint64x2_t a, __constrange(1,64) int b); // VRSHR.U64 q0,q0,#64 +//Vector shift right by constant and accumulate +int8x8_t vsra_n_s8(int8x8_t a, int8x8_t b, __constrange(1,8) int c); // VSRA.S8 d0,d0,#8 +int16x4_t vsra_n_s16(int16x4_t a, int16x4_t b, __constrange(1,16) int c); // VSRA.S16 d0,d0,#16 +int32x2_t vsra_n_s32(int32x2_t a, int32x2_t b, __constrange(1,32) int c); // VSRA.S32 d0,d0,#32 +int64x1_t vsra_n_s64(int64x1_t a, int64x1_t b, __constrange(1,64) int c); // VSRA.S64 d0,d0,#64 +uint8x8_t vsra_n_u8(uint8x8_t a, uint8x8_t b, __constrange(1,8) int c); // VSRA.U8 d0,d0,#8 +uint16x4_t vsra_n_u16(uint16x4_t a, uint16x4_t b, __constrange(1,16) int c); // VSRA.U16 d0,d0,#16 +uint32x2_t vsra_n_u32(uint32x2_t a, uint32x2_t b, __constrange(1,32) int c); // VSRA.U32 d0,d0,#32 +uint64x1_t vsra_n_u64(uint64x1_t a, uint64x1_t b, __constrange(1,64) int c); // VSRA.U64 d0,d0,#64 +int8x16_t vsraq_n_s8(int8x16_t a, int8x16_t b, __constrange(1,8) int c); // VSRA.S8 q0,q0,#8 +int16x8_t vsraq_n_s16(int16x8_t a, int16x8_t b, __constrange(1,16) int c); // VSRA.S16 q0,q0,#16 +int32x4_t vsraq_n_s32(int32x4_t a, int32x4_t b, __constrange(1,32) int c); // VSRA.S32 q0,q0,#32 +int64x2_t vsraq_n_s64(int64x2_t a, int64x2_t b, __constrange(1,64) int c); // VSRA.S64 q0,q0,#64 +uint8x16_t vsraq_n_u8(uint8x16_t a, uint8x16_t b, __constrange(1,8) int c); // VSRA.U8 q0,q0,#8 +uint16x8_t vsraq_n_u16(uint16x8_t a, uint16x8_t b, __constrange(1,16) int c); // VSRA.U16 q0,q0,#16 +uint32x4_t vsraq_n_u32(uint32x4_t a, uint32x4_t b, __constrange(1,32) int c); // VSRA.U32 q0,q0,#32 +uint64x2_t vsraq_n_u64(uint64x2_t a, uint64x2_t b, __constrange(1,64) int c); // VSRA.U64 q0,q0,#64 +//Vector rounding shift right by constant and accumulate +int8x8_t vrsra_n_s8(int8x8_t a, int8x8_t b, __constrange(1,8) int c); // VRSRA.S8 d0,d0,#8 +int16x4_t vrsra_n_s16(int16x4_t a, int16x4_t b, __constrange(1,16) int c); // VRSRA.S16 d0,d0,#16 +int32x2_t vrsra_n_s32(int32x2_t a, int32x2_t b, __constrange(1,32) int c); // VRSRA.S32 d0,d0,#32 +int64x1_t vrsra_n_s64(int64x1_t a, int64x1_t b, __constrange(1,64) int c); // VRSRA.S64 d0,d0,#64 +uint8x8_t vrsra_n_u8(uint8x8_t a, uint8x8_t b, __constrange(1,8) int c); // VRSRA.U8 d0,d0,#8 +uint16x4_t vrsra_n_u16(uint16x4_t a, uint16x4_t b, __constrange(1,16) int c); // VRSRA.U16 d0,d0,#16 +uint32x2_t vrsra_n_u32(uint32x2_t a, uint32x2_t b, __constrange(1,32) int c); // VRSRA.U32 d0,d0,#32 +uint64x1_t vrsra_n_u64(uint64x1_t a, uint64x1_t b, __constrange(1,64) int c); // VRSRA.U64 d0,d0,#64 +int8x16_t vrsraq_n_s8(int8x16_t a, int8x16_t b, __constrange(1,8) int c); // VRSRA.S8 q0,q0,#8 +int16x8_t vrsraq_n_s16(int16x8_t a, int16x8_t b, __constrange(1,16) int c); // VRSRA.S16 q0,q0,#16 +int32x4_t vrsraq_n_s32(int32x4_t a, int32x4_t b, __constrange(1,32) int c); // VRSRA.S32 q0,q0,#32 +int64x2_t vrsraq_n_s64(int64x2_t a, int64x2_t b, __constrange(1,64) int c); // VRSRA.S64 q0,q0,#64 +uint8x16_t vrsraq_n_u8(uint8x16_t a, uint8x16_t b, __constrange(1,8) int c); // VRSRA.U8 q0,q0,#8 +uint16x8_t vrsraq_n_u16(uint16x8_t a, uint16x8_t b, __constrange(1,16) int c); // VRSRA.U16 q0,q0,#16 +uint32x4_t vrsraq_n_u32(uint32x4_t a, uint32x4_t b, __constrange(1,32) int c); // VRSRA.U32 
q0,q0,#32 +uint64x2_t vrsraq_n_u64(uint64x2_t a, uint64x2_t b, __constrange(1,64) int c); // VRSRA.U64 q0,q0,#64 +//Vector saturating shift left by constant +int8x8_t vqshl_n_s8(int8x8_t a, __constrange(0,7) int b); // VQSHL.S8 d0,d0,#0 +int16x4_t vqshl_n_s16(int16x4_t a, __constrange(0,15) int b); // VQSHL.S16 d0,d0,#0 +int32x2_t vqshl_n_s32(int32x2_t a, __constrange(0,31) int b); // VQSHL.S32 d0,d0,#0 +int64x1_t vqshl_n_s64(int64x1_t a, __constrange(0,63) int b); // VQSHL.S64 d0,d0,#0 +uint8x8_t vqshl_n_u8(uint8x8_t a, __constrange(0,7) int b); // VQSHL.U8 d0,d0,#0 +uint16x4_t vqshl_n_u16(uint16x4_t a, __constrange(0,15) int b); // VQSHL.U16 d0,d0,#0 +uint32x2_t vqshl_n_u32(uint32x2_t a, __constrange(0,31) int b); // VQSHL.U32 d0,d0,#0 +uint64x1_t vqshl_n_u64(uint64x1_t a, __constrange(0,63) int b); // VQSHL.U64 d0,d0,#0 +int8x16_t vqshlq_n_s8(int8x16_t a, __constrange(0,7) int b); // VQSHL.S8 q0,q0,#0 +int16x8_t vqshlq_n_s16(int16x8_t a, __constrange(0,15) int b); // VQSHL.S16 q0,q0,#0 +int32x4_t vqshlq_n_s32(int32x4_t a, __constrange(0,31) int b); // VQSHL.S32 q0,q0,#0 +int64x2_t vqshlq_n_s64(int64x2_t a, __constrange(0,63) int b); // VQSHL.S64 q0,q0,#0 +uint8x16_t vqshlq_n_u8(uint8x16_t a, __constrange(0,7) int b); // VQSHL.U8 q0,q0,#0 +uint16x8_t vqshlq_n_u16(uint16x8_t a, __constrange(0,15) int b); // VQSHL.U16 q0,q0,#0 +uint32x4_t vqshlq_n_u32(uint32x4_t a, __constrange(0,31) int b); // VQSHL.U32 q0,q0,#0 +uint64x2_t vqshlq_n_u64(uint64x2_t a, __constrange(0,63) int b); // VQSHL.U64 q0,q0,#0 +//Vector signed->unsigned saturating shift left by constant +uint8x8_t vqshlu_n_s8(int8x8_t a, __constrange(0,7) int b); // VQSHLU.S8 d0,d0,#0 +uint16x4_t vqshlu_n_s16(int16x4_t a, __constrange(0,15) int b); // VQSHLU.S16 d0,d0,#0 +uint32x2_t vqshlu_n_s32(int32x2_t a, __constrange(0,31) int b); // VQSHLU.S32 d0,d0,#0 +uint64x1_t vqshlu_n_s64(int64x1_t a, __constrange(0,63) int b); // VQSHLU.S64 d0,d0,#0 +uint8x16_t vqshluq_n_s8(int8x16_t a, __constrange(0,7) int b); // VQSHLU.S8 q0,q0,#0 +uint16x8_t vqshluq_n_s16(int16x8_t a, __constrange(0,15) int b); // VQSHLU.S16 q0,q0,#0 +uint32x4_t vqshluq_n_s32(int32x4_t a, __constrange(0,31) int b); // VQSHLU.S32 q0,q0,#0 +uint64x2_t vqshluq_n_s64(int64x2_t a, __constrange(0,63) int b); // VQSHLU.S64 q0,q0,#0 +//Vector narrowing shift right by constant +int8x8_t vshrn_n_s16(int16x8_t a, __constrange(1,8) int b); // VSHRN.I16 d0,q0,#8 +int16x4_t vshrn_n_s32(int32x4_t a, __constrange(1,16) int b); // VSHRN.I32 d0,q0,#16 +int32x2_t vshrn_n_s64(int64x2_t a, __constrange(1,32) int b); // VSHRN.I64 d0,q0,#32 +uint8x8_t vshrn_n_u16(uint16x8_t a, __constrange(1,8) int b); // VSHRN.I16 d0,q0,#8 +uint16x4_t vshrn_n_u32(uint32x4_t a, __constrange(1,16) int b); // VSHRN.I32 d0,q0,#16 +uint32x2_t vshrn_n_u64(uint64x2_t a, __constrange(1,32) int b); // VSHRN.I64 d0,q0,#32 +//Vector signed->unsigned narrowing saturating shift right by constant +uint8x8_t vqshrun_n_s16(int16x8_t a, __constrange(1,8) int b); // VQSHRUN.S16 d0,q0,#8 +uint16x4_t vqshrun_n_s32(int32x4_t a, __constrange(1,16) int b); // VQSHRUN.S32 d0,q0,#16 +uint32x2_t vqshrun_n_s64(int64x2_t a, __constrange(1,32) int b); // VQSHRUN.S64 d0,q0,#32 +//Vector signed->unsigned rounding narrowing saturating shift right by constant +uint8x8_t vqrshrun_n_s16(int16x8_t a, __constrange(1,8) int b); // VQRSHRUN.S16 d0,q0,#8 +uint16x4_t vqrshrun_n_s32(int32x4_t a, __constrange(1,16) int b); // VQRSHRUN.S32 d0,q0,#16 +uint32x2_t vqrshrun_n_s64(int64x2_t a, __constrange(1,32) int b); // VQRSHRUN.S64 d0,q0,#32 +//Vector 
narrowing saturating shift right by constant +int8x8_t vqshrn_n_s16(int16x8_t a, __constrange(1,8) int b); // VQSHRN.S16 d0,q0,#8 +int16x4_t vqshrn_n_s32(int32x4_t a, __constrange(1,16) int b); // VQSHRN.S32 d0,q0,#16 +int32x2_t vqshrn_n_s64(int64x2_t a, __constrange(1,32) int b); // VQSHRN.S64 d0,q0,#32 +uint8x8_t vqshrn_n_u16(uint16x8_t a, __constrange(1,8) int b); // VQSHRN.U16 d0,q0,#8 +uint16x4_t vqshrn_n_u32(uint32x4_t a, __constrange(1,16) int b); // VQSHRN.U32 d0,q0,#16 +uint32x2_t vqshrn_n_u64(uint64x2_t a, __constrange(1,32) int b); // VQSHRN.U64 d0,q0,#32 +//Vector rounding narrowing shift right by constant +int8x8_t vrshrn_n_s16(int16x8_t a, __constrange(1,8) int b); // VRSHRN.I16 d0,q0,#8 +int16x4_t vrshrn_n_s32(int32x4_t a, __constrange(1,16) int b); // VRSHRN.I32 d0,q0,#16 +int32x2_t vrshrn_n_s64(int64x2_t a, __constrange(1,32) int b); // VRSHRN.I64 d0,q0,#32 +uint8x8_t vrshrn_n_u16(uint16x8_t a, __constrange(1,8) int b); // VRSHRN.I16 d0,q0,#8 +uint16x4_t vrshrn_n_u32(uint32x4_t a, __constrange(1,16) int b); // VRSHRN.I32 d0,q0,#16 +uint32x2_t vrshrn_n_u64(uint64x2_t a, __constrange(1,32) int b); // VRSHRN.I64 d0,q0,#32 +//Vector rounding narrowing saturating shift right by constant +int8x8_t vqrshrn_n_s16(int16x8_t a, __constrange(1,8) int b); // VQRSHRN.S16 d0,q0,#8 +int16x4_t vqrshrn_n_s32(int32x4_t a, __constrange(1,16) int b); // VQRSHRN.S32 d0,q0,#16 +int32x2_t vqrshrn_n_s64(int64x2_t a, __constrange(1,32) int b); // VQRSHRN.S64 d0,q0,#32 +uint8x8_t vqrshrn_n_u16(uint16x8_t a, __constrange(1,8) int b); // VQRSHRN.U16 d0,q0,#8 +uint16x4_t vqrshrn_n_u32(uint32x4_t a, __constrange(1,16) int b); // VQRSHRN.U32 d0,q0,#16 +uint32x2_t vqrshrn_n_u64(uint64x2_t a, __constrange(1,32) int b); // VQRSHRN.U64 d0,q0,#32 +//Vector widening shift left by constant +int16x8_t vshll_n_s8(int8x8_t a, __constrange(0,8) int b); // VSHLL.S8 q0,d0,#0 +int32x4_t vshll_n_s16(int16x4_t a, __constrange(0,16) int b); // VSHLL.S16 q0,d0,#0 +int64x2_t vshll_n_s32(int32x2_t a, __constrange(0,32) int b); // VSHLL.S32 q0,d0,#0 +uint16x8_t vshll_n_u8(uint8x8_t a, __constrange(0,8) int b); // VSHLL.U8 q0,d0,#0 +uint32x4_t vshll_n_u16(uint16x4_t a, __constrange(0,16) int b); // VSHLL.U16 q0,d0,#0 +uint64x2_t vshll_n_u32(uint32x2_t a, __constrange(0,32) int b); // VSHLL.U32 q0,d0,#0 +//Shifts with insert +//Vector shift right and insert +int8x8_t vsri_n_s8(int8x8_t a, int8x8_t b, __constrange(1,8) int c); // VSRI.8 d0,d0,#8 +int16x4_t vsri_n_s16(int16x4_t a, int16x4_t b, __constrange(1,16) int c); // VSRI.16 d0,d0,#16 +int32x2_t vsri_n_s32(int32x2_t a, int32x2_t b, __constrange(1,32) int c); // VSRI.32 d0,d0,#32 +int64x1_t vsri_n_s64(int64x1_t a, int64x1_t b, __constrange(1,64) int c); // VSRI.64 d0,d0,#64 +uint8x8_t vsri_n_u8(uint8x8_t a, uint8x8_t b, __constrange(1,8) int c); // VSRI.8 d0,d0,#8 +uint16x4_t vsri_n_u16(uint16x4_t a, uint16x4_t b, __constrange(1,16) int c); // VSRI.16 d0,d0,#16 +uint32x2_t vsri_n_u32(uint32x2_t a, uint32x2_t b, __constrange(1,32) int c); // VSRI.32 d0,d0,#32 +uint64x1_t vsri_n_u64(uint64x1_t a, uint64x1_t b, __constrange(1,64) int c); // VSRI.64 d0,d0,#64 +poly8x8_t vsri_n_p8(poly8x8_t a, poly8x8_t b, __constrange(1,8) int c); // VSRI.8 d0,d0,#8 +poly16x4_t vsri_n_p16(poly16x4_t a, poly16x4_t b, __constrange(1,16) int c); // VSRI.16 d0,d0,#16 +int8x16_t vsriq_n_s8(int8x16_t a, int8x16_t b, __constrange(1,8) int c); // VSRI.8 q0,q0,#8 +int16x8_t vsriq_n_s16(int16x8_t a, int16x8_t b, __constrange(1,16) int c); // VSRI.16 q0,q0,#16 +int32x4_t vsriq_n_s32(int32x4_t a, 
int32x4_t b, __constrange(1,32) int c); // VSRI.32 q0,q0,#32 +int64x2_t vsriq_n_s64(int64x2_t a, int64x2_t b, __constrange(1,64) int c); // VSRI.64 q0,q0,#64 +uint8x16_t vsriq_n_u8(uint8x16_t a, uint8x16_t b, __constrange(1,8) int c); // VSRI.8 q0,q0,#8 +uint16x8_t vsriq_n_u16(uint16x8_t a, uint16x8_t b, __constrange(1,16) int c); // VSRI.16 q0,q0,#16 +uint32x4_t vsriq_n_u32(uint32x4_t a, uint32x4_t b, __constrange(1,32) int c); // VSRI.32 q0,q0,#32 +uint64x2_t vsriq_n_u64(uint64x2_t a, uint64x2_t b, __constrange(1,64) int c); // VSRI.64 q0,q0,#64 +poly8x16_t vsriq_n_p8(poly8x16_t a, poly8x16_t b, __constrange(1,8) int c); // VSRI.8 q0,q0,#8 +poly16x8_t vsriq_n_p16(poly16x8_t a, poly16x8_t b, __constrange(1,16) int c); // VSRI.16 q0,q0,#16 +//Vector shift left and insert +int8x8_t vsli_n_s8(int8x8_t a, int8x8_t b, __constrange(0,7) int c); // VSLI.8 d0,d0,#0 +int16x4_t vsli_n_s16(int16x4_t a, int16x4_t b, __constrange(0,15) int c); // VSLI.16 d0,d0,#0 +int32x2_t vsli_n_s32(int32x2_t a, int32x2_t b, __constrange(0,31) int c); // VSLI.32 d0,d0,#0 +int64x1_t vsli_n_s64(int64x1_t a, int64x1_t b, __constrange(0,63) int c); // VSLI.64 d0,d0,#0 +uint8x8_t vsli_n_u8(uint8x8_t a, uint8x8_t b, __constrange(0,7) int c); // VSLI.8 d0,d0,#0 +uint16x4_t vsli_n_u16(uint16x4_t a, uint16x4_t b, __constrange(0,15) int c); // VSLI.16 d0,d0,#0 +uint32x2_t vsli_n_u32(uint32x2_t a, uint32x2_t b, __constrange(0,31) int c); // VSLI.32 d0,d0,#0 +uint64x1_t vsli_n_u64(uint64x1_t a, uint64x1_t b, __constrange(0,63) int c); // VSLI.64 d0,d0,#0 +poly8x8_t vsli_n_p8(poly8x8_t a, poly8x8_t b, __constrange(0,7) int c); // VSLI.8 d0,d0,#0 +poly16x4_t vsli_n_p16(poly16x4_t a, poly16x4_t b, __constrange(0,15) int c); // VSLI.16 d0,d0,#0 +int8x16_t vsliq_n_s8(int8x16_t a, int8x16_t b, __constrange(0,7) int c); // VSLI.8 q0,q0,#0 +int16x8_t vsliq_n_s16(int16x8_t a, int16x8_t b, __constrange(0,15) int c); // VSLI.16 q0,q0,#0 +int32x4_t vsliq_n_s32(int32x4_t a, int32x4_t b, __constrange(0,31) int c); // VSLI.32 q0,q0,#0 +int64x2_t vsliq_n_s64(int64x2_t a, int64x2_t b, __constrange(0,63) int c); // VSLI.64 q0,q0,#0 +uint8x16_t vsliq_n_u8(uint8x16_t a, uint8x16_t b, __constrange(0,7) int c); // VSLI.8 q0,q0,#0 +uint16x8_t vsliq_n_u16(uint16x8_t a, uint16x8_t b, __constrange(0,15) int c); // VSLI.16 q0,q0,#0 +uint32x4_t vsliq_n_u32(uint32x4_t a, uint32x4_t b, __constrange(0,31) int c); // VSLI.32 q0,q0,#0 +uint64x2_t vsliq_n_u64(uint64x2_t a, uint64x2_t b, __constrange(0,63) int c); // VSLI.64 q0,q0,#0 +poly8x16_t vsliq_n_p8(poly8x16_t a, poly8x16_t b, __constrange(0,7) int c); // VSLI.8 q0,q0,#0 +poly16x8_t vsliq_n_p16(poly16x8_t a, poly16x8_t b, __constrange(0,15) int c); // VSLI.16 q0,q0,#0 +//Loads of a single vector or lane. Perform loads and stores of a single vector of some type. 
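A minimal usage sketch for the single-vector load/store intrinsics declared below, assuming the saturating add vqaddq_u8 declared earlier in this listing; the name add_sat_u8 and its parameters are illustrative:

#include <arm_neon.h>

/* Saturating byte-wise addition of two buffers, 16 bytes per iteration;
   n is assumed to be a multiple of 16. */
static void add_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, int n)
{
    for (int i = 0; i < n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);      /* VLD1.8 {d0, d1}, [r0] */
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, vqaddq_u8(va, vb)); /* saturate, then VST1.8 */
    }
}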
+//Load a single vector from memory +uint8x16_t vld1q_u8(__transfersize(16) uint8_t const * ptr); // VLD1.8 {d0, d1}, [r0] +uint16x8_t vld1q_u16(__transfersize(8) uint16_t const * ptr); // VLD1.16 {d0, d1}, [r0] +uint32x4_t vld1q_u32(__transfersize(4) uint32_t const * ptr); // VLD1.32 {d0, d1}, [r0] +uint64x2_t vld1q_u64(__transfersize(2) uint64_t const * ptr); // VLD1.64 {d0, d1}, [r0] +int8x16_t vld1q_s8(__transfersize(16) int8_t const * ptr); // VLD1.8 {d0, d1}, [r0] +int16x8_t vld1q_s16(__transfersize(8) int16_t const * ptr); // VLD1.16 {d0, d1}, [r0] +int32x4_t vld1q_s32(__transfersize(4) int32_t const * ptr); // VLD1.32 {d0, d1}, [r0] +int64x2_t vld1q_s64(__transfersize(2) int64_t const * ptr); // VLD1.64 {d0, d1}, [r0] +float16x8_t vld1q_f16(__transfersize(8) __fp16 const * ptr); // VLD1.16 {d0, d1}, [r0] +float32x4_t vld1q_f32(__transfersize(4) float32_t const * ptr); // VLD1.32 {d0, d1}, [r0] +poly8x16_t vld1q_p8(__transfersize(16) poly8_t const * ptr); // VLD1.8 {d0, d1}, [r0] +poly16x8_t vld1q_p16(__transfersize(8) poly16_t const * ptr); // VLD1.16 {d0, d1}, [r0] +uint8x8_t vld1_u8(__transfersize(8) uint8_t const * ptr); // VLD1.8 {d0}, [r0] +uint16x4_t vld1_u16(__transfersize(4) uint16_t const * ptr); // VLD1.16 {d0}, [r0] +uint32x2_t vld1_u32(__transfersize(2) uint32_t const * ptr); // VLD1.32 {d0}, [r0] +uint64x1_t vld1_u64(__transfersize(1) uint64_t const * ptr); // VLD1.64 {d0}, [r0] +int8x8_t vld1_s8(__transfersize(8) int8_t const * ptr); // VLD1.8 {d0}, [r0] +int16x4_t vld1_s16(__transfersize(4) int16_t const * ptr); // VLD1.16 {d0}, [r0] +int32x2_t vld1_s32(__transfersize(2) int32_t const * ptr); // VLD1.32 {d0}, [r0] +int64x1_t vld1_s64(__transfersize(1) int64_t const * ptr); // VLD1.64 {d0}, [r0] +float16x4_t vld1_f16(__transfersize(4) __fp16 const * ptr); // VLD1.16 {d0}, [r0] +float32x2_t vld1_f32(__transfersize(2) float32_t const * ptr); // VLD1.32 {d0}, [r0] +poly8x8_t vld1_p8(__transfersize(8) poly8_t const * ptr); // VLD1.8 {d0}, [r0] +poly16x4_t vld1_p16(__transfersize(4) poly16_t const * ptr); // VLD1.16 {d0}, [r0] +//Load a single lane from memory +uint8x16_t vld1q_lane_u8(__transfersize(1) uint8_t const * ptr, uint8x16_t vec, __constrange(0,15) int lane); //VLD1.8 {d0[0]}, [r0] +uint16x8_t vld1q_lane_u16(__transfersize(1) uint16_t const * ptr, uint16x8_t vec, __constrange(0,7) int lane); // VLD1.16 {d0[0]}, [r0] +uint32x4_t vld1q_lane_u32(__transfersize(1) uint32_t const * ptr, uint32x4_t vec, __constrange(0,3) int lane); // VLD1.32 {d0[0]}, [r0] +uint64x2_t vld1q_lane_u64(__transfersize(1) uint64_t const * ptr, uint64x2_t vec, __constrange(0,1) int lane); // VLD1.64 {d0}, [r0] +int8x16_t vld1q_lane_s8(__transfersize(1) int8_t const * ptr, int8x16_t vec, __constrange(0,15) int lane); //VLD1.8 {d0[0]}, [r0] +int16x8_t vld1q_lane_s16(__transfersize(1) int16_t const * ptr, int16x8_t vec, __constrange(0,7) int lane); //VLD1.16 {d0[0]}, [r0] +int32x4_t vld1q_lane_s32(__transfersize(1) int32_t const * ptr, int32x4_t vec, __constrange(0,3) int lane); //VLD1.32 {d0[0]}, [r0] +float16x8_t vld1q_lane_f16(__transfersize(1) __fp16 const * ptr, float16x8_t vec, __constrange(0,7) int lane); //VLD1.16 {d0[0]}, [r0] +float32x4_t vld1q_lane_f32(__transfersize(1) float32_t const * ptr, float32x4_t vec, __constrange(0,3) int lane); // VLD1.32 {d0[0]}, [r0] +int64x2_t vld1q_lane_s64(__transfersize(1) int64_t const * ptr, int64x2_t vec, __constrange(0,1) int lane); //VLD1.64 {d0}, [r0] +poly8x16_t vld1q_lane_p8(__transfersize(1) poly8_t const * ptr, poly8x16_t vec, 
__constrange(0,15) int lane); //VLD1.8 {d0[0]}, [r0]
+poly16x8_t vld1q_lane_p16(__transfersize(1) poly16_t const * ptr, poly16x8_t vec, __constrange(0,7) int lane); // VLD1.16 {d0[0]}, [r0]
+uint8x8_t vld1_lane_u8(__transfersize(1) uint8_t const * ptr, uint8x8_t vec, __constrange(0,7) int lane); //VLD1.8 {d0[0]}, [r0]
+uint16x4_t vld1_lane_u16(__transfersize(1) uint16_t const * ptr, uint16x4_t vec, __constrange(0,3) int lane); //VLD1.16 {d0[0]}, [r0]
+uint32x2_t vld1_lane_u32(__transfersize(1) uint32_t const * ptr, uint32x2_t vec, __constrange(0,1) int lane); //VLD1.32 {d0[0]}, [r0]
+uint64x1_t vld1_lane_u64(__transfersize(1) uint64_t const * ptr, uint64x1_t vec, __constrange(0,0) int lane); //VLD1.64 {d0}, [r0]
+int8x8_t vld1_lane_s8(__transfersize(1) int8_t const * ptr, int8x8_t vec, __constrange(0,7) int lane); // VLD1.8 {d0[0]}, [r0]
+int16x4_t vld1_lane_s16(__transfersize(1) int16_t const * ptr, int16x4_t vec, __constrange(0,3) int lane); //VLD1.16 {d0[0]}, [r0]
+int32x2_t vld1_lane_s32(__transfersize(1) int32_t const * ptr, int32x2_t vec, __constrange(0,1) int lane); //VLD1.32 {d0[0]}, [r0]
+float16x4_t vld1_lane_f16(__transfersize(1) __fp16 const * ptr, float16x4_t vec, __constrange(0,3) int lane); //VLD1.16 {d0[0]}, [r0]
+float32x2_t vld1_lane_f32(__transfersize(1) float32_t const * ptr, float32x2_t vec, __constrange(0,1) int lane); // VLD1.32 {d0[0]}, [r0]
+int64x1_t vld1_lane_s64(__transfersize(1) int64_t const * ptr, int64x1_t vec, __constrange(0,0) int lane); //VLD1.64 {d0}, [r0]
+poly8x8_t vld1_lane_p8(__transfersize(1) poly8_t const * ptr, poly8x8_t vec, __constrange(0,7) int lane); //VLD1.8 {d0[0]}, [r0]
+poly16x4_t vld1_lane_p16(__transfersize(1) poly16_t const * ptr, poly16x4_t vec, __constrange(0,3) int lane); //VLD1.16 {d0[0]}, [r0]
+//Load all lanes of vector with same value from memory
+uint8x16_t vld1q_dup_u8(__transfersize(1) uint8_t const * ptr); // VLD1.8 {d0[]}, [r0]
+uint16x8_t vld1q_dup_u16(__transfersize(1) uint16_t const * ptr); // VLD1.16 {d0[]}, [r0]
+uint32x4_t vld1q_dup_u32(__transfersize(1) uint32_t const * ptr); // VLD1.32 {d0[]}, [r0]
+uint64x2_t vld1q_dup_u64(__transfersize(1) uint64_t const * ptr); // VLD1.64 {d0}, [r0]
+int8x16_t vld1q_dup_s8(__transfersize(1) int8_t const * ptr); // VLD1.8 {d0[]}, [r0]
+int16x8_t vld1q_dup_s16(__transfersize(1) int16_t const * ptr); // VLD1.16 {d0[]}, [r0]
+int32x4_t vld1q_dup_s32(__transfersize(1) int32_t const * ptr); // VLD1.32 {d0[]}, [r0]
+int64x2_t vld1q_dup_s64(__transfersize(1) int64_t const * ptr); // VLD1.64 {d0}, [r0]
+float16x8_t vld1q_dup_f16(__transfersize(1) __fp16 const * ptr); // VLD1.16 {d0[]}, [r0]
+float32x4_t vld1q_dup_f32(__transfersize(1) float32_t const * ptr); // VLD1.32 {d0[]}, [r0]
+poly8x16_t vld1q_dup_p8(__transfersize(1) poly8_t const * ptr); // VLD1.8 {d0[]}, [r0]
+poly16x8_t vld1q_dup_p16(__transfersize(1) poly16_t const * ptr); // VLD1.16 {d0[]}, [r0]
+uint8x8_t vld1_dup_u8(__transfersize(1) uint8_t const * ptr); // VLD1.8 {d0[]}, [r0]
+uint16x4_t vld1_dup_u16(__transfersize(1) uint16_t const * ptr); // VLD1.16 {d0[]}, [r0]
+uint32x2_t vld1_dup_u32(__transfersize(1) uint32_t const * ptr); // VLD1.32 {d0[]}, [r0]
+uint64x1_t vld1_dup_u64(__transfersize(1) uint64_t const * ptr); // VLD1.64 {d0}, [r0]
+int8x8_t vld1_dup_s8(__transfersize(1) int8_t const * ptr); // VLD1.8 {d0[]}, [r0]
+int16x4_t vld1_dup_s16(__transfersize(1) int16_t const * ptr); // VLD1.16 {d0[]}, [r0]
+int32x2_t vld1_dup_s32(__transfersize(1) int32_t const * ptr); // VLD1.32 {d0[]}, [r0]
+int64x1_t
vld1_dup_s64(__transfersize(1) int64_t const * ptr); // VLD1.64 {d0}, [r0] +float16x4_t vld1_dup_f16(__transfersize(1) __fp16 const * ptr); // VLD1.16 {d0[]}, [r0] +float32x2_t vld1_dup_f32(__transfersize(1) float32_t const * ptr); // VLD1.32 {d0[]}, [r0] +poly8x8_t vld1_dup_p8(__transfersize(1) poly8_t const * ptr); // VLD1.8 {d0[]}, [r0] +poly16x4_t vld1_dup_p16(__transfersize(1) poly16_t const * ptr); // VLD1.16 {d0[]}, [r0] +//Store a single vector or lane. Stores all lanes or a single lane of a vector. +//Store a single vector into memory +void vst1q_u8(__transfersize(16) uint8_t * ptr, uint8x16_t val); // VST1.8 {d0, d1}, [r0] +void vst1q_u16(__transfersize(8) uint16_t * ptr, uint16x8_t val); // VST1.16 {d0, d1}, [r0] +void vst1q_u32(__transfersize(4) uint32_t * ptr, uint32x4_t val); // VST1.32 {d0, d1}, [r0] +void vst1q_u64(__transfersize(2) uint64_t * ptr, uint64x2_t val); // VST1.64 {d0, d1}, [r0] +void vst1q_s8(__transfersize(16) int8_t * ptr, int8x16_t val); // VST1.8 {d0, d1}, [r0] +void vst1q_s16(__transfersize(8) int16_t * ptr, int16x8_t val); // VST1.16 {d0, d1}, [r0] +void vst1q_s32(__transfersize(4) int32_t * ptr, int32x4_t val); // VST1.32 {d0, d1}, [r0] +void vst1q_s64(__transfersize(2) int64_t * ptr, int64x2_t val); // VST1.64 {d0, d1}, [r0] +void vst1q_f16(__transfersize(8) __fp16 * ptr, float16x8_t val); // VST1.16 {d0, d1}, [r0] +void vst1q_f32(__transfersize(4) float32_t * ptr, float32x4_t val); // VST1.32 {d0, d1}, [r0] +void vst1q_p8(__transfersize(16) poly8_t * ptr, poly8x16_t val); // VST1.8 {d0, d1}, [r0] +void vst1q_p16(__transfersize(8) poly16_t * ptr, poly16x8_t val); // VST1.16 {d0, d1}, [r0] +void vst1_u8(__transfersize(8) uint8_t * ptr, uint8x8_t val); // VST1.8 {d0}, [r0] +void vst1_u16(__transfersize(4) uint16_t * ptr, uint16x4_t val); // VST1.16 {d0}, [r0] +void vst1_u32(__transfersize(2) uint32_t * ptr, uint32x2_t val); // VST1.32 {d0}, [r0] +void vst1_u64(__transfersize(1) uint64_t * ptr, uint64x1_t val); // VST1.64 {d0}, [r0] +void vst1_s8(__transfersize(8) int8_t * ptr, int8x8_t val); // VST1.8 {d0}, [r0] +void vst1_s16(__transfersize(4) int16_t * ptr, int16x4_t val); // VST1.16 {d0}, [r0] +void vst1_s32(__transfersize(2) int32_t * ptr, int32x2_t val); // VST1.32 {d0}, [r0] +void vst1_s64(__transfersize(1) int64_t * ptr, int64x1_t val); // VST1.64 {d0}, [r0] +void vst1_f16(__transfersize(4) __fp16 * ptr, float16x4_t val); // VST1.16 {d0}, [r0] +void vst1_f32(__transfersize(2) float32_t * ptr, float32x2_t val); // VST1.32 {d0}, [r0] +void vst1_p8(__transfersize(8) poly8_t * ptr, poly8x8_t val); // VST1.8 {d0}, [r0] +void vst1_p16(__transfersize(4) poly16_t * ptr, poly16x4_t val); // VST1.16 {d0}, [r0] +//Store a lane of a vector into memory +//Loads of an N-element structure +//Load N-element structure from memory +uint8x16x2_t vld2q_u8(__transfersize(32) uint8_t const * ptr); // VLD2.8 {d0, d2}, [r0] +uint16x8x2_t vld2q_u16(__transfersize(16) uint16_t const * ptr); // VLD2.16 {d0, d2}, [r0] +uint32x4x2_t vld2q_u32(__transfersize(8) uint32_t const * ptr); // VLD2.32 {d0, d2}, [r0] +int8x16x2_t vld2q_s8(__transfersize(32) int8_t const * ptr); // VLD2.8 {d0, d2}, [r0] +int16x8x2_t vld2q_s16(__transfersize(16) int16_t const * ptr); // VLD2.16 {d0, d2}, [r0] +int32x4x2_t vld2q_s32(__transfersize(8) int32_t const * ptr); // VLD2.32 {d0, d2}, [r0] +float16x8x2_t vld2q_f16(__transfersize(16) __fp16 const * ptr); // VLD2.16 {d0, d2}, [r0] +float32x4x2_t vld2q_f32(__transfersize(8) float32_t const * ptr); // VLD2.32 {d0, d2}, [r0] +poly8x16x2_t 
vld2q_p8(__transfersize(32) poly8_t const * ptr); // VLD2.8 {d0, d2}, [r0]
+poly16x8x2_t vld2q_p16(__transfersize(16) poly16_t const * ptr); // VLD2.16 {d0, d2}, [r0]
+uint8x8x2_t vld2_u8(__transfersize(16) uint8_t const * ptr); // VLD2.8 {d0, d1}, [r0]
+uint16x4x2_t vld2_u16(__transfersize(8) uint16_t const * ptr); // VLD2.16 {d0, d1}, [r0]
+uint32x2x2_t vld2_u32(__transfersize(4) uint32_t const * ptr); // VLD2.32 {d0, d1}, [r0]
+uint64x1x2_t vld2_u64(__transfersize(2) uint64_t const * ptr); // VLD1.64 {d0, d1}, [r0]
+int8x8x2_t vld2_s8(__transfersize(16) int8_t const * ptr); // VLD2.8 {d0, d1}, [r0]
+int16x4x2_t vld2_s16(__transfersize(8) int16_t const * ptr); // VLD2.16 {d0, d1}, [r0]
+int32x2x2_t vld2_s32(__transfersize(4) int32_t const * ptr); // VLD2.32 {d0, d1}, [r0]
+int64x1x2_t vld2_s64(__transfersize(2) int64_t const * ptr); // VLD1.64 {d0, d1}, [r0]
+//float16x4x2_t vld2_f16(__transfersize(8) __fp16 const * ptr); // VLD2.16 {d0, d1}, [r0]
+float32x2x2_t vld2_f32(__transfersize(4) float32_t const * ptr); // VLD2.32 {d0, d1}, [r0]
+poly8x8x2_t vld2_p8(__transfersize(16) poly8_t const * ptr); // VLD2.8 {d0, d1}, [r0]
+poly16x4x2_t vld2_p16(__transfersize(8) poly16_t const * ptr); // VLD2.16 {d0, d1}, [r0]
+uint8x16x3_t vld3q_u8(__transfersize(48) uint8_t const * ptr); // VLD3.8 {d0, d2, d4}, [r0]
+uint16x8x3_t vld3q_u16(__transfersize(24) uint16_t const * ptr); // VLD3.16 {d0, d2, d4}, [r0]
+uint32x4x3_t vld3q_u32(__transfersize(12) uint32_t const * ptr); // VLD3.32 {d0, d2, d4}, [r0]
+int8x16x3_t vld3q_s8(__transfersize(48) int8_t const * ptr); // VLD3.8 {d0, d2, d4}, [r0]
+int16x8x3_t vld3q_s16(__transfersize(24) int16_t const * ptr); // VLD3.16 {d0, d2, d4}, [r0]
+int32x4x3_t vld3q_s32(__transfersize(12) int32_t const * ptr); // VLD3.32 {d0, d2, d4}, [r0]
+float16x8x3_t vld3q_f16(__transfersize(24) __fp16 const * ptr); // VLD3.16 {d0, d2, d4}, [r0]
+float32x4x3_t vld3q_f32(__transfersize(12) float32_t const * ptr); // VLD3.32 {d0, d2, d4}, [r0]
+poly8x16x3_t vld3q_p8(__transfersize(48) poly8_t const * ptr); // VLD3.8 {d0, d2, d4}, [r0]
+poly16x8x3_t vld3q_p16(__transfersize(24) poly16_t const * ptr); // VLD3.16 {d0, d2, d4}, [r0]
+uint8x8x3_t vld3_u8(__transfersize(24) uint8_t const * ptr); // VLD3.8 {d0, d1, d2}, [r0]
+uint16x4x3_t vld3_u16(__transfersize(12) uint16_t const * ptr); // VLD3.16 {d0, d1, d2}, [r0]
+uint32x2x3_t vld3_u32(__transfersize(6) uint32_t const * ptr); // VLD3.32 {d0, d1, d2}, [r0]
+uint64x1x3_t vld3_u64(__transfersize(3) uint64_t const * ptr); // VLD1.64 {d0, d1, d2}, [r0]
+int8x8x3_t vld3_s8(__transfersize(24) int8_t const * ptr); // VLD3.8 {d0, d1, d2}, [r0]
+int16x4x3_t vld3_s16(__transfersize(12) int16_t const * ptr); // VLD3.16 {d0, d1, d2}, [r0]
+int32x2x3_t vld3_s32(__transfersize(6) int32_t const * ptr); // VLD3.32 {d0, d1, d2}, [r0]
+int64x1x3_t vld3_s64(__transfersize(3) int64_t const * ptr); // VLD1.64 {d0, d1, d2}, [r0]
+float16x4x3_t vld3_f16(__transfersize(12) __fp16 const * ptr); // VLD3.16 {d0, d1, d2}, [r0]
+float32x2x3_t vld3_f32(__transfersize(6) float32_t const * ptr); // VLD3.32 {d0, d1, d2}, [r0]
+poly8x8x3_t vld3_p8(__transfersize(24) poly8_t const * ptr); // VLD3.8 {d0, d1, d2}, [r0]
+poly16x4x3_t vld3_p16(__transfersize(12) poly16_t const * ptr); // VLD3.16 {d0, d1, d2}, [r0]
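A minimal usage sketch for the structure loads above: vld3_u8 de-interleaves 8 packed RGB pixels into per-channel vectors, and vst1_u8 (declared earlier) stores one channel. The name extract_red and its parameters are illustrative, and npixels is assumed to be a multiple of 8:

#include <arm_neon.h>

static void extract_red(uint8_t *red, const uint8_t *rgb, int npixels)
{
    for (int i = 0; i < npixels; i += 8) {
        /* One VLD3.8 reads 24 bytes and splits them into three 8-lane
           vectors: val[0] = R, val[1] = G, val[2] = B. */
        uint8x8x3_t pix = vld3_u8(rgb + 3 * i);
        vst1_u8(red + i, pix.val[0]);
    }
}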
+uint8x16x4_t vld4q_u8(__transfersize(64) uint8_t const * ptr); // VLD4.8 {d0, d2, d4, d6}, [r0]
+uint16x8x4_t vld4q_u16(__transfersize(32) uint16_t const * ptr); // VLD4.16 {d0, d2, d4, d6}, [r0]
+uint32x4x4_t vld4q_u32(__transfersize(16) uint32_t const * ptr); // VLD4.32 {d0, d2, d4, d6}, [r0]
+int8x16x4_t vld4q_s8(__transfersize(64) int8_t const * ptr); // VLD4.8 {d0, d2, d4, d6}, [r0]
+int16x8x4_t vld4q_s16(__transfersize(32) int16_t const * ptr); // VLD4.16 {d0, d2, d4, d6}, [r0]
+int32x4x4_t vld4q_s32(__transfersize(16) int32_t const * ptr); // VLD4.32 {d0, d2, d4, d6}, [r0]
+float16x8x4_t vld4q_f16(__transfersize(32) __fp16 const * ptr); // VLD4.16 {d0, d2, d4, d6}, [r0]
+float32x4x4_t vld4q_f32(__transfersize(16) float32_t const * ptr); // VLD4.32 {d0, d2, d4, d6}, [r0]
+poly8x16x4_t vld4q_p8(__transfersize(64) poly8_t const * ptr); // VLD4.8 {d0, d2, d4, d6}, [r0]
+poly16x8x4_t vld4q_p16(__transfersize(32) poly16_t const * ptr); // VLD4.16 {d0, d2, d4, d6}, [r0]
+uint8x8x4_t vld4_u8(__transfersize(32) uint8_t const * ptr); // VLD4.8 {d0, d1, d2, d3}, [r0]
+uint16x4x4_t vld4_u16(__transfersize(16) uint16_t const * ptr); // VLD4.16 {d0, d1, d2, d3}, [r0]
+uint32x2x4_t vld4_u32(__transfersize(8) uint32_t const * ptr); // VLD4.32 {d0, d1, d2, d3}, [r0]
+uint64x1x4_t vld4_u64(__transfersize(4) uint64_t const * ptr); // VLD1.64 {d0, d1, d2, d3}, [r0]
+int8x8x4_t vld4_s8(__transfersize(32) int8_t const * ptr); // VLD4.8 {d0, d1, d2, d3}, [r0]
+int16x4x4_t vld4_s16(__transfersize(16) int16_t const * ptr); // VLD4.16 {d0, d1, d2, d3}, [r0]
+int32x2x4_t vld4_s32(__transfersize(8) int32_t const * ptr); // VLD4.32 {d0, d1, d2, d3}, [r0]
+int64x1x4_t vld4_s64(__transfersize(4) int64_t const * ptr); // VLD1.64 {d0, d1, d2, d3}, [r0]
+float16x4x4_t vld4_f16(__transfersize(16) __fp16 const * ptr); // VLD4.16 {d0, d1, d2, d3}, [r0]
+float32x2x4_t vld4_f32(__transfersize(8) float32_t const * ptr); // VLD4.32 {d0, d1, d2, d3}, [r0]
+poly8x8x4_t vld4_p8(__transfersize(32) poly8_t const * ptr); // VLD4.8 {d0, d1, d2, d3}, [r0]
+poly16x4x4_t vld4_p16(__transfersize(16) poly16_t const * ptr); // VLD4.16 {d0, d1, d2, d3}, [r0]
+//Load all lanes of N-element structure with same value from memory
+uint8x8x2_t vld2_dup_u8(__transfersize(2) uint8_t const * ptr); // VLD2.8 {d0[], d1[]}, [r0]
+uint16x4x2_t vld2_dup_u16(__transfersize(2) uint16_t const * ptr); // VLD2.16 {d0[], d1[]}, [r0]
+uint32x2x2_t vld2_dup_u32(__transfersize(2) uint32_t const * ptr); // VLD2.32 {d0[], d1[]}, [r0]
+uint64x1x2_t vld2_dup_u64(__transfersize(2) uint64_t const * ptr); // VLD1.64 {d0, d1}, [r0]
+int8x8x2_t vld2_dup_s8(__transfersize(2) int8_t const * ptr); // VLD2.8 {d0[], d1[]}, [r0]
+int16x4x2_t vld2_dup_s16(__transfersize(2) int16_t const * ptr); // VLD2.16 {d0[], d1[]}, [r0]
+int32x2x2_t vld2_dup_s32(__transfersize(2) int32_t const * ptr); // VLD2.32 {d0[], d1[]}, [r0]
+int64x1x2_t vld2_dup_s64(__transfersize(2) int64_t const * ptr); // VLD1.64 {d0, d1}, [r0]
+//float16x4x2_t vld2_dup_f16(__transfersize(2) __fp16 const * ptr); // VLD2.16 {d0[], d1[]}, [r0]
+float32x2x2_t vld2_dup_f32(__transfersize(2) float32_t const * ptr); // VLD2.32 {d0[], d1[]}, [r0]
+poly8x8x2_t vld2_dup_p8(__transfersize(2) poly8_t const * ptr); // VLD2.8 {d0[], d1[]}, [r0]
+poly16x4x2_t vld2_dup_p16(__transfersize(2) poly16_t const * ptr); // VLD2.16 {d0[], d1[]}, [r0]
+uint8x8x3_t vld3_dup_u8(__transfersize(3) uint8_t const * ptr); // VLD3.8 {d0[], d1[], d2[]}, [r0]
+uint16x4x3_t vld3_dup_u16(__transfersize(3) uint16_t const * ptr); // VLD3.16 {d0[], d1[], d2[]}, [r0]
+uint32x2x3_t vld3_dup_u32(__transfersize(3) uint32_t const * ptr); // VLD3.32 {d0[], d1[], d2[]}, [r0]
+uint64x1x3_t vld3_dup_u64(__transfersize(3) uint64_t const * ptr); // VLD1.64 {d0, d1, d2}, [r0]
+int8x8x3_t vld3_dup_s8(__transfersize(3) int8_t const * ptr); // VLD3.8 {d0[], d1[], d2[]}, [r0]
+int16x4x3_t vld3_dup_s16(__transfersize(3) int16_t const * ptr); // VLD3.16 {d0[], d1[], d2[]}, [r0]
+int32x2x3_t vld3_dup_s32(__transfersize(3) int32_t const * ptr); // VLD3.32 {d0[], d1[], d2[]}, [r0]
+int64x1x3_t vld3_dup_s64(__transfersize(3) int64_t const * ptr); // VLD1.64 {d0, d1, d2}, [r0]
+float16x4x3_t vld3_dup_f16(__transfersize(3) __fp16 const * ptr); // VLD3.16 {d0[], d1[], d2[]}, [r0]
+float32x2x3_t vld3_dup_f32(__transfersize(3) float32_t const * ptr); // VLD3.32 {d0[], d1[], d2[]}, [r0]
+poly8x8x3_t vld3_dup_p8(__transfersize(3) poly8_t const * ptr); // VLD3.8 {d0[], d1[], d2[]}, [r0]
+poly16x4x3_t vld3_dup_p16(__transfersize(3) poly16_t const * ptr); // VLD3.16 {d0[], d1[], d2[]}, [r0]
+uint8x8x4_t vld4_dup_u8(__transfersize(4) uint8_t const * ptr); // VLD4.8 {d0[], d1[], d2[], d3[]}, [r0]
+uint16x4x4_t vld4_dup_u16(__transfersize(4) uint16_t const * ptr); // VLD4.16 {d0[], d1[], d2[], d3[]}, [r0]
+uint32x2x4_t vld4_dup_u32(__transfersize(4) uint32_t const * ptr); // VLD4.32 {d0[], d1[], d2[], d3[]}, [r0]
+uint64x1x4_t vld4_dup_u64(__transfersize(4) uint64_t const * ptr); // VLD1.64 {d0, d1, d2, d3}, [r0]
+int8x8x4_t vld4_dup_s8(__transfersize(4) int8_t const * ptr); // VLD4.8 {d0[], d1[], d2[], d3[]}, [r0]
+int16x4x4_t vld4_dup_s16(__transfersize(4) int16_t const * ptr); // VLD4.16 {d0[], d1[], d2[], d3[]}, [r0]
+int32x2x4_t vld4_dup_s32(__transfersize(4) int32_t const * ptr); // VLD4.32 {d0[], d1[], d2[], d3[]}, [r0]
+int64x1x4_t vld4_dup_s64(__transfersize(4) int64_t const * ptr); // VLD1.64 {d0, d1, d2, d3}, [r0]
+float16x4x4_t vld4_dup_f16(__transfersize(4) __fp16 const * ptr); // VLD4.16 {d0[], d1[], d2[], d3[]}, [r0]
+float32x2x4_t vld4_dup_f32(__transfersize(4) float32_t const * ptr); // VLD4.32 {d0[], d1[], d2[], d3[]}, [r0]
+poly8x8x4_t vld4_dup_p8(__transfersize(4) poly8_t const * ptr); // VLD4.8 {d0[], d1[], d2[], d3[]}, [r0]
+poly16x4x4_t vld4_dup_p16(__transfersize(4) poly16_t const * ptr); // VLD4.16 {d0[], d1[], d2[], d3[]}, [r0]
+//Load a single lane of N-element structure from memory
+//The functions below take 'src' through a pointer to work around MSVC error C2719: 'src': formal parameter with __declspec(align('16')) won't be aligned.
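A sketch of the wrapper-macro pattern such *_ptr variants enable, so client code can keep the standard by-value NEON spelling; whether this header actually defines these macros, and under what names, is an assumption on our part:

/* Hypothetical convenience wrapper over the *_ptr variant (illustrative). */
#define vld2q_lane_u16(ptr, src, lane) vld2q_lane_u16_ptr((ptr), &(src), (lane))

/* Client code can then keep the portable form:
     uint16x8x2_t v = vld2q_u16(p0);
     v = vld2q_lane_u16(p, v, 3);
*/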
+uint16x8x2_t vld2q_lane_u16_ptr(__transfersize(2) uint16_t const * ptr, uint16x8x2_t * src, __constrange(0,7) int lane); // VLD2.16 {d0[0], d2[0]}, [r0]
+uint32x4x2_t vld2q_lane_u32_ptr(__transfersize(2) uint32_t const * ptr, uint32x4x2_t * src, __constrange(0,3) int lane); // VLD2.32 {d0[0], d2[0]}, [r0]
+int16x8x2_t vld2q_lane_s16_ptr(__transfersize(2) int16_t const * ptr, int16x8x2_t * src, __constrange(0,7) int lane); // VLD2.16 {d0[0], d2[0]}, [r0]
+int32x4x2_t vld2q_lane_s32_ptr(__transfersize(2) int32_t const * ptr, int32x4x2_t * src, __constrange(0,3) int lane); // VLD2.32 {d0[0], d2[0]}, [r0]
+float16x8x2_t vld2q_lane_f16_ptr(__transfersize(2) __fp16 const * ptr, float16x8x2_t * src, __constrange(0,7) int lane); // VLD2.16 {d0[0], d2[0]}, [r0]
+float32x4x2_t vld2q_lane_f32_ptr(__transfersize(2) float32_t const * ptr, float32x4x2_t * src, __constrange(0,3) int lane); // VLD2.32 {d0[0], d2[0]}, [r0]
+poly16x8x2_t vld2q_lane_p16_ptr(__transfersize(2) poly16_t const * ptr, poly16x8x2_t * src, __constrange(0,7) int lane); // VLD2.16 {d0[0], d2[0]}, [r0]
+uint8x8x2_t vld2_lane_u8_ptr(__transfersize(2) uint8_t const * ptr, uint8x8x2_t * src, __constrange(0,7) int lane); //VLD2.8 {d0[0], d1[0]}, [r0]
+uint16x4x2_t vld2_lane_u16_ptr(__transfersize(2) uint16_t const * ptr, uint16x4x2_t * src, __constrange(0,3) int lane); // VLD2.16 {d0[0], d1[0]}, [r0]
+uint32x2x2_t vld2_lane_u32_ptr(__transfersize(2) uint32_t const * ptr, uint32x2x2_t * src, __constrange(0,1) int lane); // VLD2.32 {d0[0], d1[0]}, [r0]
+int8x8x2_t vld2_lane_s8_ptr(__transfersize(2) int8_t const * ptr, int8x8x2_t * src, __constrange(0,7) int lane); //VLD2.8 {d0[0], d1[0]}, [r0]
+int16x4x2_t vld2_lane_s16_ptr(__transfersize(2) int16_t const * ptr, int16x4x2_t * src, __constrange(0,3) int lane); //VLD2.16 {d0[0], d1[0]}, [r0]
+int32x2x2_t vld2_lane_s32_ptr(__transfersize(2) int32_t const * ptr, int32x2x2_t * src, __constrange(0,1) int lane); //VLD2.32 {d0[0], d1[0]}, [r0]
+//float16x4x2_t vld2_lane_f16_ptr(__transfersize(2) __fp16 const * ptr, float16x4x2_t * src, __constrange(0,3) int lane); // VLD2.16 {d0[0], d1[0]}, [r0]
+float32x2x2_t vld2_lane_f32_ptr(__transfersize(2) float32_t const * ptr, float32x2x2_t * src, __constrange(0,1) int lane); // VLD2.32 {d0[0], d1[0]}, [r0]
+poly8x8x2_t vld2_lane_p8_ptr(__transfersize(2) poly8_t const * ptr, poly8x8x2_t * src, __constrange(0,7) int lane); //VLD2.8 {d0[0], d1[0]}, [r0]
+poly16x4x2_t vld2_lane_p16_ptr(__transfersize(2) poly16_t const * ptr, poly16x4x2_t * src, __constrange(0,3) int lane); // VLD2.16 {d0[0], d1[0]}, [r0]
+uint16x8x3_t vld3q_lane_u16_ptr(__transfersize(3) uint16_t const * ptr, uint16x8x3_t * src, __constrange(0,7) int lane); // VLD3.16 {d0[0], d2[0], d4[0]}, [r0]
+uint32x4x3_t vld3q_lane_u32_ptr(__transfersize(3) uint32_t const * ptr, uint32x4x3_t * src, __constrange(0,3) int lane); // VLD3.32 {d0[0], d2[0], d4[0]}, [r0]
+int16x8x3_t vld3q_lane_s16_ptr(__transfersize(3) int16_t const * ptr, int16x8x3_t * src, __constrange(0,7) int lane); // VLD3.16 {d0[0], d2[0], d4[0]}, [r0]
+int32x4x3_t vld3q_lane_s32_ptr(__transfersize(3) int32_t const * ptr, int32x4x3_t * src, __constrange(0,3) int lane); // VLD3.32 {d0[0], d2[0], d4[0]}, [r0]
+float16x8x3_t vld3q_lane_f16_ptr(__transfersize(3) __fp16 const * ptr, float16x8x3_t * src, __constrange(0,7) int lane); // VLD3.16 {d0[0], d2[0], d4[0]}, [r0]
+float32x4x3_t vld3q_lane_f32_ptr(__transfersize(3) float32_t const * ptr, float32x4x3_t * src, __constrange(0,3) int lane); // VLD3.32 {d0[0], d2[0], d4[0]}, [r0]
+poly16x8x3_t vld3q_lane_p16_ptr(__transfersize(3) poly16_t const * ptr, poly16x8x3_t * src, __constrange(0,7) int lane); // VLD3.16 {d0[0], d2[0], d4[0]}, [r0]
+uint8x8x3_t vld3_lane_u8_ptr(__transfersize(3) uint8_t const * ptr, uint8x8x3_t * src, __constrange(0,7) int lane); //VLD3.8 {d0[0], d1[0], d2[0]}, [r0]
+uint16x4x3_t vld3_lane_u16_ptr(__transfersize(3) uint16_t const * ptr, uint16x4x3_t * src, __constrange(0,3) int lane); // VLD3.16 {d0[0], d1[0], d2[0]}, [r0]
+uint32x2x3_t vld3_lane_u32_ptr(__transfersize(3) uint32_t const * ptr, uint32x2x3_t * src, __constrange(0,1) int lane); // VLD3.32 {d0[0], d1[0], d2[0]}, [r0]
+int8x8x3_t vld3_lane_s8_ptr(__transfersize(3) int8_t const * ptr, int8x8x3_t * src, __constrange(0,7) int lane); //VLD3.8 {d0[0], d1[0], d2[0]}, [r0]
+int16x4x3_t vld3_lane_s16_ptr(__transfersize(3) int16_t const * ptr, int16x4x3_t * src, __constrange(0,3) int lane); //VLD3.16 {d0[0], d1[0], d2[0]}, [r0]
+int32x2x3_t vld3_lane_s32_ptr(__transfersize(3) int32_t const * ptr, int32x2x3_t * src, __constrange(0,1) int lane); //VLD3.32 {d0[0], d1[0], d2[0]}, [r0]
+float16x4x3_t vld3_lane_f16_ptr(__transfersize(3) __fp16 const * ptr, float16x4x3_t * src, __constrange(0,3) int lane); // VLD3.16 {d0[0], d1[0], d2[0]}, [r0]
+float32x2x3_t
vld3_lane_f32_ptr(__transfersize(3) float32_t const * ptr, float32x2x3_t * src, __constrange(0,1) int lane); // VLD3.32 {d0[0], d1[0], d2[0]}, [r0] +poly8x8x3_t vld3_lane_p8_ptr(__transfersize(3) poly8_t const * ptr, poly8x8x3_t * src, __constrange(0,7) int lane); //VLD3.8 {d0[0], d1[0], d2[0]}, [r0] +poly16x4x3_t vld3_lane_p16_ptr(__transfersize(3) poly16_t const * ptr, poly16x4x3_t * src, __constrange(0,3) int lane); // VLD3.16 {d0[0], d1[0], d2[0]}, [r0] +uint16x8x4_t vld4q_lane_u16_ptr(__transfersize(4) uint16_t const * ptr, uint16x8x4_t * src, __constrange(0,7) int lane); // VLD4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +uint32x4x4_t vld4q_lane_u32_ptr(__transfersize(4) uint32_t const * ptr, uint32x4x4_t * src, __constrange(0,3) int lane); // VLD4.32 {d0[0], d2[0], d4[0], d6[0]}, [r0] +int16x8x4_t vld4q_lane_s16_ptr(__transfersize(4) int16_t const * ptr, int16x8x4_t * src, __constrange(0,7) int lane); // VLD4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +int32x4x4_t vld4q_lane_s32_ptr(__transfersize(4) int32_t const * ptr, int32x4x4_t * src, __constrange(0,3) int lane); // VLD4.32 {d0[0], d2[0], d4[0], d6[0]}, [r0] +float16x8x4_t vld4q_lane_f16_ptr(__transfersize(4) __fp16 const * ptr, float16x8x4_t * src, __constrange(0,7) int lane); // VLD4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +float32x4x4_t vld4q_lane_f32_ptr(__transfersize(4) float32_t const * ptr, float32x4x4_t * src, __constrange(0,3) int lane); // VLD4.32 {d0[0], d2[0], d4[0], d6[0]}, [r0] +poly16x8x4_t vld4q_lane_p16_ptr(__transfersize(4) poly16_t const * ptr, poly16x8x4_t * src, __constrange(0,7) int lane); // VLD4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +uint8x8x4_t vld4_lane_u8_ptr(__transfersize(4) uint8_t const * ptr, uint8x8x4_t * src, __constrange(0,7) int lane); //VLD4.8 {d0[0], d1[0], d2[0], d3[0]}, [r0] +uint16x4x4_t vld4_lane_u16_ptr(__transfersize(4) uint16_t const * ptr, uint16x4x4_t * src, __constrange(0,3) int lane); // VLD4.16 {d0[0], d1[0], d2[0], d3[0]}, [r0] +uint32x2x4_t vld4_lane_u32_ptr(__transfersize(4) uint32_t const * ptr, uint32x2x4_t * src, __constrange(0,1) int lane); // VLD4.32 {d0[0], d1[0], d2[0], d3[0]}, [r0] +int8x8x4_t vld4_lane_s8_ptr(__transfersize(4) int8_t const * ptr, int8x8x4_t * src, __constrange(0,7) int lane); //VLD4.8 {d0[0], d1[0], d2[0], d3[0]}, [r0] +int16x4x4_t vld4_lane_s16_ptr(__transfersize(4) int16_t const * ptr, int16x4x4_t * src, __constrange(0,3) int lane); //VLD4.16 {d0[0], d1[0], d2[0], d3[0]}, [r0] +int32x2x4_t vld4_lane_s32_ptr(__transfersize(4) int32_t const * ptr, int32x2x4_t * src, __constrange(0,1) int lane); //VLD4.32 {d0[0], d1[0], d2[0], d3[0]}, [r0] +float16x4x4_t vld4_lane_f16_ptr(__transfersize(4) __fp16 const * ptr, float16x4x4_t * src, __constrange(0,3) int lane); // VLD4.16 {d0[0], d1[0], d2[0], d3[0]}, [r0] +float32x2x4_t vld4_lane_f32_ptr(__transfersize(4) float32_t const * ptr, float32x2x4_t * src, __constrange(0,1) int lane); // VLD4.32 {d0[0], d1[0], d2[0], d3[0]}, [r0] +poly8x8x4_t vld4_lane_p8_ptr(__transfersize(4) poly8_t const * ptr, poly8x8x4_t * src, __constrange(0,7) int lane); //VLD4.8 {d0[0], d1[0], d2[0], d3[0]}, [r0] +poly16x4x4_t vld4_lane_p16_ptr(__transfersize(4) poly16_t const * ptr, poly16x4x4_t * src, __constrange(0,3) int lane); // VLD4.16 {d0[0], d1[0], d2[0], d3[0]}, [r0] +//Store N-element structure to memory +void vst2q_u8_ptr(__transfersize(32) uint8_t * ptr, uint8x16x2_t * val); // VST2.8 {d0, d2}, [r0] +void vst2q_u16_ptr(__transfersize(16) uint16_t * ptr, uint16x8x2_t * val); // VST2.16 {d0, d2}, [r0] +void 
vst2q_u32_ptr(__transfersize(8) uint32_t * ptr, uint32x4x2_t * val); // VST2.32 {d0, d2}, [r0] +void vst2q_s8_ptr(__transfersize(32) int8_t * ptr, int8x16x2_t * val); // VST2.8 {d0, d2}, [r0] +void vst2q_s16_ptr(__transfersize(16) int16_t * ptr, int16x8x2_t * val); // VST2.16 {d0, d2}, [r0] +void vst2q_s32_ptr(__transfersize(8) int32_t * ptr, int32x4x2_t * val); // VST2.32 {d0, d2}, [r0] +void vst2q_f16_ptr(__transfersize(16) __fp16 * ptr, float16x8x2_t * val); // VST2.16 {d0, d2}, [r0] +void vst2q_f32_ptr(__transfersize(8) float32_t * ptr, float32x4x2_t * val); // VST2.32 {d0, d2}, [r0] +void vst2q_p8_ptr(__transfersize(32) poly8_t * ptr, poly8x16x2_t * val); // VST2.8 {d0, d2}, [r0] +void vst2q_p16_ptr(__transfersize(16) poly16_t * ptr, poly16x8x2_t * val); // VST2.16 {d0, d2}, [r0] +void vst2_u8_ptr(__transfersize(16) uint8_t * ptr, uint8x8x2_t * val); // VST2.8 {d0, d1}, [r0] +void vst2_u16_ptr(__transfersize(8) uint16_t * ptr, uint16x4x2_t * val); // VST2.16 {d0, d1}, [r0] +void vst2_u32_ptr(__transfersize(4) uint32_t * ptr, uint32x2x2_t * val); // VST2.32 {d0, d1}, [r0] +void vst2_u64_ptr(__transfersize(2) uint64_t * ptr, uint64x1x2_t * val); // VST1.64 {d0, d1}, [r0] +void vst2_s8_ptr(__transfersize(16) int8_t * ptr, int8x8x2_t * val); // VST2.8 {d0, d1}, [r0] +void vst2_s16_ptr(__transfersize(8) int16_t * ptr, int16x4x2_t * val); // VST2.16 {d0, d1}, [r0] +void vst2_s32_ptr(__transfersize(4) int32_t * ptr, int32x2x2_t * val); // VST2.32 {d0, d1}, [r0] +void vst2_s64_ptr(__transfersize(2) int64_t * ptr, int64x1x2_t * val); // VST1.64 {d0, d1}, [r0] +//void vst2_f16_ptr(__transfersize(8) __fp16 * ptr, float16x4x2_t * val); // VST2.16 {d0, d1}, [r0] +void vst2_f32_ptr(__transfersize(4) float32_t * ptr, float32x2x2_t * val); // VST2.32 {d0, d1}, [r0] +void vst2_p8_ptr(__transfersize(16) poly8_t * ptr, poly8x8x2_t * val); // VST2.8 {d0, d1}, [r0] +void vst2_p16_ptr(__transfersize(8) poly16_t * ptr, poly16x4x2_t * val); // VST2.16 {d0, d1}, [r0] +void vst3q_u8_ptr(__transfersize(48) uint8_t * ptr, uint8x16x3_t * val); // VST3.8 {d0, d2, d4}, [r0] +void vst3q_u16_ptr(__transfersize(24) uint16_t * ptr, uint16x8x3_t * val); // VST3.16 {d0, d2, d4}, [r0] +void vst3q_u32_ptr(__transfersize(12) uint32_t * ptr, uint32x4x3_t * val); // VST3.32 {d0, d2, d4}, [r0] +void vst3q_s8_ptr(__transfersize(48) int8_t * ptr, int8x16x3_t * val); // VST3.8 {d0, d2, d4}, [r0] +void vst3q_s16_ptr(__transfersize(24) int16_t * ptr, int16x8x3_t * val); // VST3.16 {d0, d2, d4}, [r0] +void vst3q_s32_ptr(__transfersize(12) int32_t * ptr, int32x4x3_t * val); // VST3.32 {d0, d2, d4}, [r0] +void vst3q_f16_ptr(__transfersize(24) __fp16 * ptr, float16x8x3_t * val); // VST3.16 {d0, d2, d4}, [r0] +void vst3q_f32_ptr(__transfersize(12) float32_t * ptr, float32x4x3_t * val); // VST3.32 {d0, d2, d4}, [r0] +void vst3q_p8_ptr(__transfersize(48) poly8_t * ptr, poly8x16x3_t * val); // VST3.8 {d0, d2, d4}, [r0] +void vst3q_p16_ptr(__transfersize(24) poly16_t * ptr, poly16x8x3_t * val); // VST3.16 {d0, d2, d4}, [r0] +void vst3_u8_ptr(__transfersize(24) uint8_t * ptr, uint8x8x3_t * val); // VST3.8 {d0, d1, d2}, [r0] +void vst3_u16_ptr(__transfersize(12) uint16_t * ptr, uint16x4x3_t * val); // VST3.16 {d0, d1, d2}, [r0] +void vst3_u32_ptr(__transfersize(6) uint32_t * ptr, uint32x2x3_t * val); // VST3.32 {d0, d1, d2}, [r0] +void vst3_u64_ptr(__transfersize(3) uint64_t * ptr, uint64x1x3_t * val); // VST1.64 {d0, d1, d2}, [r0] +void vst3_s8_ptr(__transfersize(24) int8_t * ptr, int8x8x3_t * val); // VST3.8 {d0, d1, d2}, [r0] +void 
vst3_s16_ptr(__transfersize(12) int16_t * ptr, int16x4x3_t * val); // VST3.16 {d0, d1, d2}, [r0] +void vst3_s32_ptr(__transfersize(6) int32_t * ptr, int32x2x3_t * val); // VST3.32 {d0, d1, d2}, [r0] +void vst3_s64_ptr(__transfersize(3) int64_t * ptr, int64x1x3_t * val); // VST1.64 {d0, d1, d2}, [r0] +void vst3_f16_ptr(__transfersize(12) __fp16 * ptr, float16x4x3_t * val); // VST3.16 {d0, d1, d2}, [r0] +void vst3_f32_ptr(__transfersize(6) float32_t * ptr, float32x2x3_t * val); // VST3.32 {d0, d1, d2}, [r0] +void vst3_p8_ptr(__transfersize(24) poly8_t * ptr, poly8x8x3_t * val); // VST3.8 {d0, d1, d2}, [r0] +void vst3_p16_ptr(__transfersize(12) poly16_t * ptr, poly16x4x3_t * val); // VST3.16 {d0, d1, d2}, [r0] +void vst4q_u8_ptr(__transfersize(64) uint8_t * ptr, uint8x16x4_t * val); // VST4.8 {d0, d2, d4, d6}, [r0] +void vst4q_u16_ptr(__transfersize(32) uint16_t * ptr, uint16x8x4_t * val); // VST4.16 {d0, d2, d4, d6}, [r0] +void vst4q_u32_ptr(__transfersize(16) uint32_t * ptr, uint32x4x4_t * val); // VST4.32 {d0, d2, d4, d6}, [r0] +void vst4q_s8_ptr(__transfersize(64) int8_t * ptr, int8x16x4_t * val); // VST4.8 {d0, d2, d4, d6}, [r0] +void vst4q_s16_ptr(__transfersize(32) int16_t * ptr, int16x8x4_t * val); // VST4.16 {d0, d2, d4, d6}, [r0] +void vst4q_s32_ptr(__transfersize(16) int32_t * ptr, int32x4x4_t * val); // VST4.32 {d0, d2, d4, d6}, [r0] +void vst4q_f16_ptr(__transfersize(32) __fp16 * ptr, float16x8x4_t * val); // VST4.16 {d0, d2, d4, d6}, [r0] +void vst4q_f32_ptr(__transfersize(16) float32_t * ptr, float32x4x4_t * val); // VST4.32 {d0, d2, d4, d6}, [r0] +void vst4q_p8_ptr(__transfersize(64) poly8_t * ptr, poly8x16x4_t * val); // VST4.8 {d0, d2, d4, d6}, [r0] +void vst4q_p16_ptr(__transfersize(32) poly16_t * ptr, poly16x8x4_t * val); // VST4.16 {d0, d2, d4, d6}, [r0] +void vst4_u8_ptr(__transfersize(32) uint8_t * ptr, uint8x8x4_t * val); // VST4.8 {d0, d1, d2, d3}, [r0] +void vst4_u16_ptr(__transfersize(16) uint16_t * ptr, uint16x4x4_t * val); // VST4.16 {d0, d1, d2, d3}, [r0] +void vst4_u32_ptr(__transfersize(8) uint32_t * ptr, uint32x2x4_t * val); // VST4.32 {d0, d1, d2, d3}, [r0] +void vst4_u64_ptr(__transfersize(4) uint64_t * ptr, uint64x1x4_t * val); // VST1.64 {d0, d1, d2, d3}, [r0] +void vst4_s8_ptr(__transfersize(32) int8_t * ptr, int8x8x4_t * val); // VST4.8 {d0, d1, d2, d3}, [r0] +void vst4_s16_ptr(__transfersize(16) int16_t * ptr, int16x4x4_t * val); // VST4.16 {d0, d1, d2, d3}, [r0] +void vst4_s32_ptr(__transfersize(8) int32_t * ptr, int32x2x4_t * val); // VST4.32 {d0, d1, d2, d3}, [r0] +void vst4_s64_ptr(__transfersize(4) int64_t * ptr, int64x1x4_t * val); // VST1.64 {d0, d1, d2, d3}, [r0] +void vst4_f16_ptr(__transfersize(16) __fp16 * ptr, float16x4x4_t * val); // VST4.16 {d0, d1, d2, d3}, [r0] +void vst4_f32_ptr(__transfersize(8) float32_t * ptr, float32x2x4_t * val); // VST4.32 {d0, d1, d2, d3}, [r0] +void vst4_p8_ptr(__transfersize(32) poly8_t * ptr, poly8x8x4_t * val); // VST4.8 {d0, d1, d2, d3}, [r0] +void vst4_p16_ptr(__transfersize(16) poly16_t * ptr, poly16x4x4_t * val); // VST4.16 {d0, d1, d2, d3}, [r0] +//Store a single lane of N-element structure to memory +void vst2q_lane_u16_ptr(__transfersize(2) uint16_t * ptr, uint16x8x2_t * val, __constrange(0,7) int lane); // VST2.16{d0[0], d2[0]}, [r0] +void vst2q_lane_u32_ptr(__transfersize(2) uint32_t * ptr, uint32x4x2_t * val, __constrange(0,3) int lane); // VST2.32{d0[0], d2[0]}, [r0] +void vst2q_lane_s16_ptr(__transfersize(2) int16_t * ptr, int16x8x2_t * val, __constrange(0,7) int lane); // VST2.16{d0[0], d2[0]}, 
[r0] +void vst2q_lane_s32_ptr(__transfersize(2) int32_t * ptr, int32x4x2_t * val, __constrange(0,3) int lane); // VST2.32{d0[0], d2[0]}, [r0] +void vst2q_lane_f16_ptr(__transfersize(2) __fp16 * ptr, float16x8x2_t * val, __constrange(0,7) int lane); // VST2.16{d0[0], d2[0]}, [r0] +void vst2q_lane_f32_ptr(__transfersize(2) float32_t * ptr, float32x4x2_t * val, __constrange(0,3) int lane); //VST2.32 {d0[0], d2[0]}, [r0] +void vst2q_lane_p16_ptr(__transfersize(2) poly16_t * ptr, poly16x8x2_t * val, __constrange(0,7) int lane); // VST2.16{d0[0], d2[0]}, [r0] +void vst2_lane_u8_ptr(__transfersize(2) uint8_t * ptr, uint8x8x2_t * val, __constrange(0,7) int lane); // VST2.8{d0[0], d1[0]}, [r0] +void vst2_lane_u16_ptr(__transfersize(2) uint16_t * ptr, uint16x4x2_t * val, __constrange(0,3) int lane); // VST2.16{d0[0], d1[0]}, [r0] +void vst2_lane_u32_ptr(__transfersize(2) uint32_t * ptr, uint32x2x2_t * val, __constrange(0,1) int lane); // VST2.32{d0[0], d1[0]}, [r0] +void vst2_lane_s8_ptr(__transfersize(2) int8_t * ptr, int8x8x2_t * val, __constrange(0,7) int lane); // VST2.8 {d0[0],d1[0]}, [r0] +void vst2_lane_s16_ptr(__transfersize(2) int16_t * ptr, int16x4x2_t * val, __constrange(0,3) int lane); // VST2.16{d0[0], d1[0]}, [r0] +void vst2_lane_s32_ptr(__transfersize(2) int32_t * ptr, int32x2x2_t * val, __constrange(0,1) int lane); // VST2.32{d0[0], d1[0]}, [r0] +void vst2_lane_f16_ptr(__transfersize(2) __fp16 * ptr, float16x4x2_t * val, __constrange(0,3) int lane); // VST2.16{d0[0], d1[0]}, [r0] +void vst2_lane_f32_ptr(__transfersize(2) float32_t * ptr, float32x2x2_t * val, __constrange(0,1) int lane); // VST2.32{d0[0], d1[0]}, [r0] +void vst2_lane_p8_ptr(__transfersize(2) poly8_t * ptr, poly8x8x2_t * val, __constrange(0,7) int lane); // VST2.8{d0[0], d1[0]}, [r0] +void vst2_lane_p16_ptr(__transfersize(2) poly16_t * ptr, poly16x4x2_t * val, __constrange(0,3) int lane); // VST2.16{d0[0], d1[0]}, [r0] +void vst3q_lane_u16_ptr(__transfersize(3) uint16_t * ptr, uint16x8x3_t * val, __constrange(0,7) int lane); // VST3.16{d0[0], d2[0], d4[0]}, [r0] +void vst3q_lane_u32_ptr(__transfersize(3) uint32_t * ptr, uint32x4x3_t * val, __constrange(0,3) int lane); // VST3.32{d0[0], d2[0], d4[0]}, [r0] +void vst3q_lane_s16_ptr(__transfersize(3) int16_t * ptr, int16x8x3_t * val, __constrange(0,7) int lane); // VST3.16{d0[0], d2[0], d4[0]}, [r0] +void vst3q_lane_s32_ptr(__transfersize(3) int32_t * ptr, int32x4x3_t * val, __constrange(0,3) int lane); // VST3.32{d0[0], d2[0], d4[0]}, [r0] +void vst3q_lane_f16_ptr(__transfersize(3) __fp16 * ptr, float16x8x3_t * val, __constrange(0,7) int lane); // VST3.16{d0[0], d2[0], d4[0]}, [r0] +void vst3q_lane_f32_ptr(__transfersize(3) float32_t * ptr, float32x4x3_t * val, __constrange(0,3) int lane); //VST3.32 {d0[0], d2[0], d4[0]}, [r0] +void vst3q_lane_p16_ptr(__transfersize(3) poly16_t * ptr, poly16x8x3_t * val, __constrange(0,7) int lane); // VST3.16{d0[0], d2[0], d4[0]}, [r0] +void vst3_lane_u8_ptr(__transfersize(3) uint8_t * ptr, uint8x8x3_t * val, __constrange(0,7) int lane); // VST3.8{d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_u16_ptr(__transfersize(3) uint16_t * ptr, uint16x4x3_t * val, __constrange(0,3) int lane); // VST3.16{d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_u32_ptr(__transfersize(3) uint32_t * ptr, uint32x2x3_t * val, __constrange(0,1) int lane); // VST3.32{d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_s8_ptr(__transfersize(3) int8_t * ptr, int8x8x3_t * val, __constrange(0,7) int lane); // VST3.8 {d0[0],d1[0], d2[0]}, [r0] +void vst3_lane_s16_ptr(__transfersize(3) 
int16_t * ptr, int16x4x3_t * val, __constrange(0,3) int lane); // VST3.16{d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_s32_ptr(__transfersize(3) int32_t * ptr, int32x2x3_t * val, __constrange(0,1) int lane); // VST3.32{d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_f16_ptr(__transfersize(3) __fp16 * ptr, float16x4x3_t * val, __constrange(0,3) int lane); // VST3.16{d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_f32_ptr(__transfersize(3) float32_t * ptr, float32x2x3_t * val, __constrange(0,1) int lane); // VST3.32{d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_p8_ptr(__transfersize(3) poly8_t * ptr, poly8x8x3_t * val, __constrange(0,7) int lane); // VST3.8{d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_p16_ptr(__transfersize(3) poly16_t * ptr, poly16x4x3_t * val, __constrange(0,3) int lane); // VST3.16{d0[0], d1[0], d2[0]}, [r0] +void vst4q_lane_u16_ptr(__transfersize(4) uint16_t * ptr, uint16x8x4_t * val, __constrange(0,7) int lane); // VST4.16{d0[0], d2[0], d4[0], d6[0]}, [r0] +void vst4q_lane_u32_ptr(__transfersize(4) uint32_t * ptr, uint32x4x4_t * val, __constrange(0,3) int lane); // VST4.32{d0[0], d2[0], d4[0], d6[0]}, [r0] +void vst4q_lane_s16_ptr(__transfersize(4) int16_t * ptr, int16x8x4_t * val, __constrange(0,7) int lane); // VST4.16{d0[0], d2[0], d4[0], d6[0]}, [r0] +void vst4q_lane_s32_ptr(__transfersize(4) int32_t * ptr, int32x4x4_t * val, __constrange(0,3) int lane); // VST4.32{d0[0], d2[0], d4[0], d6[0]}, [r0] +void vst4q_lane_f16_ptr(__transfersize(4) __fp16 * ptr, float16x8x4_t * val, __constrange(0,7) int lane); // VST4.16{d0[0], d2[0], d4[0], d6[0]}, [r0] +void vst4q_lane_f32_ptr(__transfersize(4) float32_t * ptr, float32x4x4_t * val, __constrange(0,3) int lane); //VST4.32 {d0[0], d2[0], d4[0], d6[0]}, [r0] +void vst4q_lane_p16_ptr(__transfersize(4) poly16_t * ptr, poly16x8x4_t * val, __constrange(0,7) int lane); // VST4.16{d0[0], d2[0], d4[0], d6[0]}, [r0] +void vst4_lane_u8_ptr(__transfersize(4) uint8_t * ptr, uint8x8x4_t * val, __constrange(0,7) int lane); // VST4.8{d0[0], d1[0], d2[0], d3[0]}, [r0] +void vst4_lane_u16_ptr(__transfersize(4) uint16_t * ptr, uint16x4x4_t * val, __constrange(0,3) int lane); // VST4.16{d0[0], d1[0], d2[0], d3[0]}, [r0] +void vst4_lane_u32_ptr(__transfersize(4) uint32_t * ptr, uint32x2x4_t * val, __constrange(0,1) int lane); // VST4.32{d0[0], d1[0], d2[0], d3[0]}, [r0] +void vst4_lane_s8_ptr(__transfersize(4) int8_t * ptr, int8x8x4_t * val, __constrange(0,7) int lane); // VST4.8 {d0[0],d1[0], d2[0], d3[0]}, [r0] +void vst4_lane_s16_ptr(__transfersize(4) int16_t * ptr, int16x4x4_t * val, __constrange(0,3) int lane); // VST4.16{d0[0], d1[0], d2[0], d3[0]}, [r0] +void vst4_lane_s32_ptr(__transfersize(4) int32_t * ptr, int32x2x4_t * val, __constrange(0,1) int lane); // VST4.32{d0[0], d1[0], d2[0], d3[0]}, [r0] +void vst4_lane_f16_ptr(__transfersize(4) __fp16 * ptr, float16x4x4_t * val, __constrange(0,3) int lane); // VST4.16{d0[0], d1[0], d2[0], d3[0]}, [r0] +void vst4_lane_f32_ptr(__transfersize(4) float32_t * ptr, float32x2x4_t * val, __constrange(0,1) int lane); // VST4.32{d0[0], d1[0], d2[0], d3[0]}, [r0] +void vst4_lane_p8_ptr(__transfersize(4) poly8_t * ptr, poly8x8x4_t * val, __constrange(0,7) int lane); // VST4.8{d0[0], d1[0], d2[0], d3[0]}, [r0] +void vst4_lane_p16_ptr(__transfersize(4) poly16_t * ptr, poly16x4x4_t * val, __constrange(0,3) int lane); // VST4.16{d0[0], d1[0], d2[0], d3[0]}, [r0] +//Extract lanes from a vector and put into a register. These intrinsics extract a single lane (element) from a vector. 
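+//Editorial example (not part of the original header; names are illustrative):
+//each lane-store intrinsic above writes exactly __transfersize elements, one per
+//member vector, so vst2_lane_u16 stores val->val[0][lane] then val->val[1][lane]:
+//
+//  static void store_one_pair(uint16_t *dst /*room for 2*/, uint16x4x2_t *val)
+//  {
+//      vst2_lane_u16_ptr(dst, val, 1);   //VST2.16 {d0[1], d1[1]}, [r0]
+//  }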
+uint8_t vget_lane_u8(uint8x8_t vec, __constrange(0,7) int lane); // VMOV.U8 r0, d0[0]
+uint16_t vget_lane_u16(uint16x4_t vec, __constrange(0,3) int lane); // VMOV.U16 r0, d0[0]
+uint32_t vget_lane_u32(uint32x2_t vec, __constrange(0,1) int lane); // VMOV.32 r0, d0[0]
+int8_t vget_lane_s8(int8x8_t vec, __constrange(0,7) int lane); // VMOV.S8 r0, d0[0]
+int16_t vget_lane_s16(int16x4_t vec, __constrange(0,3) int lane); // VMOV.S16 r0, d0[0]
+int32_t vget_lane_s32(int32x2_t vec, __constrange(0,1) int lane); // VMOV.32 r0, d0[0]
+poly8_t vget_lane_p8(poly8x8_t vec, __constrange(0,7) int lane); // VMOV.U8 r0, d0[0]
+poly16_t vget_lane_p16(poly16x4_t vec, __constrange(0,3) int lane); // VMOV.U16 r0, d0[0]
+float32_t vget_lane_f32(float32x2_t vec, __constrange(0,1) int lane); // VMOV.32 r0, d0[0]
+uint8_t vgetq_lane_u8(uint8x16_t vec, __constrange(0,15) int lane); // VMOV.U8 r0, d0[0]
+uint16_t vgetq_lane_u16(uint16x8_t vec, __constrange(0,7) int lane); // VMOV.U16 r0, d0[0]
+uint32_t vgetq_lane_u32(uint32x4_t vec, __constrange(0,3) int lane); // VMOV.32 r0, d0[0]
+int8_t vgetq_lane_s8(int8x16_t vec, __constrange(0,15) int lane); // VMOV.S8 r0, d0[0]
+int16_t vgetq_lane_s16(int16x8_t vec, __constrange(0,7) int lane); // VMOV.S16 r0, d0[0]
+int32_t vgetq_lane_s32(int32x4_t vec, __constrange(0,3) int lane); // VMOV.32 r0, d0[0]
+poly8_t vgetq_lane_p8(poly8x16_t vec, __constrange(0,15) int lane); // VMOV.U8 r0, d0[0]
+poly16_t vgetq_lane_p16(poly16x8_t vec, __constrange(0,7) int lane); // VMOV.U16 r0, d0[0]
+float32_t vgetq_lane_f32(float32x4_t vec, __constrange(0,3) int lane); // VMOV.32 r0, d0[0]
+int64_t vget_lane_s64(int64x1_t vec, __constrange(0,0) int lane); // VMOV r0,r0,d0
+uint64_t vget_lane_u64(uint64x1_t vec, __constrange(0,0) int lane); // VMOV r0,r0,d0
+int64_t vgetq_lane_s64(int64x2_t vec, __constrange(0,1) int lane); // VMOV r0,r0,d0
+uint64_t vgetq_lane_u64(uint64x2_t vec, __constrange(0,1) int lane); // VMOV r0,r0,d0
+//Set lanes within a vector. These intrinsics set a single lane (element) within a vector to a literal value.
+uint8x8_t vset_lane_u8(uint8_t value, uint8x8_t vec, __constrange(0,7) int lane); // VMOV.8 d0[0],r0 +uint16x4_t vset_lane_u16(uint16_t value, uint16x4_t vec, __constrange(0,3) int lane); // VMOV.16 d0[0],r0 +uint32x2_t vset_lane_u32(uint32_t value, uint32x2_t vec, __constrange(0,1) int lane); // VMOV.32 d0[0],r0 +int8x8_t vset_lane_s8(int8_t value, int8x8_t vec, __constrange(0,7) int lane); // VMOV.8 d0[0],r0 +int16x4_t vset_lane_s16(int16_t value, int16x4_t vec, __constrange(0,3) int lane); // VMOV.16 d0[0],r0 +int32x2_t vset_lane_s32(int32_t value, int32x2_t vec, __constrange(0,1) int lane); // VMOV.32 d0[0],r0 +poly8x8_t vset_lane_p8(poly8_t value, poly8x8_t vec, __constrange(0,7) int lane); // VMOV.8 d0[0],r0 +poly16x4_t vset_lane_p16(poly16_t value, poly16x4_t vec, __constrange(0,3) int lane); // VMOV.16 d0[0],r0 +float32x2_t vset_lane_f32(float32_t value, float32x2_t vec, __constrange(0,1) int lane); // VMOV.32 d0[0],r0 +uint8x16_t vsetq_lane_u8(uint8_t value, uint8x16_t vec, __constrange(0,15) int lane); // VMOV.8 d0[0],r0 +uint16x8_t vsetq_lane_u16(uint16_t value, uint16x8_t vec, __constrange(0,7) int lane); // VMOV.16 d0[0],r0 +uint32x4_t vsetq_lane_u32(uint32_t value, uint32x4_t vec, __constrange(0,3) int lane); // VMOV.32 d0[0],r0 +int8x16_t vsetq_lane_s8(int8_t value, int8x16_t vec, __constrange(0,15) int lane); // VMOV.8 d0[0],r0 +int16x8_t vsetq_lane_s16(int16_t value, int16x8_t vec, __constrange(0,7) int lane); // VMOV.16 d0[0],r0 +int32x4_t vsetq_lane_s32(int32_t value, int32x4_t vec, __constrange(0,3) int lane); // VMOV.32 d0[0],r0 +poly8x16_t vsetq_lane_p8(poly8_t value, poly8x16_t vec, __constrange(0,15) int lane); // VMOV.8 d0[0],r0 +poly16x8_t vsetq_lane_p16(poly16_t value, poly16x8_t vec, __constrange(0,7) int lane); // VMOV.16 d0[0],r0 +float32x4_t vsetq_lane_f32(float32_t value, float32x4_t vec, __constrange(0,3) int lane); // VMOV.32 d0[0],r0 +int64x1_t vset_lane_s64(int64_t value, int64x1_t vec, __constrange(0,0) int lane); // VMOV d0,r0,r0 +uint64x1_t vset_lane_u64(uint64_t value, uint64x1_t vec, __constrange(0,0) int lane); // VMOV d0,r0,r0 +int64x2_t vsetq_lane_s64(int64_t value, int64x2_t vec, __constrange(0,1) int lane); // VMOV d0,r0,r0 +uint64x2_t vsetq_lane_u64(uint64_t value, uint64x2_t vec, __constrange(0,1) int lane); // VMOV d0,r0,r0 +//Initialize a vector from a literal bit pattern. 
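+//Editorial example (not part of the original header): a get/set round trip with
+//the lane-access intrinsics above; the function name is illustrative.
+//
+//  static int32x2_t bump_lane1(int32x2_t v)
+//  {
+//      int32_t x = vget_lane_s32(v, 1);     //VMOV.32 r0, d0[1]
+//      return vset_lane_s32(x + 1, v, 1);   //VMOV.32 d0[1], r0
+//  }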
+int8x8_t vcreate_s8(uint64_t a); // VMOV d0,r0,r0 +int16x4_t vcreate_s16(uint64_t a); // VMOV d0,r0,r0 +int32x2_t vcreate_s32(uint64_t a); // VMOV d0,r0,r0 +float16x4_t vcreate_f16(uint64_t a); // VMOV d0,r0,r0 +float32x2_t vcreate_f32(uint64_t a); // VMOV d0,r0,r0 +uint8x8_t vcreate_u8(uint64_t a); // VMOV d0,r0,r0 +uint16x4_t vcreate_u16(uint64_t a); // VMOV d0,r0,r0 +uint32x2_t vcreate_u32(uint64_t a); // VMOV d0,r0,r0 +uint64x1_t vcreate_u64(uint64_t a); // VMOV d0,r0,r0 +poly8x8_t vcreate_p8(uint64_t a); // VMOV d0,r0,r0 +poly16x4_t vcreate_p16(uint64_t a); // VMOV d0,r0,r0 +int64x1_t vcreate_s64(uint64_t a); // VMOV d0,r0,r0 +//Set all lanes to same value +//Load all lanes of vector to the same literal value +uint8x8_t vdup_n_u8(uint8_t value); // VDUP.8 d0,r0 +uint16x4_t vdup_n_u16(uint16_t value); // VDUP.16 d0,r0 +uint32x2_t vdup_n_u32(uint32_t value); // VDUP.32 d0,r0 +int8x8_t vdup_n_s8(int8_t value); // VDUP.8 d0,r0 +int16x4_t vdup_n_s16(int16_t value); // VDUP.16 d0,r0 +int32x2_t vdup_n_s32(int32_t value); // VDUP.32 d0,r0 +poly8x8_t vdup_n_p8(poly8_t value); // VDUP.8 d0,r0 +poly16x4_t vdup_n_p16(poly16_t value); // VDUP.16 d0,r0 +float32x2_t vdup_n_f32(float32_t value); // VDUP.32 d0,r0 +uint8x16_t vdupq_n_u8(uint8_t value); // VDUP.8 q0,r0 +uint16x8_t vdupq_n_u16(uint16_t value); // VDUP.16 q0,r0 +uint32x4_t vdupq_n_u32(uint32_t value); // VDUP.32 q0,r0 +int8x16_t vdupq_n_s8(int8_t value); // VDUP.8 q0,r0 +int16x8_t vdupq_n_s16(int16_t value); // VDUP.16 q0,r0 +int32x4_t vdupq_n_s32(int32_t value); // VDUP.32 q0,r0 +poly8x16_t vdupq_n_p8(poly8_t value); // VDUP.8 q0,r0 +poly16x8_t vdupq_n_p16(poly16_t value); // VDUP.16 q0,r0 +float32x4_t vdupq_n_f32(float32_t value); // VDUP.32 q0,r0 +int64x1_t vdup_n_s64(int64_t value); // VMOV d0,r0,r0 +uint64x1_t vdup_n_u64(uint64_t value); // VMOV d0,r0,r0 +int64x2_t vdupq_n_s64(int64_t value); // VMOV d0,r0,r0 +uint64x2_t vdupq_n_u64(uint64_t value); // VMOV d0,r0,r0 +uint8x8_t vmov_n_u8(uint8_t value); // VDUP.8 d0,r0 +uint16x4_t vmov_n_u16(uint16_t value); // VDUP.16 d0,r0 +uint32x2_t vmov_n_u32(uint32_t value); // VDUP.32 d0,r0 +int8x8_t vmov_n_s8(int8_t value); // VDUP.8 d0,r0 +int16x4_t vmov_n_s16(int16_t value); // VDUP.16 d0,r0 +int32x2_t vmov_n_s32(int32_t value); // VDUP.32 d0,r0 +poly8x8_t vmov_n_p8(poly8_t value); // VDUP.8 d0,r0 +poly16x4_t vmov_n_p16(poly16_t value); // VDUP.16 d0,r0 +float32x2_t vmov_n_f32(float32_t value); // VDUP.32 d0,r0 +uint8x16_t vmovq_n_u8(uint8_t value); // VDUP.8 q0,r0 +uint16x8_t vmovq_n_u16(uint16_t value); // VDUP.16 q0,r0 +uint32x4_t vmovq_n_u32(uint32_t value); // VDUP.32 q0,r0 +int8x16_t vmovq_n_s8(int8_t value); // VDUP.8 q0,r0 +int16x8_t vmovq_n_s16(int16_t value); // VDUP.16 q0,r0 +int32x4_t vmovq_n_s32(int32_t value); // VDUP.32 q0,r0 +poly8x16_t vmovq_n_p8(poly8_t value); // VDUP.8 q0,r0 +poly16x8_t vmovq_n_p16(poly16_t value); // VDUP.16 q0,r0 +float32x4_t vmovq_n_f32(float32_t value); // VDUP.32 q0,r0 +int64x1_t vmov_n_s64(int64_t value); // VMOV d0,r0,r0 +uint64x1_t vmov_n_u64(uint64_t value); // VMOV d0,r0,r0 +int64x2_t vmovq_n_s64(int64_t value); // VMOV d0,r0,r0 +uint64x2_t vmovq_n_u64(uint64_t value); // VMOV d0,r0,r0 +//Load all lanes of the vector to the value of a lane of a vector +uint8x8_t vdup_lane_u8(uint8x8_t vec, __constrange(0,7) int lane); // VDUP.8 d0,d0[0] +uint16x4_t vdup_lane_u16(uint16x4_t vec, __constrange(0,3) int lane); // VDUP.16 d0,d0[0] +uint32x2_t vdup_lane_u32(uint32x2_t vec, __constrange(0,1) int lane); // VDUP.32 d0,d0[0] +int8x8_t 
vdup_lane_s8(int8x8_t vec, __constrange(0,7) int lane); // VDUP.8 d0,d0[0] +int16x4_t vdup_lane_s16(int16x4_t vec, __constrange(0,3) int lane); // VDUP.16 d0,d0[0] +int32x2_t vdup_lane_s32(int32x2_t vec, __constrange(0,1) int lane); // VDUP.32 d0,d0[0] +poly8x8_t vdup_lane_p8(poly8x8_t vec, __constrange(0,7) int lane); // VDUP.8 d0,d0[0] +poly16x4_t vdup_lane_p16(poly16x4_t vec, __constrange(0,3) int lane); // VDUP.16 d0,d0[0] +float32x2_t vdup_lane_f32(float32x2_t vec, __constrange(0,1) int lane); // VDUP.32 d0,d0[0] +uint8x16_t vdupq_lane_u8(uint8x8_t vec, __constrange(0,7) int lane); // VDUP.8 q0,d0[0] +uint16x8_t vdupq_lane_u16(uint16x4_t vec, __constrange(0,3) int lane); // VDUP.16 q0,d0[0] +uint32x4_t vdupq_lane_u32(uint32x2_t vec, __constrange(0,1) int lane); // VDUP.32 q0,d0[0] +int8x16_t vdupq_lane_s8(int8x8_t vec, __constrange(0,7) int lane); // VDUP.8 q0,d0[0] +int16x8_t vdupq_lane_s16(int16x4_t vec, __constrange(0,3) int lane); // VDUP.16 q0,d0[0] +int32x4_t vdupq_lane_s32(int32x2_t vec, __constrange(0,1) int lane); // VDUP.32 q0,d0[0] +poly8x16_t vdupq_lane_p8(poly8x8_t vec, __constrange(0,7) int lane); // VDUP.8 q0,d0[0] +poly16x8_t vdupq_lane_p16(poly16x4_t vec, __constrange(0,3) int lane); // VDUP.16 q0,d0[0] +float32x4_t vdupq_lane_f32(float32x2_t vec, __constrange(0,1) int lane); // VDUP.32 q0,d0[0] +int64x1_t vdup_lane_s64(int64x1_t vec, __constrange(0,0) int lane); // VMOV d0,d0 +uint64x1_t vdup_lane_u64(uint64x1_t vec, __constrange(0,0) int lane); // VMOV d0,d0 +int64x2_t vdupq_lane_s64(int64x1_t vec, __constrange(0,0) int lane); // VMOV q0,q0 +uint64x2_t vdupq_lane_u64(uint64x1_t vec, __constrange(0,0) int lane); // VMOV q0,q0 +//Combining vectors. These intrinsics join two 64 bit vectors into a single 128bit vector. +int8x16_t vcombine_s8(int8x8_t low, int8x8_t high); // VMOV d0,d0 +int16x8_t vcombine_s16(int16x4_t low, int16x4_t high); // VMOV d0,d0 +int32x4_t vcombine_s32(int32x2_t low, int32x2_t high); // VMOV d0,d0 +int64x2_t vcombine_s64(int64x1_t low, int64x1_t high); // VMOV d0,d0 +float16x8_t vcombine_f16(float16x4_t low, float16x4_t high); // VMOV d0,d0 +float32x4_t vcombine_f32(float32x2_t low, float32x2_t high); // VMOV d0,d0 +uint8x16_t vcombine_u8(uint8x8_t low, uint8x8_t high); // VMOV d0,d0 +uint16x8_t vcombine_u16(uint16x4_t low, uint16x4_t high); // VMOV d0,d0 +uint32x4_t vcombine_u32(uint32x2_t low, uint32x2_t high); // VMOV d0,d0 +uint64x2_t vcombine_u64(uint64x1_t low, uint64x1_t high); // VMOV d0,d0 +poly8x16_t vcombine_p8(poly8x8_t low, poly8x8_t high); // VMOV d0,d0 +poly16x8_t vcombine_p16(poly16x4_t low, poly16x4_t high); // VMOV d0,d0 +//Splitting vectors. 
These intrinsics split a 128 bit vector into 2 component 64 bit vectors +int8x8_t vget_high_s8(int8x16_t a); // VMOV d0,d0 +int16x4_t vget_high_s16(int16x8_t a); // VMOV d0,d0 +int32x2_t vget_high_s32(int32x4_t a); // VMOV d0,d0 +int64x1_t vget_high_s64(int64x2_t a); // VMOV d0,d0 +float16x4_t vget_high_f16(float16x8_t a); // VMOV d0,d0 +float32x2_t vget_high_f32(float32x4_t a); // VMOV d0,d0 +uint8x8_t vget_high_u8(uint8x16_t a); // VMOV d0,d0 +uint16x4_t vget_high_u16(uint16x8_t a); // VMOV d0,d0 +uint32x2_t vget_high_u32(uint32x4_t a); // VMOV d0,d0 +uint64x1_t vget_high_u64(uint64x2_t a); // VMOV d0,d0 +poly8x8_t vget_high_p8(poly8x16_t a); // VMOV d0,d0 +poly16x4_t vget_high_p16(poly16x8_t a); // VMOV d0,d0 +int8x8_t vget_low_s8(int8x16_t a); // VMOV d0,d0 +int16x4_t vget_low_s16(int16x8_t a); // VMOV d0,d0 +int32x2_t vget_low_s32(int32x4_t a); // VMOV d0,d0 +int64x1_t vget_low_s64(int64x2_t a); // VMOV d0,d0 +float16x4_t vget_low_f16(float16x8_t a); // VMOV d0,d0 +float32x2_t vget_low_f32(float32x4_t a); // VMOV d0,d0 +uint8x8_t vget_low_u8(uint8x16_t a); // VMOV d0,d0 +uint16x4_t vget_low_u16(uint16x8_t a); // VMOV d0,d0 +uint32x2_t vget_low_u32(uint32x4_t a); // VMOV d0,d0 +uint64x1_t vget_low_u64(uint64x2_t a); // VMOV d0,d0 +poly8x8_t vget_low_p8(poly8x16_t a); // VMOV d0,d0 +poly16x4_t vget_low_p16(poly16x8_t a); // VMOV d0,d0 +//Converting vectors. These intrinsics are used to convert vectors. +//Convert from float +int32x2_t vcvt_s32_f32(float32x2_t a); // VCVT.S32.F32 d0, d0 +uint32x2_t vcvt_u32_f32(float32x2_t a); // VCVT.U32.F32 d0, d0 +int32x4_t vcvtq_s32_f32(float32x4_t a); // VCVT.S32.F32 q0, q0 +uint32x4_t vcvtq_u32_f32(float32x4_t a); // VCVT.U32.F32 q0, q0 +int32x2_t vcvt_n_s32_f32(float32x2_t a, __constrange(1,32) int b); // VCVT.S32.F32 d0, d0, #32 +uint32x2_t vcvt_n_u32_f32(float32x2_t a, __constrange(1,32) int b); // VCVT.U32.F32 d0, d0, #32 +int32x4_t vcvtq_n_s32_f32(float32x4_t a, __constrange(1,32) int b); // VCVT.S32.F32 q0, q0, #32 +uint32x4_t vcvtq_n_u32_f32(float32x4_t a, __constrange(1,32) int b); // VCVT.U32.F32 q0, q0, #32 +//Convert to float +float32x2_t vcvt_f32_s32(int32x2_t a); // VCVT.F32.S32 d0, d0 +float32x2_t vcvt_f32_u32(uint32x2_t a); // VCVT.F32.U32 d0, d0 +float32x4_t vcvtq_f32_s32(int32x4_t a); // VCVT.F32.S32 q0, q0 +float32x4_t vcvtq_f32_u32(uint32x4_t a); // VCVT.F32.U32 q0, q0 +float32x2_t vcvt_n_f32_s32(int32x2_t a, __constrange(1,32) int b); // VCVT.F32.S32 d0, d0, #32 +float32x2_t vcvt_n_f32_u32(uint32x2_t a, __constrange(1,32) int b); // VCVT.F32.U32 d0, d0, #32 +float32x4_t vcvtq_n_f32_s32(int32x4_t a, __constrange(1,32) int b); // VCVT.F32.S32 q0, q0, #32 +float32x4_t vcvtq_n_f32_u32(uint32x4_t a, __constrange(1,32) int b); // VCVT.F32.U32 q0, q0, #32 +//Convert between floats +float16x4_t vcvt_f16_f32(float32x4_t a); // VCVT.F16.F32 d0, q0 +float32x4_t vcvt_f32_f16(float16x4_t a); // VCVT.F32.F16 q0, d0 +//Vector narrow integer +int8x8_t vmovn_s16(int16x8_t a); // VMOVN.I16 d0,q0 +int16x4_t vmovn_s32(int32x4_t a); // VMOVN.I32 d0,q0 +int32x2_t vmovn_s64(int64x2_t a); // VMOVN.I64 d0,q0 +uint8x8_t vmovn_u16(uint16x8_t a); // VMOVN.I16 d0,q0 +uint16x4_t vmovn_u32(uint32x4_t a); // VMOVN.I32 d0,q0 +uint32x2_t vmovn_u64(uint64x2_t a); // VMOVN.I64 d0,q0 +//Vector long move +int16x8_t vmovl_s8(int8x8_t a); // VMOVL.S8 q0,d0 +int32x4_t vmovl_s16(int16x4_t a); // VMOVL.S16 q0,d0 +int64x2_t vmovl_s32(int32x2_t a); // VMOVL.S32 q0,d0 +uint16x8_t vmovl_u8(uint8x8_t a); // VMOVL.U8 q0,d0 +uint32x4_t vmovl_u16(uint16x4_t a); // VMOVL.U16 
q0,d0 +uint64x2_t vmovl_u32(uint32x2_t a); // VMOVL.U32 q0,d0 +//Vector saturating narrow integer +int8x8_t vqmovn_s16(int16x8_t a); // VQMOVN.S16 d0,q0 +int16x4_t vqmovn_s32(int32x4_t a); // VQMOVN.S32 d0,q0 +int32x2_t vqmovn_s64(int64x2_t a); // VQMOVN.S64 d0,q0 +uint8x8_t vqmovn_u16(uint16x8_t a); // VQMOVN.U16 d0,q0 +uint16x4_t vqmovn_u32(uint32x4_t a); // VQMOVN.U32 d0,q0 +uint32x2_t vqmovn_u64(uint64x2_t a); // VQMOVN.U64 d0,q0 +//Vector saturating narrow integer signed->unsigned +uint8x8_t vqmovun_s16(int16x8_t a); // VQMOVUN.S16 d0,q0 +uint16x4_t vqmovun_s32(int32x4_t a); // VQMOVUN.S32 d0,q0 +uint32x2_t vqmovun_s64(int64x2_t a); // VQMOVUN.S64 d0,q0 +//Table look up +uint8x8_t vtbl1_u8(uint8x8_t a, uint8x8_t b); // VTBL.8 d0, {d0}, d0 +int8x8_t vtbl1_s8(int8x8_t a, int8x8_t b); // VTBL.8 d0, {d0}, d0 +poly8x8_t vtbl1_p8(poly8x8_t a, uint8x8_t b); // VTBL.8 d0, {d0}, d0 +uint8x8_t vtbl2_u8_ptr(uint8x8x2_t *a, uint8x8_t b); // VTBL.8 d0, {d0, d1}, d0 +int8x8_t vtbl2_s8_ptr(int8x8x2_t *a, int8x8_t b); // VTBL.8 d0, {d0, d1}, d0 +poly8x8_t vtbl2_p8_ptr(poly8x8x2_t *a, uint8x8_t b); // VTBL.8 d0, {d0, d1}, d0 +uint8x8_t vtbl3_u8_ptr(uint8x8x3_t *a, uint8x8_t b); // VTBL.8 d0, {d0, d1, d2}, d0 +int8x8_t vtbl3_s8_ptr(int8x8x3_t *a, int8x8_t b); // VTBL.8 d0, {d0, d1, d2}, d0 +poly8x8_t vtbl3_p8_ptr(poly8x8x3_t *a, uint8x8_t b); // VTBL.8 d0, {d0, d1, d2}, d0 +uint8x8_t vtbl4_u8_ptr(uint8x8x4_t *a, uint8x8_t b); // VTBL.8 d0, {d0, d1, d2, d3}, d0 +int8x8_t vtbl4_s8_ptr(int8x8x4_t *a, int8x8_t b); // VTBL.8 d0, {d0, d1, d2, d3}, d0 +poly8x8_t vtbl4_p8_ptr(poly8x8x4_t *a, uint8x8_t b); // VTBL.8 d0, {d0, d1, d2, d3}, d0 +//Extended table look up intrinsics +uint8x8_t vtbx1_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c); // VTBX.8 d0, {d0}, d0 +int8x8_t vtbx1_s8(int8x8_t a, int8x8_t b, int8x8_t c); // VTBX.8 d0, {d0}, d0 +poly8x8_t vtbx1_p8(poly8x8_t a, poly8x8_t b, uint8x8_t c); // VTBX.8 d0, {d0}, d0 +uint8x8_t vtbx2_u8_ptr(uint8x8_t a, uint8x8x2_t *b, uint8x8_t c); // VTBX.8 d0, {d0, d1}, d0 +int8x8_t vtbx2_s8_ptr(int8x8_t a, int8x8x2_t *b, int8x8_t c); // VTBX.8 d0, {d0, d1}, d0 +poly8x8_t vtbx2_p8_ptr(poly8x8_t a, poly8x8x2_t *b, uint8x8_t c); // VTBX.8 d0, {d0, d1}, d0 +uint8x8_t vtbx3_u8_ptr(uint8x8_t a, uint8x8x3_t *b, uint8x8_t c); // VTBX.8 d0, {d0, d1, d2}, d0 +int8x8_t vtbx3_s8_ptr(int8x8_t a, int8x8x3_t *b, int8x8_t c); // VTBX.8 d0, {d0, d1, d2}, d0 +poly8x8_t vtbx3_p8_ptr(poly8x8_t a, poly8x8x3_t *b, uint8x8_t c); // VTBX.8 d0, {d0, d1, d2}, d0 +uint8x8_t vtbx4_u8_ptr(uint8x8_t a, uint8x8x4_t *b, uint8x8_t c); // VTBX.8 d0, {d0, d1, d2, d3}, d0 +int8x8_t vtbx4_s8_ptr(int8x8_t a, int8x8x4_t *b, int8x8_t c); // VTBX.8 d0, {d0, d1, d2, d3}, d0 +poly8x8_t vtbx4_p8_ptr(poly8x8_t a, poly8x8x4_t *b, uint8x8_t c); // VTBX.8 d0, {d0, d1, d2, d3}, d0 +//Operations with a scalar value +//Vector multiply accumulate with scalar +int16x4_t vmla_lane_s16(int16x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l); // VMLA.I16 d0, d0,d0[0] +int32x2_t vmla_lane_s32(int32x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l); // VMLA.I32 d0, d0,d0[0] +uint16x4_t vmla_lane_u16(uint16x4_t a, uint16x4_t b, uint16x4_t v, __constrange(0,3) int l); // VMLA.I16 d0, d0,d0[0] +uint32x2_t vmla_lane_u32(uint32x2_t a, uint32x2_t b, uint32x2_t v, __constrange(0,1) int l); // VMLA.I32 d0, d0,d0[0] +float32x2_t vmla_lane_f32(float32x2_t a, float32x2_t b, float32x2_t v, __constrange(0,1) int l); // VMLA.F32 d0,d0, d0[0] +int16x8_t vmlaq_lane_s16(int16x8_t a, int16x8_t b, int16x4_t v, __constrange(0,3) int 
l); // VMLA.I16 q0, q0,d0[0] +int32x4_t vmlaq_lane_s32(int32x4_t a, int32x4_t b, int32x2_t v, __constrange(0,1) int l); // VMLA.I32 q0, q0,d0[0] +uint16x8_t vmlaq_lane_u16(uint16x8_t a, uint16x8_t b, uint16x4_t v, __constrange(0,3) int l); // VMLA.I16 q0,q0, d0[0] +uint32x4_t vmlaq_lane_u32(uint32x4_t a, uint32x4_t b, uint32x2_t v, __constrange(0,1) int l); // VMLA.I32 q0,q0, d0[0] +float32x4_t vmlaq_lane_f32(float32x4_t a, float32x4_t b, float32x2_t v, __constrange(0,1) int l); // VMLA.F32 q0,q0, d0[0] +//Vector widening multiply accumulate with scalar +int32x4_t vmlal_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l); //VMLAL.S16 q0, d0,d0[0] +int64x2_t vmlal_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l); //VMLAL.S32 q0, d0,d0[0] +uint32x4_t vmlal_lane_u16(uint32x4_t a, uint16x4_t b, uint16x4_t v, __constrange(0,3) int l); // VMLAL.U16 q0,d0, d0[0] +uint64x2_t vmlal_lane_u32(uint64x2_t a, uint32x2_t b, uint32x2_t v, __constrange(0,1) int l); // VMLAL.U32 q0,d0, d0[0] +//Vector widening saturating doubling multiply accumulate with scalar +int32x4_t vqdmlal_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l); // VQDMLAL.S16 q0,d0, d0[0] +int64x2_t vqdmlal_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l); // VQDMLAL.S32 q0,d0, d0[0] +//Vector multiply subtract with scalar +int16x4_t vmls_lane_s16(int16x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l); // VMLS.I16 d0, d0,d0[0] +int32x2_t vmls_lane_s32(int32x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l); // VMLS.I32 d0, d0,d0[0] +uint16x4_t vmls_lane_u16(uint16x4_t a, uint16x4_t b, uint16x4_t v, __constrange(0,3) int l); // VMLS.I16 d0, d0,d0[0] +uint32x2_t vmls_lane_u32(uint32x2_t a, uint32x2_t b, uint32x2_t v, __constrange(0,1) int l); // VMLS.I32 d0, d0,d0[0] +float32x2_t vmls_lane_f32(float32x2_t a, float32x2_t b, float32x2_t v, __constrange(0,1) int l); // VMLS.F32 d0,d0, d0[0] +int16x8_t vmlsq_lane_s16(int16x8_t a, int16x8_t b, int16x4_t v, __constrange(0,3) int l); // VMLS.I16 q0, q0,d0[0] +int32x4_t vmlsq_lane_s32(int32x4_t a, int32x4_t b, int32x2_t v, __constrange(0,1) int l); // VMLS.I32 q0, q0,d0[0] +uint16x8_t vmlsq_lane_u16(uint16x8_t a, uint16x8_t b, uint16x4_t v, __constrange(0,3) int l); // VMLS.I16 q0,q0, d0[0] +uint32x4_t vmlsq_lane_u32(uint32x4_t a, uint32x4_t b, uint32x2_t v, __constrange(0,1) int l); // VMLS.I32 q0,q0, d0[0] +float32x4_t vmlsq_lane_f32(float32x4_t a, float32x4_t b, float32x2_t v, __constrange(0,1) int l); // VMLS.F32 q0,q0, d0[0] +//Vector widening multiply subtract with scalar +int32x4_t vmlsl_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l); // VMLSL.S16 q0, d0,d0[0] +int64x2_t vmlsl_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l); // VMLSL.S32 q0, d0,d0[0] +uint32x4_t vmlsl_lane_u16(uint32x4_t a, uint16x4_t b, uint16x4_t v, __constrange(0,3) int l); // VMLSL.U16 q0,d0, d0[0] +uint64x2_t vmlsl_lane_u32(uint64x2_t a, uint32x2_t b, uint32x2_t v, __constrange(0,1) int l); // VMLSL.U32 q0,d0, d0[0] +//Vector widening saturating doubling multiply subtract with scalar +int32x4_t vqdmlsl_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l); // VQDMLSL.S16 q0,d0, d0[0] +int64x2_t vqdmlsl_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l); // VQDMLSL.S32 q0,d0, d0[0] +//Vector multiply by scalar +int16x4_t vmul_n_s16(int16x4_t a, int16_t b); // VMUL.I16 d0,d0,d0[0] +int32x2_t vmul_n_s32(int32x2_t a, int32_t 
b); // VMUL.I32 d0,d0,d0[0] +float32x2_t vmul_n_f32(float32x2_t a, float32_t b); // VMUL.F32 d0,d0,d0[0] +uint16x4_t vmul_n_u16(uint16x4_t a, uint16_t b); // VMUL.I16 d0,d0,d0[0] +uint32x2_t vmul_n_u32(uint32x2_t a, uint32_t b); // VMUL.I32 d0,d0,d0[0] +int16x8_t vmulq_n_s16(int16x8_t a, int16_t b); // VMUL.I16 q0,q0,d0[0] +int32x4_t vmulq_n_s32(int32x4_t a, int32_t b); // VMUL.I32 q0,q0,d0[0] +float32x4_t vmulq_n_f32(float32x4_t a, float32_t b); // VMUL.F32 q0,q0,d0[0] +uint16x8_t vmulq_n_u16(uint16x8_t a, uint16_t b); // VMUL.I16 q0,q0,d0[0] +uint32x4_t vmulq_n_u32(uint32x4_t a, uint32_t b); // VMUL.I32 q0,q0,d0[0] +//Vector long multiply with scalar +int32x4_t vmull_n_s16(int16x4_t vec1, int16_t val2); // VMULL.S16 q0,d0,d0[0] +int64x2_t vmull_n_s32(int32x2_t vec1, int32_t val2); // VMULL.S32 q0,d0,d0[0] +uint32x4_t vmull_n_u16(uint16x4_t vec1, uint16_t val2); // VMULL.U16 q0,d0,d0[0] +uint64x2_t vmull_n_u32(uint32x2_t vec1, uint32_t val2); // VMULL.U32 q0,d0,d0[0] +//Vector long multiply by scalar +int32x4_t vmull_lane_s16(int16x4_t vec1, int16x4_t val2, __constrange(0, 3) int val3); // VMULL.S16 q0,d0,d0[0] +int64x2_t vmull_lane_s32(int32x2_t vec1, int32x2_t val2, __constrange(0, 1) int val3); // VMULL.S32 q0,d0,d0[0] +uint32x4_t vmull_lane_u16(uint16x4_t vec1, uint16x4_t val2, __constrange(0, 3) int val3); // VMULL.U16 q0,d0,d0[0] +uint64x2_t vmull_lane_u32(uint32x2_t vec1, uint32x2_t val2, __constrange(0, 1) int val3); // VMULL.U32 q0,d0,d0[0] +//Vector saturating doubling long multiply with scalar +int32x4_t vqdmull_n_s16(int16x4_t vec1, int16_t val2); // VQDMULL.S16 q0,d0,d0[0] +int64x2_t vqdmull_n_s32(int32x2_t vec1, int32_t val2); // VQDMULL.S32 q0,d0,d0[0] +//Vector saturating doubling long multiply by scalar +int32x4_t vqdmull_lane_s16(int16x4_t vec1, int16x4_t val2, __constrange(0, 3) int val3); // VQDMULL.S16 q0,d0,d0[0] +int64x2_t vqdmull_lane_s32(int32x2_t vec1, int32x2_t val2, __constrange(0, 1) int val3); // VQDMULL.S32 q0,d0,d0[0] +//Vector saturating doubling multiply high with scalar +int16x4_t vqdmulh_n_s16(int16x4_t vec1, int16_t val2); // VQDMULH.S16 d0,d0,d0[0] +int32x2_t vqdmulh_n_s32(int32x2_t vec1, int32_t val2); // VQDMULH.S32 d0,d0,d0[0] +int16x8_t vqdmulhq_n_s16(int16x8_t vec1, int16_t val2); // VQDMULH.S16 q0,q0,d0[0] +int32x4_t vqdmulhq_n_s32(int32x4_t vec1, int32_t val2); // VQDMULH.S32 q0,q0,d0[0] +//Vector saturating doubling multiply high by scalar +int16x4_t vqdmulh_lane_s16(int16x4_t vec1, int16x4_t val2, __constrange(0, 3) int val3); // VQDMULH.S16 d0,d0,d0[0] +int32x2_t vqdmulh_lane_s32(int32x2_t vec1, int32x2_t val2, __constrange(0, 1) int val3); // VQDMULH.S32 d0,d0,d0[0] +int16x8_t vqdmulhq_lane_s16(int16x8_t vec1, int16x4_t val2, __constrange(0, 3) int val3); // VQDMULH.S16 q0,q0,d0[0] +int32x4_t vqdmulhq_lane_s32(int32x4_t vec1, int32x2_t val2, __constrange(0, 1) int val3); // VQDMULH.S32 q0,q0,d0[0] +//Vector saturating rounding doubling multiply high with scalar +int16x4_t vqrdmulh_n_s16(int16x4_t vec1, int16_t val2); // VQRDMULH.S16 d0,d0,d0[0] +int32x2_t vqrdmulh_n_s32(int32x2_t vec1, int32_t val2); // VQRDMULH.S32 d0,d0,d0[0] +int16x8_t vqrdmulhq_n_s16(int16x8_t vec1, int16_t val2); // VQRDMULH.S16 q0,q0,d0[0] +int32x4_t vqrdmulhq_n_s32(int32x4_t vec1, int32_t val2); // VQRDMULH.S32 q0,q0,d0[0] +//Vector rounding saturating doubling multiply high by scalar +int16x4_t vqrdmulh_lane_s16(int16x4_t vec1, int16x4_t val2, __constrange(0, 3) int val3); // VQRDMULH.S16 d0,d0,d0[0] +int32x2_t vqrdmulh_lane_s32(int32x2_t vec1, int32x2_t val2, 
__constrange(0, 1) int val3); // VQRDMULH.S32 d0,d0,d0[0] +int16x8_t vqrdmulhq_lane_s16(int16x8_t vec1, int16x4_t val2, __constrange(0, 3) int val3); // VQRDMULH.S16 q0,q0,d0[0] +int32x4_t vqrdmulhq_lane_s32(int32x4_t vec1, int32x2_t val2, __constrange(0, 1) int val3); // VQRDMULH.S32 q0,q0,d0[0] +//Vector multiply accumulate with scalar +int16x4_t vmla_n_s16(int16x4_t a, int16x4_t b, int16_t c); // VMLA.I16 d0, d0, d0[0] +int32x2_t vmla_n_s32(int32x2_t a, int32x2_t b, int32_t c); // VMLA.I32 d0, d0, d0[0] +uint16x4_t vmla_n_u16(uint16x4_t a, uint16x4_t b, uint16_t c); // VMLA.I16 d0, d0, d0[0] +uint32x2_t vmla_n_u32(uint32x2_t a, uint32x2_t b, uint32_t c); // VMLA.I32 d0, d0, d0[0] +float32x2_t vmla_n_f32(float32x2_t a, float32x2_t b, float32_t c); // VMLA.F32 d0, d0, d0[0] +int16x8_t vmlaq_n_s16(int16x8_t a, int16x8_t b, int16_t c); // VMLA.I16 q0, q0, d0[0] +int32x4_t vmlaq_n_s32(int32x4_t a, int32x4_t b, int32_t c); // VMLA.I32 q0, q0, d0[0] +uint16x8_t vmlaq_n_u16(uint16x8_t a, uint16x8_t b, uint16_t c); // VMLA.I16 q0, q0, d0[0] +uint32x4_t vmlaq_n_u32(uint32x4_t a, uint32x4_t b, uint32_t c); // VMLA.I32 q0, q0, d0[0] +float32x4_t vmlaq_n_f32(float32x4_t a, float32x4_t b, float32_t c); // VMLA.F32 q0, q0, d0[0] +//Vector widening multiply accumulate with scalar +int32x4_t vmlal_n_s16(int32x4_t a, int16x4_t b, int16_t c); // VMLAL.S16 q0, d0, d0[0] +int64x2_t vmlal_n_s32(int64x2_t a, int32x2_t b, int32_t c); // VMLAL.S32 q0, d0, d0[0] +uint32x4_t vmlal_n_u16(uint32x4_t a, uint16x4_t b, uint16_t c); // VMLAL.U16 q0, d0, d0[0] +uint64x2_t vmlal_n_u32(uint64x2_t a, uint32x2_t b, uint32_t c); // VMLAL.U32 q0, d0, d0[0] +//Vector widening saturating doubling multiply accumulate with scalar +int32x4_t vqdmlal_n_s16(int32x4_t a, int16x4_t b, int16_t c); // VQDMLAL.S16 q0, d0, d0[0] +int64x2_t vqdmlal_n_s32(int64x2_t a, int32x2_t b, int32_t c); // VQDMLAL.S32 q0, d0, d0[0] +//Vector multiply subtract with scalar +int16x4_t vmls_n_s16(int16x4_t a, int16x4_t b, int16_t c); // VMLS.I16 d0, d0, d0[0] +int32x2_t vmls_n_s32(int32x2_t a, int32x2_t b, int32_t c); // VMLS.I32 d0, d0, d0[0] +uint16x4_t vmls_n_u16(uint16x4_t a, uint16x4_t b, uint16_t c); // VMLS.I16 d0, d0, d0[0] +uint32x2_t vmls_n_u32(uint32x2_t a, uint32x2_t b, uint32_t c); // VMLS.I32 d0, d0, d0[0] +float32x2_t vmls_n_f32(float32x2_t a, float32x2_t b, float32_t c); // VMLS.F32 d0, d0, d0[0] +int16x8_t vmlsq_n_s16(int16x8_t a, int16x8_t b, int16_t c); // VMLS.I16 q0, q0, d0[0] +int32x4_t vmlsq_n_s32(int32x4_t a, int32x4_t b, int32_t c); // VMLS.I32 q0, q0, d0[0] +uint16x8_t vmlsq_n_u16(uint16x8_t a, uint16x8_t b, uint16_t c); // VMLS.I16 q0, q0, d0[0] +uint32x4_t vmlsq_n_u32(uint32x4_t a, uint32x4_t b, uint32_t c); // VMLS.I32 q0, q0, d0[0] +float32x4_t vmlsq_n_f32(float32x4_t a, float32x4_t b, float32_t c); // VMLS.F32 q0, q0, d0[0] +//Vector widening multiply subtract with scalar +int32x4_t vmlsl_n_s16(int32x4_t a, int16x4_t b, int16_t c); // VMLSL.S16 q0, d0, d0[0] +int64x2_t vmlsl_n_s32(int64x2_t a, int32x2_t b, int32_t c); // VMLSL.S32 q0, d0, d0[0] +uint32x4_t vmlsl_n_u16(uint32x4_t a, uint16x4_t b, uint16_t c); // VMLSL.U16 q0, d0, d0[0] +uint64x2_t vmlsl_n_u32(uint64x2_t a, uint32x2_t b, uint32_t c); // VMLSL.U32 q0, d0, d0[0] +//Vector widening saturating doubling multiply subtract with scalar +int32x4_t vqdmlsl_n_s16(int32x4_t a, int16x4_t b, int16_t c); // VQDMLSL.S16 q0, d0, d0[0] +int64x2_t vqdmlsl_n_s32(int64x2_t a, int32x2_t b, int32_t c); // VQDMLSL.S32 q0, d0, d0[0] +//Vector extract +int8x8_t vext_s8(int8x8_t a, 
int8x8_t b, __constrange(0,7) int c); // VEXT.8 d0,d0,d0,#0
+uint8x8_t vext_u8(uint8x8_t a, uint8x8_t b, __constrange(0,7) int c); // VEXT.8 d0,d0,d0,#0
+poly8x8_t vext_p8(poly8x8_t a, poly8x8_t b, __constrange(0,7) int c); // VEXT.8 d0,d0,d0,#0
+int16x4_t vext_s16(int16x4_t a, int16x4_t b, __constrange(0,3) int c); // VEXT.16 d0,d0,d0,#0
+uint16x4_t vext_u16(uint16x4_t a, uint16x4_t b, __constrange(0,3) int c); // VEXT.16 d0,d0,d0,#0
+poly16x4_t vext_p16(poly16x4_t a, poly16x4_t b, __constrange(0,3) int c); // VEXT.16 d0,d0,d0,#0
+int32x2_t vext_s32(int32x2_t a, int32x2_t b, __constrange(0,1) int c); // VEXT.32 d0,d0,d0,#0
+uint32x2_t vext_u32(uint32x2_t a, uint32x2_t b, __constrange(0,1) int c); // VEXT.32 d0,d0,d0,#0
+int64x1_t vext_s64(int64x1_t a, int64x1_t b, __constrange(0,0) int c); // VEXT.64 d0,d0,d0,#0
+uint64x1_t vext_u64(uint64x1_t a, uint64x1_t b, __constrange(0,0) int c); // VEXT.64 d0,d0,d0,#0
+float32x2_t vext_f32(float32x2_t a, float32x2_t b, __constrange(0,1) int c); // VEXT.32 d0,d0,d0,#0
+int8x16_t vextq_s8(int8x16_t a, int8x16_t b, __constrange(0,15) int c); // VEXT.8 q0,q0,q0,#0
+uint8x16_t vextq_u8(uint8x16_t a, uint8x16_t b, __constrange(0,15) int c); // VEXT.8 q0,q0,q0,#0
+poly8x16_t vextq_p8(poly8x16_t a, poly8x16_t b, __constrange(0,15) int c); // VEXT.8 q0,q0,q0,#0
+int16x8_t vextq_s16(int16x8_t a, int16x8_t b, __constrange(0,7) int c); // VEXT.16 q0,q0,q0,#0
+uint16x8_t vextq_u16(uint16x8_t a, uint16x8_t b, __constrange(0,7) int c); // VEXT.16 q0,q0,q0,#0
+poly16x8_t vextq_p16(poly16x8_t a, poly16x8_t b, __constrange(0,7) int c); // VEXT.16 q0,q0,q0,#0
+int32x4_t vextq_s32(int32x4_t a, int32x4_t b, __constrange(0,3) int c); // VEXT.32 q0,q0,q0,#0
+uint32x4_t vextq_u32(uint32x4_t a, uint32x4_t b, __constrange(0,3) int c); // VEXT.32 q0,q0,q0,#0
+int64x2_t vextq_s64(int64x2_t a, int64x2_t b, __constrange(0,1) int c); // VEXT.64 q0,q0,q0,#0
+uint64x2_t vextq_u64(uint64x2_t a, uint64x2_t b, __constrange(0,1) int c); // VEXT.64 q0,q0,q0,#0
+float32x4_t vextq_f32(float32x4_t a, float32x4_t b, __constrange(0,3) int c); // VEXT.32 q0,q0,q0,#0
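+//Editorial example (not part of the original header): VEXT slides a window over
+//the concatenation of its two operands, a common way to build a misaligned view
+//from two aligned loads; the function name is illustrative.
+//
+//  static uint8x8_t window_at_offset3(uint8x8_t a, uint8x8_t b)
+//  {
+//      return vext_u8(a, b, 3);   //result = {a[3]..a[7], b[0]..b[2]}
+//  }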
+//Reverse vector elements (swap endianness). VREVn.m reverses the order of the m-bit lanes within a set that is n bits wide.
+int8x8_t vrev64_s8(int8x8_t vec); // VREV64.8 d0,d0
+int16x4_t vrev64_s16(int16x4_t vec); // VREV64.16 d0,d0
+int32x2_t vrev64_s32(int32x2_t vec); // VREV64.32 d0,d0
+uint8x8_t vrev64_u8(uint8x8_t vec); // VREV64.8 d0,d0
+uint16x4_t vrev64_u16(uint16x4_t vec); // VREV64.16 d0,d0
+uint32x2_t vrev64_u32(uint32x2_t vec); // VREV64.32 d0,d0
+poly8x8_t vrev64_p8(poly8x8_t vec); // VREV64.8 d0,d0
+poly16x4_t vrev64_p16(poly16x4_t vec); // VREV64.16 d0,d0
+float32x2_t vrev64_f32(float32x2_t vec); // VREV64.32 d0,d0
+int8x16_t vrev64q_s8(int8x16_t vec); // VREV64.8 q0,q0
+int16x8_t vrev64q_s16(int16x8_t vec); // VREV64.16 q0,q0
+int32x4_t vrev64q_s32(int32x4_t vec); // VREV64.32 q0,q0
+uint8x16_t vrev64q_u8(uint8x16_t vec); // VREV64.8 q0,q0
+uint16x8_t vrev64q_u16(uint16x8_t vec); // VREV64.16 q0,q0
+uint32x4_t vrev64q_u32(uint32x4_t vec); // VREV64.32 q0,q0
+poly8x16_t vrev64q_p8(poly8x16_t vec); // VREV64.8 q0,q0
+poly16x8_t vrev64q_p16(poly16x8_t vec); // VREV64.16 q0,q0
+float32x4_t vrev64q_f32(float32x4_t vec); // VREV64.32 q0,q0
+int8x8_t vrev32_s8(int8x8_t vec); // VREV32.8 d0,d0
+int16x4_t vrev32_s16(int16x4_t vec); // VREV32.16 d0,d0
+uint8x8_t vrev32_u8(uint8x8_t vec); // VREV32.8 d0,d0
+uint16x4_t vrev32_u16(uint16x4_t vec); // VREV32.16 d0,d0
+poly8x8_t vrev32_p8(poly8x8_t vec); // VREV32.8 d0,d0
+poly16x4_t vrev32_p16(poly16x4_t vec); // VREV32.16 d0,d0
+int8x16_t vrev32q_s8(int8x16_t vec); // VREV32.8 q0,q0
+int16x8_t vrev32q_s16(int16x8_t vec); // VREV32.16 q0,q0
+uint8x16_t vrev32q_u8(uint8x16_t vec); // VREV32.8 q0,q0
+uint16x8_t vrev32q_u16(uint16x8_t vec); // VREV32.16 q0,q0
+poly8x16_t vrev32q_p8(poly8x16_t vec); // VREV32.8 q0,q0
+poly16x8_t vrev32q_p16(poly16x8_t vec); // VREV32.16 q0,q0
+int8x8_t vrev16_s8(int8x8_t vec); // VREV16.8 d0,d0
+uint8x8_t vrev16_u8(uint8x8_t vec); // VREV16.8 d0,d0
+poly8x8_t vrev16_p8(poly8x8_t vec); // VREV16.8 d0,d0
+int8x16_t vrev16q_s8(int8x16_t vec); // VREV16.8 q0,q0
+uint8x16_t vrev16q_u8(uint8x16_t vec); // VREV16.8 q0,q0
+poly8x16_t vrev16q_p8(poly8x16_t vec); // VREV16.8 q0,q0
+//Other single operand arithmetic
+//Absolute: Vd[i] = |Va[i]|
+int8x8_t vabs_s8(int8x8_t a); // VABS.S8 d0,d0
+int16x4_t vabs_s16(int16x4_t a); // VABS.S16 d0,d0
+int32x2_t vabs_s32(int32x2_t a); // VABS.S32 d0,d0
+float32x2_t vabs_f32(float32x2_t a); // VABS.F32 d0,d0
+int8x16_t vabsq_s8(int8x16_t a); // VABS.S8 q0,q0
+int16x8_t vabsq_s16(int16x8_t a); // VABS.S16 q0,q0
+int32x4_t vabsq_s32(int32x4_t a); // VABS.S32 q0,q0
+float32x4_t vabsq_f32(float32x4_t a); // VABS.F32 q0,q0
+//Saturating absolute: Vd[i] = sat(|Va[i]|)
+int8x8_t vqabs_s8(int8x8_t a); // VQABS.S8 d0,d0
+int16x4_t vqabs_s16(int16x4_t a); // VQABS.S16 d0,d0
+int32x2_t vqabs_s32(int32x2_t a); // VQABS.S32 d0,d0
+int8x16_t vqabsq_s8(int8x16_t a); // VQABS.S8 q0,q0
+int16x8_t vqabsq_s16(int16x8_t a); // VQABS.S16 q0,q0
+int32x4_t vqabsq_s32(int32x4_t a); // VQABS.S32 q0,q0
+//Negate: Vd[i] = - Va[i]
+int8x8_t vneg_s8(int8x8_t a); // VNEG.S8 d0,d0
+int16x4_t vneg_s16(int16x4_t a); // VNEG.S16 d0,d0
+int32x2_t vneg_s32(int32x2_t a); // VNEG.S32 d0,d0
+float32x2_t vneg_f32(float32x2_t a); // VNEG.F32 d0,d0
+int8x16_t vnegq_s8(int8x16_t a); // VNEG.S8 q0,q0
+int16x8_t vnegq_s16(int16x8_t a); // VNEG.S16 q0,q0
+int32x4_t vnegq_s32(int32x4_t a); // VNEG.S32 q0,q0
+float32x4_t vnegq_f32(float32x4_t a); // VNEG.F32 q0,q0
+//Saturating Negate: sat(Vd[i] = - Va[i])
+int8x8_t vqneg_s8(int8x8_t a); // VQNEG.S8 d0,d0
+int16x4_t vqneg_s16(int16x4_t a); // VQNEG.S16 d0,d0
+int32x2_t vqneg_s32(int32x2_t a); // VQNEG.S32 d0,d0
+int8x16_t vqnegq_s8(int8x16_t a); // VQNEG.S8 q0,q0
+int16x8_t vqnegq_s16(int16x8_t a); // VQNEG.S16 q0,q0
+int32x4_t vqnegq_s32(int32x4_t a); // VQNEG.S32 q0,q0
+//Count leading sign bits
+int8x8_t vcls_s8(int8x8_t a); // VCLS.S8 d0,d0
+int16x4_t vcls_s16(int16x4_t a); // VCLS.S16 d0,d0
+int32x2_t vcls_s32(int32x2_t a); // VCLS.S32 d0,d0
+int8x16_t vclsq_s8(int8x16_t a); // VCLS.S8 q0,q0
+int16x8_t vclsq_s16(int16x8_t a); // VCLS.S16 q0,q0
+int32x4_t vclsq_s32(int32x4_t a); // VCLS.S32 q0,q0
+//Count leading zeros
+int8x8_t vclz_s8(int8x8_t a); // VCLZ.I8 d0,d0
+int16x4_t vclz_s16(int16x4_t a); // VCLZ.I16 d0,d0
+int32x2_t vclz_s32(int32x2_t a); // VCLZ.I32 d0,d0
+uint8x8_t vclz_u8(uint8x8_t a); // VCLZ.I8 d0,d0
+uint16x4_t vclz_u16(uint16x4_t a); // VCLZ.I16 d0,d0
+uint32x2_t vclz_u32(uint32x2_t a); // VCLZ.I32 d0,d0
+int8x16_t vclzq_s8(int8x16_t a); // VCLZ.I8 q0,q0
+int16x8_t vclzq_s16(int16x8_t a); // VCLZ.I16 q0,q0
+int32x4_t vclzq_s32(int32x4_t a); // VCLZ.I32 q0,q0
+uint8x16_t vclzq_u8(uint8x16_t a); // VCLZ.I8 q0,q0
+uint16x8_t vclzq_u16(uint16x8_t a); // VCLZ.I16 q0,q0
+uint32x4_t vclzq_u32(uint32x4_t a); // VCLZ.I32 q0,q0
+//Count number of set bits
+uint8x8_t vcnt_u8(uint8x8_t a); // VCNT.8 d0,d0
+int8x8_t vcnt_s8(int8x8_t a); // VCNT.8 d0,d0
+poly8x8_t vcnt_p8(poly8x8_t a); // VCNT.8 d0,d0
+uint8x16_t vcntq_u8(uint8x16_t a); // VCNT.8 q0,q0
+int8x16_t vcntq_s8(int8x16_t a); // VCNT.8 q0,q0
+poly8x16_t vcntq_p8(poly8x16_t a); // VCNT.8 q0,q0
+//Reciprocal estimate
+float32x2_t vrecpe_f32(float32x2_t a); // VRECPE.F32 d0,d0
+uint32x2_t vrecpe_u32(uint32x2_t a); // VRECPE.U32 d0,d0
+float32x4_t vrecpeq_f32(float32x4_t a); // VRECPE.F32 q0,q0
+uint32x4_t vrecpeq_u32(uint32x4_t a); // VRECPE.U32 q0,q0
+//Reciprocal square root estimate
+float32x2_t vrsqrte_f32(float32x2_t a); // VRSQRTE.F32 d0,d0
+uint32x2_t vrsqrte_u32(uint32x2_t a); // VRSQRTE.U32 d0,d0
+float32x4_t vrsqrteq_f32(float32x4_t a); // VRSQRTE.F32 q0,q0
+uint32x4_t vrsqrteq_u32(uint32x4_t a); // VRSQRTE.U32 q0,q0
+//Logical operations
+//Bitwise not
+int8x8_t vmvn_s8(int8x8_t a); // VMVN d0,d0
+int16x4_t vmvn_s16(int16x4_t a); // VMVN d0,d0
+int32x2_t vmvn_s32(int32x2_t a); // VMVN d0,d0
+uint8x8_t vmvn_u8(uint8x8_t a); // VMVN d0,d0
+uint16x4_t vmvn_u16(uint16x4_t a); // VMVN d0,d0
+uint32x2_t vmvn_u32(uint32x2_t a); // VMVN d0,d0
+poly8x8_t vmvn_p8(poly8x8_t a); // VMVN d0,d0
+int8x16_t vmvnq_s8(int8x16_t a); // VMVN q0,q0
+int16x8_t vmvnq_s16(int16x8_t a); // VMVN q0,q0
+int32x4_t vmvnq_s32(int32x4_t a); // VMVN q0,q0
+uint8x16_t vmvnq_u8(uint8x16_t a); // VMVN q0,q0
+uint16x8_t vmvnq_u16(uint16x8_t a); // VMVN q0,q0
+uint32x4_t vmvnq_u32(uint32x4_t a); // VMVN q0,q0
+poly8x16_t vmvnq_p8(poly8x16_t a); // VMVN q0,q0
+//Bitwise and
+int8x8_t vand_s8(int8x8_t a, int8x8_t b); // VAND d0,d0,d0
+int16x4_t vand_s16(int16x4_t a, int16x4_t b); // VAND d0,d0,d0
+int32x2_t vand_s32(int32x2_t a, int32x2_t b); // VAND d0,d0,d0
+int64x1_t vand_s64(int64x1_t a, int64x1_t b); // VAND d0,d0,d0
+uint8x8_t vand_u8(uint8x8_t a, uint8x8_t b); // VAND d0,d0,d0
+uint16x4_t vand_u16(uint16x4_t a, uint16x4_t b); // VAND d0,d0,d0
+uint32x2_t vand_u32(uint32x2_t a, uint32x2_t b); // VAND d0,d0,d0
+uint64x1_t vand_u64(uint64x1_t a, uint64x1_t b); // VAND d0,d0,d0
+int8x16_t vandq_s8(int8x16_t a, int8x16_t b); // VAND q0,q0,q0
+int16x8_t vandq_s16(int16x8_t a, int16x8_t b); // VAND q0,q0,q0
+int32x4_t vandq_s32(int32x4_t a, int32x4_t b); // VAND q0,q0,q0
+int64x2_t vandq_s64(int64x2_t a, int64x2_t b); // VAND q0,q0,q0
+uint8x16_t vandq_u8(uint8x16_t a, uint8x16_t
b); // VAND q0,q0,q0 +uint16x8_t vandq_u16(uint16x8_t a, uint16x8_t b); // VAND q0,q0,q0 +uint32x4_t vandq_u32(uint32x4_t a, uint32x4_t b); // VAND q0,q0,q0 +uint64x2_t vandq_u64(uint64x2_t a, uint64x2_t b); // VAND q0,q0,q0 +//Bitwise or +int8x8_t vorr_s8(int8x8_t a, int8x8_t b); // VORR d0,d0,d0 +int16x4_t vorr_s16(int16x4_t a, int16x4_t b); // VORR d0,d0,d0 +int32x2_t vorr_s32(int32x2_t a, int32x2_t b); // VORR d0,d0,d0 +int64x1_t vorr_s64(int64x1_t a, int64x1_t b); // VORR d0,d0,d0 +uint8x8_t vorr_u8(uint8x8_t a, uint8x8_t b); // VORR d0,d0,d0 +uint16x4_t vorr_u16(uint16x4_t a, uint16x4_t b); // VORR d0,d0,d0 +uint32x2_t vorr_u32(uint32x2_t a, uint32x2_t b); // VORR d0,d0,d0 +uint64x1_t vorr_u64(uint64x1_t a, uint64x1_t b); // VORR d0,d0,d0 +int8x16_t vorrq_s8(int8x16_t a, int8x16_t b); // VORR q0,q0,q0 +int16x8_t vorrq_s16(int16x8_t a, int16x8_t b); // VORR q0,q0,q0 +int32x4_t vorrq_s32(int32x4_t a, int32x4_t b); // VORR q0,q0,q0 +int64x2_t vorrq_s64(int64x2_t a, int64x2_t b); // VORR q0,q0,q0 +uint8x16_t vorrq_u8(uint8x16_t a, uint8x16_t b); // VORR q0,q0,q0 +uint16x8_t vorrq_u16(uint16x8_t a, uint16x8_t b); // VORR q0,q0,q0 +uint32x4_t vorrq_u32(uint32x4_t a, uint32x4_t b); // VORR q0,q0,q0 +uint64x2_t vorrq_u64(uint64x2_t a, uint64x2_t b); // VORR q0,q0,q0 +//Bitwise exclusive or (EOR or XOR) +int8x8_t veor_s8(int8x8_t a, int8x8_t b); // VEOR d0,d0,d0 +int16x4_t veor_s16(int16x4_t a, int16x4_t b); // VEOR d0,d0,d0 +int32x2_t veor_s32(int32x2_t a, int32x2_t b); // VEOR d0,d0,d0 +int64x1_t veor_s64(int64x1_t a, int64x1_t b); // VEOR d0,d0,d0 +uint8x8_t veor_u8(uint8x8_t a, uint8x8_t b); // VEOR d0,d0,d0 +uint16x4_t veor_u16(uint16x4_t a, uint16x4_t b); // VEOR d0,d0,d0 +uint32x2_t veor_u32(uint32x2_t a, uint32x2_t b); // VEOR d0,d0,d0 +uint64x1_t veor_u64(uint64x1_t a, uint64x1_t b); // VEOR d0,d0,d0 +int8x16_t veorq_s8(int8x16_t a, int8x16_t b); // VEOR q0,q0,q0 +int16x8_t veorq_s16(int16x8_t a, int16x8_t b); // VEOR q0,q0,q0 +int32x4_t veorq_s32(int32x4_t a, int32x4_t b); // VEOR q0,q0,q0 +int64x2_t veorq_s64(int64x2_t a, int64x2_t b); // VEOR q0,q0,q0 +uint8x16_t veorq_u8(uint8x16_t a, uint8x16_t b); // VEOR q0,q0,q0 +uint16x8_t veorq_u16(uint16x8_t a, uint16x8_t b); // VEOR q0,q0,q0 +uint32x4_t veorq_u32(uint32x4_t a, uint32x4_t b); // VEOR q0,q0,q0 +uint64x2_t veorq_u64(uint64x2_t a, uint64x2_t b); // VEOR q0,q0,q0 +//Bit Clear +int8x8_t vbic_s8(int8x8_t a, int8x8_t b); // VBIC d0,d0,d0 +int16x4_t vbic_s16(int16x4_t a, int16x4_t b); // VBIC d0,d0,d0 +int32x2_t vbic_s32(int32x2_t a, int32x2_t b); // VBIC d0,d0,d0 +int64x1_t vbic_s64(int64x1_t a, int64x1_t b); // VBIC d0,d0,d0 +uint8x8_t vbic_u8(uint8x8_t a, uint8x8_t b); // VBIC d0,d0,d0 +uint16x4_t vbic_u16(uint16x4_t a, uint16x4_t b); // VBIC d0,d0,d0 +uint32x2_t vbic_u32(uint32x2_t a, uint32x2_t b); // VBIC d0,d0,d0 +uint64x1_t vbic_u64(uint64x1_t a, uint64x1_t b); // VBIC d0,d0,d0 +int8x16_t vbicq_s8(int8x16_t a, int8x16_t b); // VBIC q0,q0,q0 +int16x8_t vbicq_s16(int16x8_t a, int16x8_t b); // VBIC q0,q0,q0 +int32x4_t vbicq_s32(int32x4_t a, int32x4_t b); // VBIC q0,q0,q0 +int64x2_t vbicq_s64(int64x2_t a, int64x2_t b); // VBIC q0,q0,q0 +uint8x16_t vbicq_u8(uint8x16_t a, uint8x16_t b); // VBIC q0,q0,q0 +uint16x8_t vbicq_u16(uint16x8_t a, uint16x8_t b); // VBIC q0,q0,q0 +uint32x4_t vbicq_u32(uint32x4_t a, uint32x4_t b); // VBIC q0,q0,q0 +uint64x2_t vbicq_u64(uint64x2_t a, uint64x2_t b); // VBIC q0,q0,q0 +//Bitwise OR complement +int8x8_t vorn_s8(int8x8_t a, int8x8_t b); // VORN d0,d0,d0 +int16x4_t vorn_s16(int16x4_t a, int16x4_t 
b); // VORN d0,d0,d0 +int32x2_t vorn_s32(int32x2_t a, int32x2_t b); // VORN d0,d0,d0 +int64x1_t vorn_s64(int64x1_t a, int64x1_t b); // VORN d0,d0,d0 +uint8x8_t vorn_u8(uint8x8_t a, uint8x8_t b); // VORN d0,d0,d0 +uint16x4_t vorn_u16(uint16x4_t a, uint16x4_t b); // VORN d0,d0,d0 +uint32x2_t vorn_u32(uint32x2_t a, uint32x2_t b); // VORN d0,d0,d0 +uint64x1_t vorn_u64(uint64x1_t a, uint64x1_t b); // VORN d0,d0,d0 +int8x16_t vornq_s8(int8x16_t a, int8x16_t b); // VORN q0,q0,q0 +int16x8_t vornq_s16(int16x8_t a, int16x8_t b); // VORN q0,q0,q0 +int32x4_t vornq_s32(int32x4_t a, int32x4_t b); // VORN q0,q0,q0 +int64x2_t vornq_s64(int64x2_t a, int64x2_t b); // VORN q0,q0,q0 +uint8x16_t vornq_u8(uint8x16_t a, uint8x16_t b); // VORN q0,q0,q0 +uint16x8_t vornq_u16(uint16x8_t a, uint16x8_t b); // VORN q0,q0,q0 +uint32x4_t vornq_u32(uint32x4_t a, uint32x4_t b); // VORN q0,q0,q0 +uint64x2_t vornq_u64(uint64x2_t a, uint64x2_t b); // VORN q0,q0,q0 +//Bitwise Select +int8x8_t vbsl_s8(uint8x8_t a, int8x8_t b, int8x8_t c); // VBSL d0,d0,d0 +int16x4_t vbsl_s16(uint16x4_t a, int16x4_t b, int16x4_t c); // VBSL d0,d0,d0 +int32x2_t vbsl_s32(uint32x2_t a, int32x2_t b, int32x2_t c); // VBSL d0,d0,d0 +int64x1_t vbsl_s64(uint64x1_t a, int64x1_t b, int64x1_t c); // VBSL d0,d0,d0 +uint8x8_t vbsl_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c); // VBSL d0,d0,d0 +uint16x4_t vbsl_u16(uint16x4_t a, uint16x4_t b, uint16x4_t c); // VBSL d0,d0,d0 +uint32x2_t vbsl_u32(uint32x2_t a, uint32x2_t b, uint32x2_t c); // VBSL d0,d0,d0 +uint64x1_t vbsl_u64(uint64x1_t a, uint64x1_t b, uint64x1_t c); // VBSL d0,d0,d0 +float32x2_t vbsl_f32(uint32x2_t a, float32x2_t b, float32x2_t c); // VBSL d0,d0,d0 +poly8x8_t vbsl_p8(uint8x8_t a, poly8x8_t b, poly8x8_t c); // VBSL d0,d0,d0 +poly16x4_t vbsl_p16(uint16x4_t a, poly16x4_t b, poly16x4_t c); // VBSL d0,d0,d0 +int8x16_t vbslq_s8(uint8x16_t a, int8x16_t b, int8x16_t c); // VBSL q0,q0,q0 +int16x8_t vbslq_s16(uint16x8_t a, int16x8_t b, int16x8_t c); // VBSL q0,q0,q0 +int32x4_t vbslq_s32(uint32x4_t a, int32x4_t b, int32x4_t c); // VBSL q0,q0,q0 +int64x2_t vbslq_s64(uint64x2_t a, int64x2_t b, int64x2_t c); // VBSL q0,q0,q0 +uint8x16_t vbslq_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c); // VBSL q0,q0,q0 +uint16x8_t vbslq_u16(uint16x8_t a, uint16x8_t b, uint16x8_t c); // VBSL q0,q0,q0 +uint32x4_t vbslq_u32(uint32x4_t a, uint32x4_t b, uint32x4_t c); // VBSL q0,q0,q0 +uint64x2_t vbslq_u64(uint64x2_t a, uint64x2_t b, uint64x2_t c); // VBSL q0,q0,q0 +float32x4_t vbslq_f32(uint32x4_t a, float32x4_t b, float32x4_t c); // VBSL q0,q0,q0 +poly8x16_t vbslq_p8(uint8x16_t a, poly8x16_t b, poly8x16_t c); // VBSL q0,q0,q0 +poly16x8_t vbslq_p16(uint16x8_t a, poly16x8_t b, poly16x8_t c); // VBSL q0,q0,q0 +//Transposition operations +//Transpose elements +int8x8x2_t vtrn_s8(int8x8_t a, int8x8_t b); // VTRN.8 d0,d0 +int16x4x2_t vtrn_s16(int16x4_t a, int16x4_t b); // VTRN.16 d0,d0 +int32x2x2_t vtrn_s32(int32x2_t a, int32x2_t b); // VTRN.32 d0,d0 +uint8x8x2_t vtrn_u8(uint8x8_t a, uint8x8_t b); // VTRN.8 d0,d0 +uint16x4x2_t vtrn_u16(uint16x4_t a, uint16x4_t b); // VTRN.16 d0,d0 +uint32x2x2_t vtrn_u32(uint32x2_t a, uint32x2_t b); // VTRN.32 d0,d0 +float32x2x2_t vtrn_f32(float32x2_t a, float32x2_t b); // VTRN.32 d0,d0 +poly8x8x2_t vtrn_p8(poly8x8_t a, poly8x8_t b); // VTRN.8 d0,d0 +poly16x4x2_t vtrn_p16(poly16x4_t a, poly16x4_t b); // VTRN.16 d0,d0 +int8x16x2_t vtrnq_s8(int8x16_t a, int8x16_t b); // VTRN.8 q0,q0 +int16x8x2_t vtrnq_s16(int16x8_t a, int16x8_t b); // VTRN.16 q0,q0 +int32x4x2_t vtrnq_s32(int32x4_t a, int32x4_t b); // 
VTRN.32 q0,q0
+uint8x16x2_t vtrnq_u8(uint8x16_t a, uint8x16_t b); // VTRN.8 q0,q0
+uint16x8x2_t vtrnq_u16(uint16x8_t a, uint16x8_t b); // VTRN.16 q0,q0
+uint32x4x2_t vtrnq_u32(uint32x4_t a, uint32x4_t b); // VTRN.32 q0,q0
+float32x4x2_t vtrnq_f32(float32x4_t a, float32x4_t b); // VTRN.32 q0,q0
+poly8x16x2_t vtrnq_p8(poly8x16_t a, poly8x16_t b); // VTRN.8 q0,q0
+poly16x8x2_t vtrnq_p16(poly16x8_t a, poly16x8_t b); // VTRN.16 q0,q0
+//Interleave elements
+int8x8x2_t vzip_s8(int8x8_t a, int8x8_t b); // VZIP.8 d0,d0
+int16x4x2_t vzip_s16(int16x4_t a, int16x4_t b); // VZIP.16 d0,d0
+int32x2x2_t vzip_s32(int32x2_t a, int32x2_t b); // VZIP.32 d0,d0
+uint8x8x2_t vzip_u8(uint8x8_t a, uint8x8_t b); // VZIP.8 d0,d0
+uint16x4x2_t vzip_u16(uint16x4_t a, uint16x4_t b); // VZIP.16 d0,d0
+uint32x2x2_t vzip_u32(uint32x2_t a, uint32x2_t b); // VZIP.32 d0,d0
+float32x2x2_t vzip_f32(float32x2_t a, float32x2_t b); // VZIP.32 d0,d0
+poly8x8x2_t vzip_p8(poly8x8_t a, poly8x8_t b); // VZIP.8 d0,d0
+poly16x4x2_t vzip_p16(poly16x4_t a, poly16x4_t b); // VZIP.16 d0,d0
+int8x16x2_t vzipq_s8(int8x16_t a, int8x16_t b); // VZIP.8 q0,q0
+int16x8x2_t vzipq_s16(int16x8_t a, int16x8_t b); // VZIP.16 q0,q0
+int32x4x2_t vzipq_s32(int32x4_t a, int32x4_t b); // VZIP.32 q0,q0
+uint8x16x2_t vzipq_u8(uint8x16_t a, uint8x16_t b); // VZIP.8 q0,q0
+uint16x8x2_t vzipq_u16(uint16x8_t a, uint16x8_t b); // VZIP.16 q0,q0
+uint32x4x2_t vzipq_u32(uint32x4_t a, uint32x4_t b); // VZIP.32 q0,q0
+float32x4x2_t vzipq_f32(float32x4_t a, float32x4_t b); // VZIP.32 q0,q0
+poly8x16x2_t vzipq_p8(poly8x16_t a, poly8x16_t b); // VZIP.8 q0,q0
+poly16x8x2_t vzipq_p16(poly16x8_t a, poly16x8_t b); // VZIP.16 q0,q0
+//De-Interleave elements
+int8x8x2_t vuzp_s8(int8x8_t a, int8x8_t b); // VUZP.8 d0,d0
+int16x4x2_t vuzp_s16(int16x4_t a, int16x4_t b); // VUZP.16 d0,d0
+int32x2x2_t vuzp_s32(int32x2_t a, int32x2_t b); // VUZP.32 d0,d0
+uint8x8x2_t vuzp_u8(uint8x8_t a, uint8x8_t b); // VUZP.8 d0,d0
+uint16x4x2_t vuzp_u16(uint16x4_t a, uint16x4_t b); // VUZP.16 d0,d0
+uint32x2x2_t vuzp_u32(uint32x2_t a, uint32x2_t b); // VUZP.32 d0,d0
+float32x2x2_t vuzp_f32(float32x2_t a, float32x2_t b); // VUZP.32 d0,d0
+poly8x8x2_t vuzp_p8(poly8x8_t a, poly8x8_t b); // VUZP.8 d0,d0
+poly16x4x2_t vuzp_p16(poly16x4_t a, poly16x4_t b); // VUZP.16 d0,d0
+int8x16x2_t vuzpq_s8(int8x16_t a, int8x16_t b); // VUZP.8 q0,q0
+int16x8x2_t vuzpq_s16(int16x8_t a, int16x8_t b); // VUZP.16 q0,q0
+int32x4x2_t vuzpq_s32(int32x4_t a, int32x4_t b); // VUZP.32 q0,q0
+uint8x16x2_t vuzpq_u8(uint8x16_t a, uint8x16_t b); // VUZP.8 q0,q0
+uint16x8x2_t vuzpq_u16(uint16x8_t a, uint16x8_t b); // VUZP.16 q0,q0
+uint32x4x2_t vuzpq_u32(uint32x4_t a, uint32x4_t b); // VUZP.32 q0,q0
+float32x4x2_t vuzpq_f32(float32x4_t a, float32x4_t b); // VUZP.32 q0,q0
+poly8x16x2_t vuzpq_p8(poly8x16_t a, poly8x16_t b); // VUZP.8 q0,q0
+poly16x8x2_t vuzpq_p16(poly16x8_t a, poly16x8_t b); // VUZP.16 q0,q0
+
+
+//^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+// the following macros solve the problem of the "immediate parameter" requirement of some x86 intrinsics. Release builds can usually
+//do without them, but debug builds fail to compile otherwise: the compiler stops with an "Intrinsic parameter must be an immediate value" error.
+//
+#if ( ((defined _MSC_VER) && (_MSC_VER > 1600)) || defined (__INTEL_COMPILER) )&& defined NDEBUG //even in a release build, VS2010 and earlier compilers still need the switch-based workaround below, hence the version check.
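+//Editorial note (not part of the original header): in this branch the upper-case
+//wrappers map straight onto the intrinsics, relying on the optimizer to present
+//the lane argument as the compile-time immediate these intrinsics require. The
+//#else branch below instead expands a switch that supplies one immediate per case,
+//which is what allows a lane value only known at run time, e.g.:
+//
+//  int16_t lane_of(__m128i v, int lane)   //illustrative sketch for the #else branch
+//  {
+//      return (int16_t)_MM_EXTRACT_EPI16(v, lane);
+//  }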
+ + #define _MM_ALIGNR_EPI8 _mm_alignr_epi8 + + #define _MM_EXTRACT_EPI16 _mm_extract_epi16 + #define _MM_INSERT_EPI16 _mm_insert_epi16 +#ifdef USE_SSE4 + #define _MM_EXTRACT_EPI8 _mm_extract_epi8 + #define _MM_EXTRACT_EPI32 _mm_extract_epi32 + #define _MM_EXTRACT_PS _mm_extract_ps + + #define _MM_INSERT_EPI8 _mm_insert_epi8 + #define _MM_INSERT_EPI32 _mm_insert_epi32 + #define _MM_INSERT_PS _mm_insert_ps +#ifdef _NEON2SSE_64BIT + #define _MM_INSERT_EPI64 _mm_insert_epi64 + #define _MM_EXTRACT_EPI64 _mm_extract_epi64 +#endif +#endif //SSE4 +#else + #define _NEON2SSE_COMMA , + #define _NEON2SSE_SWITCH16(NAME, a, b, LANE) \ + switch(LANE) \ + { \ + case 0: return NAME(a b, 0); \ + case 1: return NAME(a b, 1); \ + case 2: return NAME(a b, 2); \ + case 3: return NAME(a b, 3); \ + case 4: return NAME(a b, 4); \ + case 5: return NAME(a b, 5); \ + case 6: return NAME(a b, 6); \ + case 7: return NAME(a b, 7); \ + case 8: return NAME(a b, 8); \ + case 9: return NAME(a b, 9); \ + case 10: return NAME(a b, 10); \ + case 11: return NAME(a b, 11); \ + case 12: return NAME(a b, 12); \ + case 13: return NAME(a b, 13); \ + case 14: return NAME(a b, 14); \ + case 15: return NAME(a b, 15); \ + default: return NAME(a b, 0); \ + } + + #define _NEON2SSE_SWITCH8(NAME, vec, LANE, p) \ + switch(LANE) \ + { \ + case 0: return NAME(vec p,0); \ + case 1: return NAME(vec p,1); \ + case 2: return NAME(vec p,2); \ + case 3: return NAME(vec p,3); \ + case 4: return NAME(vec p,4); \ + case 5: return NAME(vec p,5); \ + case 6: return NAME(vec p,6); \ + case 7: return NAME(vec p,7); \ + default: return NAME(vec p,0); \ + } + + #define _NEON2SSE_SWITCH4(NAME, case0, case1, case2, case3, vec, LANE, p) \ + switch(LANE) \ + { \ + case case0: return NAME(vec p,case0); \ + case case1: return NAME(vec p,case1); \ + case case2: return NAME(vec p,case2); \ + case case3: return NAME(vec p,case3); \ + default: return NAME(vec p,case0); \ + } + + _NEON2SSE_INLINE __m128i _MM_ALIGNR_EPI8(__m128i a, __m128i b, int LANE) + { + _NEON2SSE_SWITCH16(_mm_alignr_epi8, a, _NEON2SSE_COMMA b, LANE) + } + + _NEON2SSE_INLINE __m128i _MM_INSERT_EPI16(__m128i vec, int p, const int LANE) + { + _NEON2SSE_SWITCH8(_mm_insert_epi16, vec, LANE, _NEON2SSE_COMMA p) + } + + _NEON2SSE_INLINE int _MM_EXTRACT_EPI16(__m128i vec, const int LANE) + { + _NEON2SSE_SWITCH8(_mm_extract_epi16, vec, LANE,) + } + +#ifdef USE_SSE4 + _NEON2SSE_INLINE int _MM_EXTRACT_EPI32(__m128i vec, const int LANE) + { + _NEON2SSE_SWITCH4(_mm_extract_epi32, 0,1,2,3, vec, LANE,) + } + + _NEON2SSE_INLINE int _MM_EXTRACT_PS(__m128 vec, const int LANE) + { + _NEON2SSE_SWITCH4(_mm_extract_ps, 0,1,2,3, vec, LANE,) + } + + _NEON2SSE_INLINE int _MM_EXTRACT_EPI8(__m128i vec, const int LANE) + { + _NEON2SSE_SWITCH16(_mm_extract_epi8, vec, , LANE) + } + + _NEON2SSE_INLINE __m128i _MM_INSERT_EPI32(__m128i vec, int p, const int LANE) + { + _NEON2SSE_SWITCH4(_mm_insert_epi32, 0, 1, 2, 3, vec, LANE, _NEON2SSE_COMMA p) + } + + _NEON2SSE_INLINE __m128i _MM_INSERT_EPI8(__m128i vec, int p, const int LANE) + { + _NEON2SSE_SWITCH16(_mm_insert_epi8, vec, _NEON2SSE_COMMA p, LANE) + } + +#ifdef _NEON2SSE_64BIT + //the special case of functions available only for SSE4 and 64-bit build. 
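+ //Editorial note (not part of the original header): _mm_insert_epi64 and
+ //_mm_extract_epi64 are only available when targeting x86-64, hence this extra
+ //_NEON2SSE_64BIT guard; with just two lanes the runtime dispatch below reduces
+ //to a lane==0 / lane==1 split, e.g. (illustrative):
+ //
+ //  int64_t hi64(__m128i v) { return _MM_EXTRACT_EPI64(v, 1); }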
+ _NEON2SSE_INLINE __m128i _MM_INSERT_EPI64(__m128i vec, int p, const int LANE) + { + switch(LANE) { + case 0: + return _mm_insert_epi64(vec, p, 0); + case 1: + return _mm_insert_epi64(vec, p, 1); + default: + return _mm_insert_epi64(vec, p, 0); + } + } + + _NEON2SSE_INLINE int64_t _MM_EXTRACT_EPI64(__m128i val, const int LANE) + { + if (LANE ==0) return _mm_extract_epi64(val, 0); + else return _mm_extract_epi64(val, 1); + } +#endif + + _NEON2SSE_INLINE __m128 _MM_INSERT_PS(__m128 vec, __m128 p, const int LANE) + { + _NEON2SSE_SWITCH4(_mm_insert_ps, 0, 16, 32, 48, vec, LANE, _NEON2SSE_COMMA p) + } + +#endif //USE_SSE4 + +#endif //#ifdef NDEBUG + +//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +// Below are some helper functions used either for SSE4 intrinsics "emulation" for SSSE3 limited devices +// or for some specific commonly used operations implementation missing in SSE +#ifdef USE_SSE4 + #define _MM_CVTEPU8_EPI16 _mm_cvtepu8_epi16 + #define _MM_CVTEPU16_EPI32 _mm_cvtepu16_epi32 + #define _MM_CVTEPU32_EPI64 _mm_cvtepu32_epi64 + + #define _MM_CVTEPI8_EPI16 _mm_cvtepi8_epi16 + #define _MM_CVTEPI16_EPI32 _mm_cvtepi16_epi32 + #define _MM_CVTEPI32_EPI64 _mm_cvtepi32_epi64 + + #define _MM_MAX_EPI8 _mm_max_epi8 + #define _MM_MAX_EPI32 _mm_max_epi32 + #define _MM_MAX_EPU16 _mm_max_epu16 + #define _MM_MAX_EPU32 _mm_max_epu32 + + #define _MM_MIN_EPI8 _mm_min_epi8 + #define _MM_MIN_EPI32 _mm_min_epi32 + #define _MM_MIN_EPU16 _mm_min_epu16 + #define _MM_MIN_EPU32 _mm_min_epu32 + + #define _MM_BLENDV_EPI8 _mm_blendv_epi8 + #define _MM_PACKUS_EPI32 _mm_packus_epi32 + #define _MM_PACKUS1_EPI32(a) _mm_packus_epi32(a, a) + + #define _MM_MULLO_EPI32 _mm_mullo_epi32 + #define _MM_MUL_EPI32 _mm_mul_epi32 + + #define _MM_CMPEQ_EPI64 _mm_cmpeq_epi64 +#else //no SSE4 !!!!!! 
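+ //SSE2/SSSE3 fallbacks: the SSE4.1 widening converts are emulated by interleaving the low half
+ //with a zero vector (zero extension) or with a computed sign vector (sign extension), e.g.
+ //    _MM_CVTEPU8_EPI16(a) == _mm_unpacklo_epi8(a, _mm_setzero_si128())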
+ _NEON2SSE_INLINE __m128i _MM_CVTEPU8_EPI16(__m128i a) + { + __m128i zero = _mm_setzero_si128(); + return _mm_unpacklo_epi8(a, zero); + } + + _NEON2SSE_INLINE __m128i _MM_CVTEPU16_EPI32(__m128i a) + { + __m128i zero = _mm_setzero_si128(); + return _mm_unpacklo_epi16(a, zero); + } + + _NEON2SSE_INLINE __m128i _MM_CVTEPU32_EPI64(__m128i a) + { + __m128i zero = _mm_setzero_si128(); + return _mm_unpacklo_epi32(a, zero); + } + + _NEON2SSE_INLINE __m128i _MM_CVTEPI8_EPI16(__m128i a) + { + __m128i zero = _mm_setzero_si128(); + __m128i sign = _mm_cmpgt_epi8(zero, a); + return _mm_unpacklo_epi8(a, sign); + } + + _NEON2SSE_INLINE __m128i _MM_CVTEPI16_EPI32(__m128i a) + { + __m128i zero = _mm_setzero_si128(); + __m128i sign = _mm_cmpgt_epi16(zero, a); + return _mm_unpacklo_epi16(a, sign); + } + + _NEON2SSE_INLINE __m128i _MM_CVTEPI32_EPI64(__m128i a) + { + __m128i zero = _mm_setzero_si128(); + __m128i sign = _mm_cmpgt_epi32(zero, a); + return _mm_unpacklo_epi32(a, sign); + } + + _NEON2SSE_INLINE int _MM_EXTRACT_EPI32(__m128i vec, const int LANE) + { + _NEON2SSE_ALIGN_16 int32_t tmp[4]; + _mm_store_si128((__m128i*)tmp, vec); + return tmp[LANE]; + } + + _NEON2SSE_INLINE int _MM_EXTRACT_EPI8(__m128i vec, const int LANE) + { + _NEON2SSE_ALIGN_16 int8_t tmp[16]; + _mm_store_si128((__m128i*)tmp, vec); + return (int)tmp[LANE]; + } + + _NEON2SSE_INLINE int _MM_EXTRACT_PS(__m128 vec, const int LANE) + { + _NEON2SSE_ALIGN_16 int32_t tmp[4]; + _mm_store_si128((__m128i*)tmp, _M128i(vec)); + return tmp[LANE]; + } + + _NEON2SSE_INLINE __m128i _MM_INSERT_EPI32(__m128i vec, int p, const int LANE) + { + _NEON2SSE_ALIGN_16 int32_t pvec[4] = {0,0,0,0}; + _NEON2SSE_ALIGN_16 uint32_t mask[4] = {0xffffffff,0xffffffff,0xffffffff,0xffffffff}; + __m128i vec_masked, p_masked; + pvec[LANE] = p; + mask[LANE] = 0x0; + vec_masked = _mm_and_si128 (*(__m128i*)mask,vec); //ready for p + p_masked = _mm_andnot_si128 (*(__m128i*)mask,*(__m128i*)pvec); //ready for vec + return _mm_or_si128(vec_masked, p_masked); + } + + _NEON2SSE_INLINE __m128i _MM_INSERT_EPI8(__m128i vec, int p, const int LANE) + { + _NEON2SSE_ALIGN_16 int8_t pvec[16] = {0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0}; + _NEON2SSE_ALIGN_16 uint8_t mask[16] = {0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff,0xff}; + __m128i vec_masked, p_masked; + pvec[LANE] = (int8_t)p; + mask[LANE] = 0x0; + vec_masked = _mm_and_si128 (*(__m128i*)mask,vec); //ready for p + p_masked = _mm_andnot_si128 (*(__m128i*)mask,*(__m128i*)pvec); //ready for vec + return _mm_or_si128(vec_masked, p_masked); + } + + _NEON2SSE_INLINE __m128 _MM_INSERT_PS(__m128 vec, __m128 p, const int LANE) + { + _NEON2SSE_ALIGN_16 int32_t mask[4] = {0xffffffff,0xffffffff,0xffffffff,0xffffffff}; + __m128 tmp, vec_masked, p_masked; + mask[LANE >> 4] = 0x0; //here the LANE is not actural lane, need to deal with it + vec_masked = _mm_and_ps (*(__m128*)mask,vec); //ready for p + p_masked = _mm_andnot_ps (*(__m128*)mask, p); //ready for vec + tmp = _mm_or_ps(vec_masked, p_masked); + return tmp; + } + + _NEON2SSE_INLINE __m128i _MM_MAX_EPI8(__m128i a, __m128i b) + { + __m128i cmp, resa, resb; + cmp = _mm_cmpgt_epi8 (a, b); + resa = _mm_and_si128 (cmp, a); + resb = _mm_andnot_si128 (cmp,b); + return _mm_or_si128(resa, resb); + } + + _NEON2SSE_INLINE __m128i _MM_MAX_EPI32(__m128i a, __m128i b) + { + __m128i cmp, resa, resb; + cmp = _mm_cmpgt_epi32(a, b); + resa = _mm_and_si128 (cmp, a); + resb = _mm_andnot_si128 (cmp,b); + return _mm_or_si128(resa, resb); + } + + _NEON2SSE_INLINE __m128i 
_MM_MAX_EPU16(__m128i a, __m128i b) + { + __m128i c8000, b_s, a_s, cmp; + c8000 = _mm_cmpeq_epi16 (a,a); //0xffff + c8000 = _mm_slli_epi16 (c8000, 15); //0x8000 + b_s = _mm_sub_epi16 (b, c8000); + a_s = _mm_sub_epi16 (a, c8000); + cmp = _mm_cmpgt_epi16 (a_s, b_s); //no unsigned comparison, need to go to signed + a_s = _mm_and_si128 (cmp,a); + b_s = _mm_andnot_si128 (cmp,b); + return _mm_or_si128(a_s, b_s); + } + + _NEON2SSE_INLINE __m128i _MM_MAX_EPU32(__m128i a, __m128i b) + { + __m128i c80000000, b_s, a_s, cmp; + c80000000 = _mm_cmpeq_epi32 (a,a); //0xffffffff + c80000000 = _mm_slli_epi32 (c80000000, 31); //0x80000000 + b_s = _mm_sub_epi32 (b, c80000000); + a_s = _mm_sub_epi32 (a, c80000000); + cmp = _mm_cmpgt_epi32 (a_s, b_s); //no unsigned comparison, need to go to signed + a_s = _mm_and_si128 (cmp,a); + b_s = _mm_andnot_si128 (cmp,b); + return _mm_or_si128(a_s, b_s); + } + + _NEON2SSE_INLINE __m128i _MM_MIN_EPI8(__m128i a, __m128i b) + { + __m128i cmp, resa, resb; + cmp = _mm_cmpgt_epi8 (b, a); + resa = _mm_and_si128 (cmp, a); + resb = _mm_andnot_si128 (cmp,b); + return _mm_or_si128(resa, resb); + } + + _NEON2SSE_INLINE __m128i _MM_MIN_EPI32(__m128i a, __m128i b) + { + __m128i cmp, resa, resb; + cmp = _mm_cmpgt_epi32(b, a); + resa = _mm_and_si128 (cmp, a); + resb = _mm_andnot_si128 (cmp,b); + return _mm_or_si128(resa, resb); + } + + _NEON2SSE_INLINE __m128i _MM_MIN_EPU16(__m128i a, __m128i b) + { + __m128i c8000, b_s, a_s, cmp; + c8000 = _mm_cmpeq_epi16 (a,a); //0xffff + c8000 = _mm_slli_epi16 (c8000, 15); //0x8000 + b_s = _mm_sub_epi16 (b, c8000); + a_s = _mm_sub_epi16 (a, c8000); + cmp = _mm_cmpgt_epi16 (b_s, a_s); //no unsigned comparison, need to go to signed + a_s = _mm_and_si128 (cmp,a); + b_s = _mm_andnot_si128 (cmp,b); + return _mm_or_si128(a_s, b_s); + } + + _NEON2SSE_INLINE __m128i _MM_MIN_EPU32(__m128i a, __m128i b) + { + __m128i c80000000, b_s, a_s, cmp; + c80000000 = _mm_cmpeq_epi32 (a,a); //0xffffffff + c80000000 = _mm_slli_epi32 (c80000000, 31); //0x80000000 + b_s = _mm_sub_epi32 (b, c80000000); + a_s = _mm_sub_epi32 (a, c80000000); + cmp = _mm_cmpgt_epi32 (b_s, a_s); //no unsigned comparison, need to go to signed + a_s = _mm_and_si128 (cmp,a); + b_s = _mm_andnot_si128 (cmp,b); + return _mm_or_si128(a_s, b_s); + } + + _NEON2SSE_INLINE __m128i _MM_BLENDV_EPI8(__m128i a, __m128i b, __m128i mask) //this is NOT exact implementation of _mm_blendv_epi8 !!!!! - please see below + { + //it assumes mask is either 0xff or 0 always (like in all usecases below) while for the original _mm_blendv_epi8 only MSB mask byte matters. 
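+ //e.g. for a mask byte of 0x80 the real _mm_blendv_epi8 would take the whole byte from b,
+ //while this bitwise emulation yields (0x80 & b) | (~0x80 & a), mixing bits of a and b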
+ __m128i a_masked, b_masked;
+ b_masked = _mm_and_si128 (mask,b); //use b if mask 0xff
+ a_masked = _mm_andnot_si128 (mask,a);
+ return _mm_or_si128(a_masked, b_masked);
+ }
+
+ _NEON2SSE_INLINE __m128i _MM_PACKUS_EPI32(__m128i a, __m128i b)
+ {
+ _NEON2SSE_ALIGN_16 int8_t mask8_32_even_odd[16] = { 0,1, 4,5, 8,9, 12,13, 2,3, 6,7,10,11,14,15};
+ __m128i a16, b16, res, reshi,cmp, zero;
+ zero = _mm_setzero_si128();
+ a16 = _mm_shuffle_epi8 (a, *(__m128i*) mask8_32_even_odd);
+ b16 = _mm_shuffle_epi8 (b, *(__m128i*) mask8_32_even_odd);
+ res = _mm_unpacklo_epi64(a16, b16); //result without saturation
+ reshi = _mm_unpackhi_epi64(a16, b16); //hi part of result used for saturation
+ cmp = _mm_cmpgt_epi16(zero, reshi); //if cmp<0 the result should be zero
+ res = _mm_andnot_si128(cmp,res); //if cmp zero - do nothing, otherwise cmp <0 and the result is 0
+ cmp = _mm_cmpgt_epi16(reshi,zero); //if cmp positive
+ return _mm_or_si128(res, cmp); //if cmp is positive we are out of 16 bits, need to saturate to 0xffff
+ }
+
+ _NEON2SSE_INLINE __m128i _MM_PACKUS1_EPI32(__m128i a)
+ {
+ _NEON2SSE_ALIGN_16 int8_t mask8_32_even_odd[16] = { 0,1, 4,5, 8,9, 12,13, 2,3, 6,7,10,11,14,15};
+ __m128i a16, res, reshi,cmp, zero;
+ zero = _mm_setzero_si128();
+ a16 = _mm_shuffle_epi8 (a, *(__m128i*)mask8_32_even_odd);
+ reshi = _mm_unpackhi_epi64(a16, a16); //hi part of result used for saturation
+ cmp = _mm_cmpgt_epi16(zero, reshi); //if cmp<0 the result should be zero
+ res = _mm_andnot_si128(cmp, a16); //if cmp zero - do nothing, otherwise cmp <0 and the result is 0
+ cmp = _mm_cmpgt_epi16(reshi,zero); //if cmp positive
+ return _mm_or_si128(res, cmp); //if cmp is positive we are out of 16 bits, need to saturate to 0xffff
+ }
+
+
+ _NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(__m128i _MM_MULLO_EPI32(__m128i a, __m128i b), _NEON2SSE_REASON_SLOW_SERIAL)
+ {
+ _NEON2SSE_ALIGN_16 int32_t atmp[4], btmp[4], res[4];
+ int64_t res64;
+ int i;
+ _mm_store_si128((__m128i*)atmp, a);
+ _mm_store_si128((__m128i*)btmp, b);
+ for (i = 0; i<4; i++) {
+ res64 = (int64_t)atmp[i] * btmp[i]; //widen before multiplying to avoid signed int overflow
+ res[i] = (int)(res64 & 0xffffffff);
+ }
+ return _mm_load_si128((__m128i*)res);
+ }
+
+ _NEON2SSE_INLINE __m128i _MM_MUL_EPI32(__m128i a, __m128i b)
+ {
+ __m128i sign, zero, mul_us, a_neg, b_neg, mul_us_neg;
+ sign = _mm_xor_si128 (a, b);
+ sign = _mm_srai_epi32 (sign, 31); //promote sign bit to all fields, all ones if negative and all 0 if positive
+ zero = _mm_setzero_si128();
+ a_neg = _mm_abs_epi32 (a); //absolute value of a
+ b_neg = _mm_abs_epi32 (b); //absolute value of b
+ mul_us = _mm_mul_epu32 (a_neg, b_neg); //uses 0 and 2nd data lanes, (abs), the multiplication gives 64 bit result
+ mul_us_neg = _mm_sub_epi64(zero, mul_us);
+ mul_us_neg = _mm_and_si128(sign, mul_us_neg);
+ mul_us = _mm_andnot_si128(sign, mul_us);
+ return _mm_or_si128 (mul_us, mul_us_neg);
+ }
+
+ _NEON2SSE_INLINE __m128i _MM_CMPEQ_EPI64(__m128i a, __m128i b)
+ {
+ __m128i res;
+ res = _mm_cmpeq_epi32 (a, b);
+ return _mm_shuffle_epi32 (res, 1 | (1 << 2) | (3 << 4) | (3 << 6)); //copy the information from hi to low part of the 64 bit data
+ }
+#endif //SSE4
+
+//the special case of functions working only for 32 bits, no SSE4
+_NEON2SSE_INLINE __m128i _MM_INSERT_EPI64_32(__m128i vec, int p, const int LANE)
+{
+ _NEON2SSE_ALIGN_16 uint64_t pvec[2] = {0,0};
+ _NEON2SSE_ALIGN_16 uint64_t mask[2] = {0xffffffffffffffff, 0xffffffffffffffff};
+ __m128i vec_masked, p_masked;
+ pvec[LANE] = p;
+ mask[LANE] = 0x0;
+ vec_masked = _mm_and_si128 (*(__m128i*)mask,vec); //ready for p
+ p_masked = 
_mm_andnot_si128 (*(__m128i*)mask,*(__m128i*)pvec); //ready for vec
+ return _mm_or_si128(vec_masked, p_masked);
+}
+
+_NEON2SSE_INLINE int64_t _MM_EXTRACT_EPI64_32(__m128i val, const int LANE)
+{
+ _NEON2SSE_ALIGN_16 int64_t tmp[2];
+ _mm_store_si128((__m128i*)tmp, val);
+ return tmp[LANE];
+}
+
+#ifndef _NEON2SSE_64BIT_SSE4
+ #define _MM_INSERT_EPI64 _MM_INSERT_EPI64_32
+ #define _MM_EXTRACT_EPI64 _MM_EXTRACT_EPI64_32
+#endif
+
+int32x4_t vqd_s32(int32x4_t a); //Doubling saturation for signed ints
+_NEON2SSE_INLINE int32x4_t vqd_s32(int32x4_t a)
+{
+ //Overflow happens only if a and sum have the opposite signs
+ __m128i c7fffffff, res, res_sat, res_xor_a;
+ c7fffffff = _mm_set1_epi32(0x7fffffff);
+ res = _mm_slli_epi32 (a, 1); // res = a*2
+ res_sat = _mm_srli_epi32(a, 31);
+ res_sat = _mm_add_epi32(res_sat, c7fffffff);
+ res_xor_a = _mm_xor_si128(res, a);
+ res_xor_a = _mm_srai_epi32(res_xor_a,31); //propagate the sign bit: all ones if <0, all zeros otherwise
+ res_sat = _mm_and_si128(res_xor_a, res_sat);
+ res = _mm_andnot_si128(res_xor_a, res);
+ return _mm_or_si128(res, res_sat);
+}
+
+
+//!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+//*************************************************************************
+//*************************************************************************
+//***************** Functions redefinition/implementation starts here *****
+//*************************************************************************
+//*************************************************************************
+//!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+
+/*If a unified intrinsics solution is necessary, please define your SSE intrinsics wrappers here, as in the following sample:
+#ifdef ARM
+#define vector_addq_s32 vaddq_s32
+#else //if we have IA
+#define vector_addq_s32 _mm_add_epi32
+#endif
+
+********************************************************************************************
+Functions below are organised in the following way:
+
+Each NEON intrinsic function has one of the following options:
+1. its x86 full equivalent SSE intrinsic - in this case the x86 version just follows the NEON one under the corresponding #define statement
+2. an x86 implementation using more than one x86 intrinsic - in this case it is shaped as an inlined C function with a return statement
+3. a reference to another NEON function returning the same result and implemented in x86 as above - in this case it is shaped as a matching NEON function definition
+4. for about 5% of functions, due to the corresponding x86 SIMD unavailability or performance inefficiency,
+a serial implementation is provided along with the corresponding compiler warning. If these functions are on your app's critical path,
+please consider removing them from your code.
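+
+A hedged illustration of the four options, with real functions from this file:
+1. #define vaddq_s8 _mm_add_epi8 //full x86 equivalent
+2. vhaddq_s32 //inlined C function combining several SSE intrinsics
+3. #define vadd_u8 vadd_s8 //reference to a matching implementation
+4. vqadd_s64 //serial fallback, flagged with _NEON2SSE_PERFORMANCE_WARNING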
+*/ + +//*********************************************************************** +//************************ Vector add ***************************** +//*********************************************************************** +int8x8_t vadd_s8(int8x8_t a, int8x8_t b); // VADD.I8 d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vadd_s8(int8x8_t a, int8x8_t b) +{ + int8x8_t res64; + return64(_mm_add_epi8(_pM128i(a),_pM128i(b))); +} + + +int16x4_t vadd_s16(int16x4_t a, int16x4_t b); // VADD.I16 d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vadd_s16(int16x4_t a, int16x4_t b) +{ + int16x4_t res64; + return64(_mm_add_epi16(_pM128i(a),_pM128i(b))); +} + + +int32x2_t vadd_s32(int32x2_t a, int32x2_t b); // VADD.I32 d0,d0,d0 +_NEON2SSE_INLINE int32x2_t vadd_s32(int32x2_t a, int32x2_t b) +{ + int32x2_t res64; + return64(_mm_add_epi32(_pM128i(a),_pM128i(b))); +} + + +int64x1_t vadd_s64(int64x1_t a, int64x1_t b); // VADD.I64 d0,d0,d0 +_NEON2SSE_INLINE int64x1_t vadd_s64(int64x1_t a, int64x1_t b) +{ + int64x1_t res64; + res64.m64_i64[0] = a.m64_i64[0] + b.m64_i64[0]; + return res64; +} + + +float32x2_t vadd_f32(float32x2_t a, float32x2_t b); // VADD.F32 d0,d0,d0 +_NEON2SSE_INLINE float32x2_t vadd_f32(float32x2_t a, float32x2_t b) +{ + __m128 res; + __m64_128 res64; + res = _mm_add_ps(_pM128(a),_pM128(b)); //SSE, use only low 64 bits + _M64f(res64, res); + return res64; +} + +uint8x8_t vadd_u8(uint8x8_t a, uint8x8_t b); // VADD.I8 d0,d0,d0 +#define vadd_u8 vadd_s8 + +uint16x4_t vadd_u16(uint16x4_t a, uint16x4_t b); // VADD.I16 d0,d0,d0 +#define vadd_u16 vadd_s16 + +uint32x2_t vadd_u32(uint32x2_t a, uint32x2_t b); // VADD.I32 d0,d0,d0 +#define vadd_u32 vadd_s32 + +uint64x1_t vadd_u64(uint64x1_t a, uint64x1_t b); // VADD.I64 d0,d0,d0 +_NEON2SSE_INLINE uint64x1_t vadd_u64(uint64x1_t a, uint64x1_t b) +{ + uint64x1_t res64; + res64.m64_u64[0] = a.m64_u64[0] + b.m64_u64[0]; + return res64; +} + + +int8x16_t vaddq_s8(int8x16_t a, int8x16_t b); // VADD.I8 q0,q0,q0 +#define vaddq_s8 _mm_add_epi8 + +int16x8_t vaddq_s16(int16x8_t a, int16x8_t b); // VADD.I16 q0,q0,q0 +#define vaddq_s16 _mm_add_epi16 + +int32x4_t vaddq_s32(int32x4_t a, int32x4_t b); // VADD.I32 q0,q0,q0 +#define vaddq_s32 _mm_add_epi32 + +int64x2_t vaddq_s64(int64x2_t a, int64x2_t b); // VADD.I64 q0,q0,q0 +#define vaddq_s64 _mm_add_epi64 + +float32x4_t vaddq_f32(float32x4_t a, float32x4_t b); // VADD.F32 q0,q0,q0 +#define vaddq_f32 _mm_add_ps + +uint8x16_t vaddq_u8(uint8x16_t a, uint8x16_t b); // VADD.I8 q0,q0,q0 +#define vaddq_u8 _mm_add_epi8 + +uint16x8_t vaddq_u16(uint16x8_t a, uint16x8_t b); // VADD.I16 q0,q0,q0 +#define vaddq_u16 _mm_add_epi16 + +uint32x4_t vaddq_u32(uint32x4_t a, uint32x4_t b); // VADD.I32 q0,q0,q0 +#define vaddq_u32 _mm_add_epi32 + +uint64x2_t vaddq_u64(uint64x2_t a, uint64x2_t b); // VADD.I64 q0,q0,q0 +#define vaddq_u64 _mm_add_epi64 + +//**************************** Vector long add *****************************: +//*********************************************************************** +//Va, Vb have equal lane sizes, result is a 128 bit vector of lanes that are twice the width. 
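+//Usage sketch (hypothetical caller code; assumes vdup_n_s8 as provided by this header):
+// int8x8_t x = vdup_n_s8(100);
+// int16x8_t wide = vaddl_s8(x, x); //every int16 lane holds 200, no int8 wraparound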
+int16x8_t vaddl_s8(int8x8_t a, int8x8_t b); // VADDL.S8 q0,d0,d0 +_NEON2SSE_INLINE int16x8_t vaddl_s8(int8x8_t a, int8x8_t b) // VADDL.S8 q0,d0,d0 +{ + __m128i a16, b16; + a16 = _MM_CVTEPI8_EPI16 (_pM128i(a)); //SSE4.1, + b16 = _MM_CVTEPI8_EPI16 (_pM128i(b)); //SSE4.1, + return _mm_add_epi16 (a16, b16); +} + +int32x4_t vaddl_s16(int16x4_t a, int16x4_t b); // VADDL.S16 q0,d0,d0 +_NEON2SSE_INLINE int32x4_t vaddl_s16(int16x4_t a, int16x4_t b) // VADDL.S16 q0,d0,d0 +{ + __m128i a32, b32; + a32 = _MM_CVTEPI16_EPI32 (_pM128i(a)); //SSE4.1 + b32 = _MM_CVTEPI16_EPI32 (_pM128i(b)); //SSE4.1 + return _mm_add_epi32 (a32, b32); +} + +int64x2_t vaddl_s32(int32x2_t a, int32x2_t b); // VADDL.S32 q0,d0,d0 +_NEON2SSE_INLINE int64x2_t vaddl_s32(int32x2_t a, int32x2_t b) // VADDL.S32 q0,d0,d0 +{ + //may be not optimal + __m128i a64, b64; + a64 = _MM_CVTEPI32_EPI64 (_pM128i(a)); //SSE4.1 + b64 = _MM_CVTEPI32_EPI64 (_pM128i(b)); //SSE4.1 + return _mm_add_epi64 ( a64, b64); +} + +uint16x8_t vaddl_u8(uint8x8_t a, uint8x8_t b); // VADDL.U8 q0,d0,d0 +_NEON2SSE_INLINE uint16x8_t vaddl_u8(uint8x8_t a, uint8x8_t b) // VADDL.U8 q0,d0,d0 +{ + __m128i a16, b16; + a16 = _MM_CVTEPU8_EPI16 (_pM128i(a)); //SSE4.1 + b16 = _MM_CVTEPU8_EPI16 (_pM128i(b)); //SSE4.1 + return _mm_add_epi16 (a16, b16); +} + +uint32x4_t vaddl_u16(uint16x4_t a, uint16x4_t b); // VADDL.s16 q0,d0,d0 +_NEON2SSE_INLINE uint32x4_t vaddl_u16(uint16x4_t a, uint16x4_t b) // VADDL.s16 q0,d0,d0 +{ + __m128i a32, b32; + a32 = _MM_CVTEPU16_EPI32 (_pM128i(a)); //SSE4.1 + b32 = _MM_CVTEPU16_EPI32 (_pM128i(b)); //SSE4.1 + return _mm_add_epi32 (a32, b32); +} + +uint64x2_t vaddl_u32(uint32x2_t a, uint32x2_t b); // VADDL.U32 q0,d0,d0 +_NEON2SSE_INLINE uint64x2_t vaddl_u32(uint32x2_t a, uint32x2_t b) // VADDL.U32 q0,d0,d0 +{ + //may be not optimal + __m128i a64, b64; + a64 = _MM_CVTEPU32_EPI64 (_pM128i(a)); //SSE4.1 + b64 = _MM_CVTEPU32_EPI64 (_pM128i(b)); //SSE4.1 + return _mm_add_epi64 (a64, b64); +} + +//*************** Vector wide add: vaddw_. 
Vr[i]:=Va[i]+Vb[i] ******************
+//*************** *********************************************************************
+int16x8_t vaddw_s8(int16x8_t a, int8x8_t b); // VADDW.S8 q0,q0,d0
+_NEON2SSE_INLINE int16x8_t vaddw_s8(int16x8_t a, int8x8_t b) // VADDW.S8 q0,q0,d0
+{
+ __m128i b16;
+ b16 = _MM_CVTEPI8_EPI16 (_pM128i(b)); //SSE4.1,
+ return _mm_add_epi16 (a, b16);
+}
+
+int32x4_t vaddw_s16(int32x4_t a, int16x4_t b); // VADDW.S16 q0,q0,d0
+_NEON2SSE_INLINE int32x4_t vaddw_s16(int32x4_t a, int16x4_t b) // VADDW.S16 q0,q0,d0
+{
+ __m128i b32;
+ b32 = _MM_CVTEPI16_EPI32(_pM128i(b)); //SSE4.1,
+ return _mm_add_epi32 (a, b32);
+}
+
+int64x2_t vaddw_s32(int64x2_t a, int32x2_t b); // VADDW.S32 q0,q0,d0
+_NEON2SSE_INLINE int64x2_t vaddw_s32(int64x2_t a, int32x2_t b) // VADDW.S32 q0,q0,d0
+{
+ __m128i b64;
+ b64 = _MM_CVTEPI32_EPI64 (_pM128i(b)); //SSE4.1
+ return _mm_add_epi64 (a, b64);
+}
+
+uint16x8_t vaddw_u8(uint16x8_t a, uint8x8_t b); // VADDW.U8 q0,q0,d0
+_NEON2SSE_INLINE uint16x8_t vaddw_u8(uint16x8_t a, uint8x8_t b) // VADDW.U8 q0,q0,d0
+{
+ __m128i b16;
+ b16 = _MM_CVTEPU8_EPI16 (_pM128i(b)); //SSE4.1
+ return _mm_add_epi16 (a, b16);
+}
+
+uint32x4_t vaddw_u16(uint32x4_t a, uint16x4_t b); // VADDW.s16 q0,q0,d0
+_NEON2SSE_INLINE uint32x4_t vaddw_u16(uint32x4_t a, uint16x4_t b) // VADDW.s16 q0,q0,d0
+{
+ __m128i b32;
+ b32 = _MM_CVTEPU16_EPI32 (_pM128i(b)); //SSE4.1
+ return _mm_add_epi32 (a, b32);
+}
+
+uint64x2_t vaddw_u32(uint64x2_t a, uint32x2_t b); // VADDW.U32 q0,q0,d0
+_NEON2SSE_INLINE uint64x2_t vaddw_u32(uint64x2_t a, uint32x2_t b) // VADDW.U32 q0,q0,d0
+{
+ __m128i b64;
+ b64 = _MM_CVTEPU32_EPI64 (_pM128i(b)); //SSE4.1
+ return _mm_add_epi64 (a, b64);
+}
+
+//******************************Vector halving add: vhadd -> Vr[i]:=(Va[i]+Vb[i])>>1 , result truncated *******************************
+//*************************************************************************************************************************
+int8x8_t vhadd_s8(int8x8_t a, int8x8_t b); // VHADD.S8 d0,d0,d0
+_NEON2SSE_INLINE int8x8_t vhadd_s8(int8x8_t a, int8x8_t b)
+{
+ int8x8_t res64;
+ return64(vhaddq_s8(_pM128i(a), _pM128i(b)));
+}
+
+
+int16x4_t vhadd_s16(int16x4_t a, int16x4_t b); // VHADD.S16 d0,d0,d0
+_NEON2SSE_INLINE int16x4_t vhadd_s16(int16x4_t a, int16x4_t b)
+{
+ int16x4_t res64;
+ return64( vhaddq_s16(_pM128i(a), _pM128i(b)));
+}
+
+
+int32x2_t vhadd_s32(int32x2_t a, int32x2_t b); // VHADD.S32 d0,d0,d0
+_NEON2SSE_INLINE int32x2_t vhadd_s32(int32x2_t a, int32x2_t b)
+{
+ int32x2_t res64;
+ return64( vhaddq_s32(_pM128i(a), _pM128i(b)));
+}
+
+
+uint8x8_t vhadd_u8(uint8x8_t a, uint8x8_t b); // VHADD.U8 d0,d0,d0
+_NEON2SSE_INLINE uint8x8_t vhadd_u8(uint8x8_t a, uint8x8_t b)
+{
+ uint8x8_t res64;
+ return64( vhaddq_u8(_pM128i(a), _pM128i(b)));
+}
+
+
+uint16x4_t vhadd_u16(uint16x4_t a, uint16x4_t b); // VHADD.s16 d0,d0,d0
+_NEON2SSE_INLINE uint16x4_t vhadd_u16(uint16x4_t a, uint16x4_t b)
+{
+ uint16x4_t res64;
+ return64( vhaddq_u16(_pM128i(a), _pM128i(b)));
+}
+
+
+uint32x2_t vhadd_u32(uint32x2_t a, uint32x2_t b); // VHADD.U32 d0,d0,d0
+_NEON2SSE_INLINE uint32x2_t vhadd_u32(uint32x2_t a, uint32x2_t b)
+{
+ uint32x2_t res64;
+ return64( vhaddq_u32(_pM128i(a), _pM128i(b)));
+}
+
+
+int8x16_t vhaddq_s8(int8x16_t a, int8x16_t b); // VHADD.S8 q0,q0,q0
+_NEON2SSE_INLINE int8x16_t vhaddq_s8(int8x16_t a, int8x16_t b)
+{
+ //need to avoid internal overflow, will use the (x&y)+((x^y)>>1).
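+ //(identity: x + y == ((x & y) << 1) + (x ^ y), carry bits plus sum bits,
+ // so (x + y) >> 1 == (x & y) + ((x ^ y) >> 1) without ever forming the overflowing sum)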
+ __m128i tmp1, tmp2;
+ tmp1 = _mm_and_si128(a,b);
+ tmp2 = _mm_xor_si128(a,b);
+ tmp2 = vshrq_n_s8(tmp2,1);
+ return _mm_add_epi8(tmp1,tmp2);
+}
+
+int16x8_t vhaddq_s16(int16x8_t a, int16x8_t b); // VHADD.S16 q0,q0,q0
+_NEON2SSE_INLINE int16x8_t vhaddq_s16(int16x8_t a, int16x8_t b)
+{
+ //need to avoid internal overflow, will use the (x&y)+((x^y)>>1).
+ __m128i tmp1, tmp2;
+ tmp1 = _mm_and_si128(a,b);
+ tmp2 = _mm_xor_si128(a,b);
+ tmp2 = _mm_srai_epi16(tmp2,1);
+ return _mm_add_epi16(tmp1,tmp2);
+}
+
+int32x4_t vhaddq_s32(int32x4_t a, int32x4_t b); // VHADD.S32 q0,q0,q0
+_NEON2SSE_INLINE int32x4_t vhaddq_s32(int32x4_t a, int32x4_t b) // VHADD.S32 q0,q0,q0
+{
+ //need to avoid internal overflow, will use the (x&y)+((x^y)>>1).
+ __m128i tmp1, tmp2;
+ tmp1 = _mm_and_si128(a,b);
+ tmp2 = _mm_xor_si128(a,b);
+ tmp2 = _mm_srai_epi32(tmp2,1);
+ return _mm_add_epi32(tmp1,tmp2);
+}
+
+uint8x16_t vhaddq_u8(uint8x16_t a, uint8x16_t b); // VHADD.U8 q0,q0,q0
+_NEON2SSE_INLINE uint8x16_t vhaddq_u8(uint8x16_t a, uint8x16_t b) // VHADD.U8 q0,q0,q0
+{
+ __m128i c1, sum, res;
+ c1 = _mm_set1_epi8(1);
+ sum = _mm_avg_epu8(a, b); //result is rounded, need to compensate it
+ res = _mm_xor_si128(a, b); //for rounding compensation
+ res = _mm_and_si128(res,c1); //for rounding compensation
+ return _mm_sub_epi8 (sum, res); //actual rounding compensation
+}
+
+uint16x8_t vhaddq_u16(uint16x8_t a, uint16x8_t b); // VHADD.s16 q0,q0,q0
+_NEON2SSE_INLINE uint16x8_t vhaddq_u16(uint16x8_t a, uint16x8_t b) // VHADD.s16 q0,q0,q0
+{
+ __m128i sum, res;
+ sum = _mm_avg_epu16(a, b); //result is rounded, need to compensate it
+ res = _mm_xor_si128(a, b); //for rounding compensation
+ res = _mm_slli_epi16 (res,15); //shift left then back right to
+ res = _mm_srli_epi16 (res,15); //get 1 or zero
+ return _mm_sub_epi16 (sum, res); //actual rounding compensation
+}
+
+uint32x4_t vhaddq_u32(uint32x4_t a, uint32x4_t b); // VHADD.U32 q0,q0,q0
+_NEON2SSE_INLINE uint32x4_t vhaddq_u32(uint32x4_t a, uint32x4_t b) // VHADD.U32 q0,q0,q0
+{
+ //need to avoid internal overflow, will use the (x&y)+((x^y)>>1).
+ __m128i tmp1, tmp2;
+ tmp1 = _mm_and_si128(a,b);
+ tmp2 = _mm_xor_si128(a,b);
+ tmp2 = _mm_srli_epi32(tmp2,1);
+ return _mm_add_epi32(tmp1,tmp2);
+}
+
+//************************Vector rounding halving add: vrhadd{q}_. Vr[i]:=(Va[i]+Vb[i]+1)>>1 ***************************
+//*****************************************************************************************************************************
+int8x8_t vrhadd_s8(int8x8_t a, int8x8_t b); // VRHADD.S8 d0,d0,d0
+_NEON2SSE_INLINE int8x8_t vrhadd_s8(int8x8_t a, int8x8_t b)
+{
+ int8x8_t res64;
+ return64(vrhaddq_s8(_pM128i(a), _pM128i(b)));
+}
+
+
+int16x4_t vrhadd_s16(int16x4_t a, int16x4_t b); // VRHADD.S16 d0,d0,d0
+_NEON2SSE_INLINE int16x4_t vrhadd_s16(int16x4_t a, int16x4_t b)
+{
+ int16x4_t res64;
+ return64(vrhaddq_s16(_pM128i(a), _pM128i(b)));
+}
+
+
+int32x2_t vrhadd_s32(int32x2_t a, int32x2_t b); // VRHADD.S32 d0,d0,d0
+_NEON2SSE_INLINE int32x2_t vrhadd_s32(int32x2_t a, int32x2_t b)
+{
+ int32x2_t res64;
+ return64(vrhaddq_s32(_pM128i(a), _pM128i(b)));
+}
+
+
+uint8x8_t vrhadd_u8(uint8x8_t a, uint8x8_t b); // VRHADD.U8 d0,d0,d0
+_NEON2SSE_INLINE uint8x8_t vrhadd_u8(uint8x8_t a, uint8x8_t b)
+{
+ uint8x8_t res64;
+ return64(_mm_avg_epu8(_pM128i(a),_pM128i(b))); //SSE, result rounding!!!
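+ //(_mm_avg_epu8 computes (a + b + 1) >> 1, which is exactly the NEON rounding halving add)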
+} + + +uint16x4_t vrhadd_u16(uint16x4_t a, uint16x4_t b); // VRHADD.s16 d0,d0,d0 +_NEON2SSE_INLINE uint16x4_t vrhadd_u16(uint16x4_t a, uint16x4_t b) +{ + uint16x4_t res64; + return64(_mm_avg_epu16(_pM128i(a),_pM128i(b))); //SSE, result rounding!!! +} + + +uint32x2_t vrhadd_u32(uint32x2_t a, uint32x2_t b); // VRHADD.U32 d0,d0,d0 +_NEON2SSE_INLINE uint32x2_t vrhadd_u32(uint32x2_t a, uint32x2_t b) +{ + uint32x2_t res64; + return64(vrhaddq_u32(_pM128i(a), _pM128i(b))); +} + + +int8x16_t vrhaddq_s8(int8x16_t a, int8x16_t b); // VRHADD.S8 q0,q0,q0 +_NEON2SSE_INLINE int8x16_t vrhaddq_s8(int8x16_t a, int8x16_t b) // VRHADD.S8 q0,q0,q0 +{ + //no signed average in x86 SIMD, go to unsigned + __m128i c128, au, bu, sum; + c128 = _mm_set1_epi8(0x80); //-128 + au = _mm_sub_epi8(a, c128); //add 128 + bu = _mm_sub_epi8(b, c128); //add 128 + sum = _mm_avg_epu8(au, bu); + return _mm_add_epi8 (sum, c128); //sub 128 +} + +int16x8_t vrhaddq_s16(int16x8_t a, int16x8_t b); // VRHADD.S16 q0,q0,q0 +_NEON2SSE_INLINE int16x8_t vrhaddq_s16(int16x8_t a, int16x8_t b) // VRHADD.S16 q0,q0,q0 +{ + //no signed average in x86 SIMD, go to unsigned + __m128i cx8000, au, bu, sum; + cx8000 = _mm_set1_epi16(0x8000); // - 32768 + au = _mm_sub_epi16(a, cx8000); //add 32768 + bu = _mm_sub_epi16(b, cx8000); //add 32768 + sum = _mm_avg_epu16(au, bu); + return _mm_add_epi16 (sum, cx8000); //sub 32768 +} + +int32x4_t vrhaddq_s32(int32x4_t a, int32x4_t b); // VRHADD.S32 q0,q0,q0 +_NEON2SSE_INLINE int32x4_t vrhaddq_s32(int32x4_t a, int32x4_t b) +{ + //need to avoid overflow + __m128i a2, b2, res, sum; + a2 = _mm_srai_epi32(a,1); //a2=a/2; + b2 = _mm_srai_epi32(b,1); // b2=b/2; + res = _mm_or_si128(a,b); //for rounding + res = _mm_slli_epi32 (res,31); //shift left then back right to + res = _mm_srli_epi32 (res,31); //get 1 or zero + sum = _mm_add_epi32(a2,b2); + return _mm_add_epi32(sum,res); +} + +uint8x16_t vrhaddq_u8(uint8x16_t a, uint8x16_t b); // VRHADD.U8 q0,q0,q0 +#define vrhaddq_u8 _mm_avg_epu8 //SSE2, results rounded + +uint16x8_t vrhaddq_u16(uint16x8_t a, uint16x8_t b); // VRHADD.s16 q0,q0,q0 +#define vrhaddq_u16 _mm_avg_epu16 //SSE2, results rounded + + +uint32x4_t vrhaddq_u32(uint32x4_t a, uint32x4_t b); // VRHADD.U32 q0,q0,q0 +_NEON2SSE_INLINE uint32x4_t vrhaddq_u32(uint32x4_t a, uint32x4_t b) // VRHADD.U32 q0,q0,q0 +{ + //need to avoid overflow + __m128i a2, b2, res, sum; + a2 = _mm_srli_epi32(a,1); //a2=a/2; + b2 = _mm_srli_epi32(b,1); // b2=b/2; + res = _mm_or_si128(a,b); //for rounding + res = _mm_slli_epi32 (res,31); //shift left then back right to + res = _mm_srli_epi32 (res,31); //get 1 or zero + sum = _mm_add_epi32(a2,b2); + return _mm_add_epi32(sum,res); +} + +//****************** VQADD: Vector saturating add ************************ +//************************************************************************ +int8x8_t vqadd_s8(int8x8_t a, int8x8_t b); // VQADD.S8 d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vqadd_s8(int8x8_t a, int8x8_t b) +{ + int8x8_t res64; + return64(_mm_adds_epi8(_pM128i(a),_pM128i(b))); +} + + +int16x4_t vqadd_s16(int16x4_t a, int16x4_t b); // VQADD.S16 d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vqadd_s16(int16x4_t a, int16x4_t b) +{ + int16x4_t res64; + return64(_mm_adds_epi16(_pM128i(a),_pM128i(b))); +} + + +int32x2_t vqadd_s32(int32x2_t a, int32x2_t b); // VQADD.S32 d0,d0,d0 +_NEON2SSE_INLINE int32x2_t vqadd_s32(int32x2_t a, int32x2_t b) +{ + int32x2_t res64; + return64(vqaddq_s32(_pM128i(a), _pM128i(b))); +} + + +int64x1_t vqadd_s64(int64x1_t a, int64x1_t b); // VQADD.S64 d0,d0,d0 +_NEON2SSE_INLINE 
_NEON2SSE_PERFORMANCE_WARNING(int64x1_t vqadd_s64(int64x1_t a, int64x1_t b), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+ int64x1_t res;
+ uint64_t a64, b64;
+ a64 = a.m64_u64[0];
+ b64 = b.m64_u64[0];
+ res.m64_u64[0] = a64 + b64;
+ a64 = (a64 >> 63) + (~_SIGNBIT64);
+ if ((int64_t)((b64 ^ a64) | ~(res.m64_u64[0] ^ b64))>=0) {
+ res.m64_u64[0] = a64;
+ }
+ return res;
+}
+
+uint8x8_t vqadd_u8(uint8x8_t a, uint8x8_t b); // VQADD.U8 d0,d0,d0
+_NEON2SSE_INLINE uint8x8_t vqadd_u8(uint8x8_t a, uint8x8_t b)
+{
+ uint8x8_t res64;
+ return64(_mm_adds_epu8(_pM128i(a),_pM128i(b)));
+}
+
+
+uint16x4_t vqadd_u16(uint16x4_t a, uint16x4_t b); // VQADD.s16 d0,d0,d0
+_NEON2SSE_INLINE uint16x4_t vqadd_u16(uint16x4_t a, uint16x4_t b)
+{
+ uint16x4_t res64;
+ return64(_mm_adds_epu16(_pM128i(a),_pM128i(b)));
+}
+
+
+uint32x2_t vqadd_u32(uint32x2_t a, uint32x2_t b); // VQADD.U32 d0,d0,d0
+_NEON2SSE_INLINE uint32x2_t vqadd_u32(uint32x2_t a, uint32x2_t b)
+{
+ uint32x2_t res64;
+ return64(vqaddq_u32(_pM128i(a), _pM128i(b)));
+}
+
+
+uint64x1_t vqadd_u64(uint64x1_t a, uint64x1_t b); // VQADD.U64 d0,d0,d0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x1_t vqadd_u64(uint64x1_t a, uint64x1_t b), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+ _NEON2SSE_ALIGN_16 uint64_t a64, b64;
+ uint64x1_t res;
+ a64 = a.m64_u64[0];
+ b64 = b.m64_u64[0];
+ res.m64_u64[0] = a64 + b64;
+ if (res.m64_u64[0] < a64) {
+ res.m64_u64[0] = ~(uint64_t)0;
+ }
+ return res;
+}
+
+int8x16_t vqaddq_s8(int8x16_t a, int8x16_t b); // VQADD.S8 q0,q0,q0
+#define vqaddq_s8 _mm_adds_epi8
+
+int16x8_t vqaddq_s16(int16x8_t a, int16x8_t b); // VQADD.S16 q0,q0,q0
+#define vqaddq_s16 _mm_adds_epi16
+
+int32x4_t vqaddq_s32(int32x4_t a, int32x4_t b); // VQADD.S32 q0,q0,q0
+_NEON2SSE_INLINE int32x4_t vqaddq_s32(int32x4_t a, int32x4_t b)
+{
+ //no corresponding x86 SIMD solution, special tricks are necessary. Overflow happens only if a and b have the same sign and sum has the opposite sign
+ __m128i c7fffffff, res, res_sat, res_xor_a, b_xor_a_;
+ c7fffffff = _mm_set1_epi32(0x7fffffff);
+ res = _mm_add_epi32(a, b);
+ res_sat = _mm_srli_epi32(a, 31);
+ res_sat = _mm_add_epi32(res_sat, c7fffffff);
+ res_xor_a = _mm_xor_si128(res, a);
+ b_xor_a_ = _mm_xor_si128(b, a);
+ res_xor_a = _mm_andnot_si128(b_xor_a_, res_xor_a);
+ res_xor_a = _mm_srai_epi32(res_xor_a,31); //propagate the sign bit: all ones if <0, all zeros otherwise
+ res_sat = _mm_and_si128(res_xor_a, res_sat);
+ res = _mm_andnot_si128(res_xor_a, res);
+ return _mm_or_si128(res, res_sat);
+}
+
+int64x2_t vqaddq_s64(int64x2_t a, int64x2_t b); // VQADD.S64 q0,q0,q0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqaddq_s64(int64x2_t a, int64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+ _NEON2SSE_ALIGN_16 uint64_t atmp[2], btmp[2], res[2];
+ _mm_store_si128((__m128i*)atmp, a);
+ _mm_store_si128((__m128i*)btmp, b);
+ res[0] = atmp[0] + btmp[0];
+ res[1] = atmp[1] + btmp[1];
+
+ atmp[0] = (atmp[0] >> 63) + (~_SIGNBIT64);
+ atmp[1] = (atmp[1] >> 63) + (~_SIGNBIT64);
+
+ if ((int64_t)((btmp[0] ^ atmp[0]) | ~(res[0] ^ btmp[0]))>=0) {
+ res[0] = atmp[0];
+ }
+ if ((int64_t)((btmp[1] ^ atmp[1]) | ~(res[1] ^ btmp[1]))>=0) {
+ res[1] = atmp[1];
+ }
+ return _mm_load_si128((__m128i*)res);
+}
+
+uint8x16_t vqaddq_u8(uint8x16_t a, uint8x16_t b); // VQADD.U8 q0,q0,q0
+#define vqaddq_u8 _mm_adds_epu8
+
+uint16x8_t vqaddq_u16(uint16x8_t a, uint16x8_t b); // VQADD.s16 q0,q0,q0
+#define vqaddq_u16 _mm_adds_epu16
+
+uint32x4_t vqaddq_u32(uint32x4_t a, uint32x4_t b); // VQADD.U32 q0,q0,q0
+_NEON2SSE_INLINE uint32x4_t vqaddq_u32(uint32x4_t a, uint32x4_t b)
+{
+ __m128i c80000000, cmp, subsum, suba, sum;
+ c80000000 = _mm_set1_epi32 (0x80000000);
+ sum = _mm_add_epi32 (a, b);
+ subsum = _mm_sub_epi32 (sum, c80000000);
+ suba = _mm_sub_epi32 (a, c80000000);
+ cmp = _mm_cmpgt_epi32 ( suba, subsum); //no unsigned comparison, need to go to signed
+ return _mm_or_si128 (sum, cmp); //saturation
+}
+
+uint64x2_t vqaddq_u64(uint64x2_t a, uint64x2_t b); // VQADD.U64 q0,q0,q0
+#ifdef USE_SSE4
+ _NEON2SSE_INLINE uint64x2_t vqaddq_u64(uint64x2_t a, uint64x2_t b)
+ {
+ __m128i c80000000, sum, cmp, suba, subsum;
+ c80000000 = _mm_set_epi32 (0x80000000, 0x0, 0x80000000, 0x0);
+ sum = _mm_add_epi64 (a, b);
+ subsum = _mm_sub_epi64 (sum, c80000000);
+ suba = _mm_sub_epi64 (a, c80000000);
+ cmp = _mm_cmpgt_epi64 ( suba, subsum); //no unsigned comparison, need to go to signed, SSE4.2!!!
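+ //(biasing both operands by 2^63 maps unsigned order onto signed order:
+ // a > b as uint64 iff a - 0x8000000000000000 > b - 0x8000000000000000 as int64)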
+ return _mm_or_si128 (sum, cmp); //saturation + } +#else + _NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x2_t vqaddq_u64(uint64x2_t a, uint64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) + { + _NEON2SSE_ALIGN_16 uint64_t atmp[2], btmp[2], res[2]; + _mm_store_si128((__m128i*)atmp, a); + _mm_store_si128((__m128i*)btmp, b); + res[0] = atmp[0] + btmp[0]; + res[1] = atmp[1] + btmp[1]; + if (res[0] < atmp[0]) res[0] = ~(uint64_t)0; + if (res[1] < atmp[1]) res[1] = ~(uint64_t)0; + return _mm_load_si128((__m128i*)(res)); + } +#endif + + +//******************* Vector add high half (truncated) ****************** +//************************************************************************ +int8x8_t vaddhn_s16(int16x8_t a, int16x8_t b); // VADDHN.I16 d0,q0,q0 +_NEON2SSE_INLINE int8x8_t vaddhn_s16(int16x8_t a, int16x8_t b) // VADDHN.I16 d0,q0,q0 +{ + int8x8_t res64; + __m128i sum; + sum = _mm_add_epi16 (a, b); + sum = _mm_srai_epi16 (sum, 8); + sum = _mm_packs_epi16 (sum, sum); //use 64 low bits only + return64(sum); +} + +int16x4_t vaddhn_s32(int32x4_t a, int32x4_t b); // VADDHN.I32 d0,q0,q0 +_NEON2SSE_INLINE int16x4_t vaddhn_s32(int32x4_t a, int32x4_t b) // VADDHN.I32 d0,q0,q0 +{ + int16x4_t res64; + __m128i sum; + sum = _mm_add_epi32 (a, b); + sum = _mm_srai_epi32(sum, 16); + sum = _mm_packs_epi32 (sum, sum); //use 64 low bits only + return64(sum); +} + +int32x2_t vaddhn_s64(int64x2_t a, int64x2_t b); // VADDHN.I64 d0,q0,q0 +_NEON2SSE_INLINE int32x2_t vaddhn_s64(int64x2_t a, int64x2_t b) +{ + int32x2_t res64; + __m128i sum; + sum = _mm_add_epi64 (a, b); + sum = _mm_shuffle_epi32(sum, 1 | (3 << 2) | (0 << 4) | (2 << 6)); + return64(sum); +} + +uint8x8_t vaddhn_u16(uint16x8_t a, uint16x8_t b); // VADDHN.I16 d0,q0,q0 +_NEON2SSE_INLINE uint8x8_t vaddhn_u16(uint16x8_t a, uint16x8_t b) // VADDHN.I16 d0,q0,q0 +{ + uint8x8_t res64; + __m128i sum; + sum = _mm_add_epi16 (a, b); + sum = _mm_srli_epi16 (sum, 8); + sum = _mm_packus_epi16 (sum,sum); //use 64 low bits only + return64(sum); +} + +uint16x4_t vaddhn_u32(uint32x4_t a, uint32x4_t b); // VADDHN.I32 d0,q0,q0 +_NEON2SSE_INLINE uint16x4_t vaddhn_u32(uint32x4_t a, uint32x4_t b) // VADDHN.I32 d0,q0,q0 +{ + uint16x4_t res64; + __m128i sum; + sum = _mm_add_epi32 (a, b); + sum = _mm_srli_epi32 (sum, 16); + sum = _MM_PACKUS1_EPI32 (sum); //use 64 low bits only + return64(sum); +} + +uint32x2_t vaddhn_u64(uint64x2_t a, uint64x2_t b); // VADDHN.I64 d0,q0,q0 +#define vaddhn_u64 vaddhn_s64 + +//*********** Vector rounding add high half: vraddhn_ ******************. 
+//*************************************************************************** +int8x8_t vraddhn_s16(int16x8_t a, int16x8_t b); // VRADDHN.I16 d0,q0,q0 +_NEON2SSE_INLINE int8x8_t vraddhn_s16(int16x8_t a, int16x8_t b) // VRADDHN.I16 d0,q0,q0 +{ + int8x8_t res64; + __m128i sum, mask1; + sum = _mm_add_epi16 (a, b); + mask1 = _mm_slli_epi16(sum, 9); //shift left then back right to + mask1 = _mm_srli_epi16(mask1, 15); //get 7-th bit 1 or zero + sum = _mm_srai_epi16 (sum, 8); //get high half + sum = _mm_add_epi16 (sum, mask1); //actual rounding + sum = _mm_packs_epi16 (sum, sum); + return64(sum); +} + +int16x4_t vraddhn_s32(int32x4_t a, int32x4_t b); // VRADDHN.I32 d0,q0,q0 +_NEON2SSE_INLINE int16x4_t vraddhn_s32(int32x4_t a, int32x4_t b) // VRADDHN.I32 d0,q0,q0 +{ + //SIMD may be not optimal, serial may be faster + int16x4_t res64; + __m128i sum, mask1; + sum = _mm_add_epi32 (a, b); + mask1 = _mm_slli_epi32(sum, 17); //shift left then back right to + mask1 = _mm_srli_epi32(mask1,31); //get 15-th bit 1 or zero + sum = _mm_srai_epi32 (sum, 16); //get high half + sum = _mm_add_epi32 (sum, mask1); //actual rounding + sum = _mm_packs_epi32 (sum, sum); + return64(sum); +} + +int32x2_t vraddhn_s64(int64x2_t a, int64x2_t b); // VRADDHN.I64 d0,q0,q0 +_NEON2SSE_INLINE int32x2_t vraddhn_s64(int64x2_t a, int64x2_t b) +{ + //SIMD may be not optimal, serial may be faster + int32x2_t res64; + __m128i sum, mask1; + sum = _mm_add_epi64 (a, b); + mask1 = _mm_slli_epi64(sum, 33); //shift left then back right to + mask1 = _mm_srli_epi64(mask1,32); //get 31-th bit 1 or zero + sum = _mm_add_epi64 (sum, mask1); //actual high half rounding + sum = _mm_shuffle_epi32(sum, 1 | (3 << 2) | (1 << 4) | (3 << 6)); + return64(sum); +} + +uint8x8_t vraddhn_u16(uint16x8_t a, uint16x8_t b); // VRADDHN.I16 d0,q0,q0 +_NEON2SSE_INLINE uint8x8_t vraddhn_u16(uint16x8_t a, uint16x8_t b) // VRADDHN.I16 d0,q0,q0 +{ + uint8x8_t res64; + __m128i sum, mask1; + sum = _mm_add_epi16 (a, b); + mask1 = _mm_slli_epi16(sum, 9); //shift left then back right to + mask1 = _mm_srli_epi16(mask1, 15); //get 7-th bit 1 or zero + sum = _mm_srai_epi16 (sum, 8); //get high half + sum = _mm_add_epi16 (sum, mask1); //actual rounding + sum = _mm_packus_epi16 (sum, sum); + return64(sum); +} + +uint16x4_t vraddhn_u32(uint32x4_t a, uint32x4_t b); // VRADDHN.I32 d0,q0,q0 +_NEON2SSE_INLINE uint16x4_t vraddhn_u32(uint32x4_t a, uint32x4_t b) +{ + //SIMD may be not optimal, serial may be faster + uint16x4_t res64; + __m128i sum, mask1; + sum = _mm_add_epi32 (a, b); + mask1 = _mm_slli_epi32(sum, 17); //shift left then back right to + mask1 = _mm_srli_epi32(mask1,31); //get 15-th bit 1 or zero + sum = _mm_srai_epi32 (sum, 16); //get high half + sum = _mm_add_epi32 (sum, mask1); //actual rounding + sum = _MM_PACKUS1_EPI32 (sum); + return64(sum); +} + +uint32x2_t vraddhn_u64(uint64x2_t a, uint64x2_t b); // VRADDHN.I64 d0,q0,q0 +#define vraddhn_u64 vraddhn_s64 + +//********************************************************************************** +//********* Multiplication ************************************* +//************************************************************************************** + +//Vector multiply: vmul -> Vr[i] := Va[i] * Vb[i] +//As we don't go to wider result functions are equal to "multiply low" in x86 +int8x8_t vmul_s8(int8x8_t a, int8x8_t b); // VMUL.I8 d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vmul_s8(int8x8_t a, int8x8_t b) // VMUL.I8 d0,d0,d0 +{ + // no 8 bit simd multiply, need to go to 16 bits in SSE + int8x8_t res64; + __m128i a128, b128, res; 
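+ //mask8_16_even_odd below gathers the even (low) bytes of the eight 16-bit products into the
+ //low 8 bytes, i.e. truncates each 16-bit product back to 8 bits with a single pshufb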
+ _NEON2SSE_ALIGN_16 int8_t mask8_16_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 }; + a128 = _MM_CVTEPI8_EPI16 (_pM128i(a)); // SSE 4.1 use low 64 bits + b128 = _MM_CVTEPI8_EPI16 (_pM128i(b)); // SSE 4.1 use low 64 bits + res = _mm_mullo_epi16 (a128, b128); + res = _mm_shuffle_epi8 (res, *(__m128i*) mask8_16_even_odd); //return to 8 bit from 16, use 64 low bits only + return64(res); +} + +int16x4_t vmul_s16(int16x4_t a, int16x4_t b); // VMUL.I16 d0,d0,d0 +#define vmul_s16 vmul_u16 + +int32x2_t vmul_s32(int32x2_t a, int32x2_t b); // VMUL.I32 d0,d0,d0 +#define vmul_s32 vmul_u32 + +float32x2_t vmul_f32(float32x2_t a, float32x2_t b); // VMUL.F32 d0,d0,d0 +_NEON2SSE_INLINE float32x2_t vmul_f32(float32x2_t a, float32x2_t b) +{ + float32x4_t tmp; + __m64_128 res64; + tmp = _mm_mul_ps(_pM128(a),_pM128(b)); + _M64f(res64, tmp); //use low 64 bits + return res64; +} + +uint8x8_t vmul_u8(uint8x8_t a, uint8x8_t b); // VMUL.I8 d0,d0,d0 +_NEON2SSE_INLINE uint8x8_t vmul_u8(uint8x8_t a, uint8x8_t b) // VMUL.I8 d0,d0,d0 +{ + // no 8 bit simd multiply, need to go to 16 bits in SSE + uint8x8_t res64; + __m128i mask, a128, b128, res; + mask = _mm_set1_epi16(0xff); + a128 = _MM_CVTEPU8_EPI16 (_pM128i(a)); + b128 = _MM_CVTEPU8_EPI16 (_pM128i(b)); + res = _mm_mullo_epi16 (a128, b128); + res = _mm_and_si128(res, mask); //to avoid saturation + res = _mm_packus_epi16 (res,res); //use only low 64 bits + return64(res); +} + +uint16x4_t vmul_u16(uint16x4_t a, uint16x4_t b); // VMUL.I16 d0,d0,d0 +_NEON2SSE_INLINE uint16x4_t vmul_u16(uint16x4_t a, uint16x4_t b) +{ + uint16x4_t res64; + return64(_mm_mullo_epi16(_pM128i(a),_pM128i(b))); +} + + +uint32x2_t vmul_u32(uint32x2_t a, uint32x2_t b); // VMUL.I32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING( uint32x2_t vmul_u32(uint32x2_t a, uint32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + uint32x2_t res; + res.m64_u32[0] = a.m64_u32[0] * b.m64_u32[0]; + res.m64_u32[1] = a.m64_u32[1] * b.m64_u32[1]; + return res; +} + +poly8x8_t vmul_p8(poly8x8_t a, poly8x8_t b); // VMUL.P8 d0,d0,d0 +_NEON2SSE_INLINE poly8x8_t vmul_p8(poly8x8_t a, poly8x8_t b) +{ + //may be optimized + poly8x8_t res64; + __m128i a64, b64, c1, res, tmp, bmasked; + int i; + a64 = _pM128i(a); + b64 = _pM128i(b); + c1 = _mm_cmpeq_epi8 (a64,a64); //all ones 0xff.... 
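+ //(carry-less "polynomial" multiply: for each bit i of the bytes of b, mask that bit,
+ // multiply by a - for a single-bit operand this is just a shifted copy of a - and
+ // XOR-accumulate; XOR replaces addition, so no carries propagate between bit positions)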
+ c1 = vshrq_n_u8(c1,7); //0x1 + bmasked = _mm_and_si128(b64, c1); //0x1 + res = vmulq_u8(a64, bmasked); + for(i = 1; i<8; i++) { + c1 = _mm_slli_epi16(c1,1); //shift mask left by 1, 16 bit shift is OK here + bmasked = _mm_and_si128(b64, c1); //0x1 + tmp = vmulq_u8(a64, bmasked); + res = _mm_xor_si128(res, tmp); + } + return64 (res); +} + +int8x16_t vmulq_s8(int8x16_t a, int8x16_t b); // VMUL.I8 q0,q0,q0 +_NEON2SSE_INLINE int8x16_t vmulq_s8(int8x16_t a, int8x16_t b) // VMUL.I8 q0,q0,q0 +{ + // no 8 bit simd multiply, need to go to 16 bits + //solution may be not optimal + __m128i a16, b16, r16_1, r16_2; + _NEON2SSE_ALIGN_16 int8_t mask8_16_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 }; + a16 = _MM_CVTEPI8_EPI16 (a); // SSE 4.1 + b16 = _MM_CVTEPI8_EPI16 (b); // SSE 4.1 + r16_1 = _mm_mullo_epi16 (a16, b16); + //swap hi and low part of a and b to process the remaining data + a16 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32); + b16 = _mm_shuffle_epi32 (b, _SWAP_HI_LOW32); + a16 = _MM_CVTEPI8_EPI16 (a16); // SSE 4.1 + b16 = _MM_CVTEPI8_EPI16 (b16); // SSE 4.1 __m128i r16_2 + + r16_2 = _mm_mullo_epi16 (a16, b16); + r16_1 = _mm_shuffle_epi8 (r16_1, *(__m128i*)mask8_16_even_odd); //return to 8 bit + r16_2 = _mm_shuffle_epi8 (r16_2, *(__m128i*)mask8_16_even_odd); //return to 8 bit + + return _mm_unpacklo_epi64(r16_1, r16_2); +} + +int16x8_t vmulq_s16(int16x8_t a, int16x8_t b); // VMUL.I16 q0,q0,q0 +#define vmulq_s16 _mm_mullo_epi16 + +int32x4_t vmulq_s32(int32x4_t a, int32x4_t b); // VMUL.I32 q0,q0,q0 +#define vmulq_s32 _MM_MULLO_EPI32 //SSE4.1 + +float32x4_t vmulq_f32(float32x4_t a, float32x4_t b); // VMUL.F32 q0,q0,q0 +#define vmulq_f32 _mm_mul_ps + +uint8x16_t vmulq_u8(uint8x16_t a, uint8x16_t b); // VMUL.I8 q0,q0,q0 +_NEON2SSE_INLINE uint8x16_t vmulq_u8(uint8x16_t a, uint8x16_t b) // VMUL.I8 q0,q0,q0 +{ + // no 8 bit simd multiply, need to go to 16 bits + //solution may be not optimal + __m128i maskff, a16, b16, r16_1, r16_2; + maskff = _mm_set1_epi16(0xff); + a16 = _MM_CVTEPU8_EPI16 (a); // SSE 4.1 + b16 = _MM_CVTEPU8_EPI16 (b); // SSE 4.1 + r16_1 = _mm_mullo_epi16 (a16, b16); + r16_1 = _mm_and_si128(r16_1, maskff); //to avoid saturation + //swap hi and low part of a and b to process the remaining data + a16 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32); + b16 = _mm_shuffle_epi32 (b, _SWAP_HI_LOW32); + a16 = _MM_CVTEPI8_EPI16 (a16); // SSE 4.1 + b16 = _MM_CVTEPI8_EPI16 (b16); // SSE 4.1 + + r16_2 = _mm_mullo_epi16 (a16, b16); + r16_2 = _mm_and_si128(r16_2, maskff); //to avoid saturation + return _mm_packus_epi16 (r16_1, r16_2); +} + +uint16x8_t vmulq_u16(uint16x8_t a, uint16x8_t b); // VMUL.I16 q0,q0,q0 +#define vmulq_u16 _mm_mullo_epi16 + +uint32x4_t vmulq_u32(uint32x4_t a, uint32x4_t b); // VMUL.I32 q0,q0,q0 +#define vmulq_u32 _MM_MULLO_EPI32 //SSE4.1 + +poly8x16_t vmulq_p8(poly8x16_t a, poly8x16_t b); // VMUL.P8 q0,q0,q0 +_NEON2SSE_INLINE poly8x16_t vmulq_p8(poly8x16_t a, poly8x16_t b) +{ + //may be optimized + __m128i c1, res, tmp, bmasked; + int i; + c1 = _mm_cmpeq_epi8 (a,a); //all ones 0xff.... 
+ c1 = vshrq_n_u8(c1,7); //0x1 + bmasked = _mm_and_si128(b, c1); //0x1 + res = vmulq_u8(a, bmasked); + for(i = 1; i<8; i++) { + c1 = _mm_slli_epi16(c1,1); //shift mask left by 1, 16 bit shift is OK here + bmasked = _mm_and_si128(b, c1); //0x1 + tmp = vmulq_u8(a, bmasked); + res = _mm_xor_si128(res, tmp); + } + return res; +} + +//************************* Vector long multiply *********************************** +//**************************************************************************** +int16x8_t vmull_s8(int8x8_t a, int8x8_t b); // VMULL.S8 q0,d0,d0 +_NEON2SSE_INLINE int16x8_t vmull_s8(int8x8_t a, int8x8_t b) // VMULL.S8 q0,d0,d0 +{ + //no 8 bit simd multiply, need to go to 16 bits + __m128i a16, b16; + a16 = _MM_CVTEPI8_EPI16 (_pM128i(a)); // SSE 4.1 + b16 = _MM_CVTEPI8_EPI16 (_pM128i(b)); // SSE 4.1 + return _mm_mullo_epi16 (a16, b16); //should fit into 16 bit +} + +int32x4_t vmull_s16(int16x4_t a, int16x4_t b); // VMULL.S16 q0,d0,d0 +_NEON2SSE_INLINE int32x4_t vmull_s16(int16x4_t a, int16x4_t b) // VMULL.S16 q0,d0,d0 +{ + #ifdef USE_SSE4 + __m128i a16, b16; + a16 = _MM_CVTEPI16_EPI32 (_pM128i(a)); // SSE 4.1 + b16 = _MM_CVTEPI16_EPI32 (_pM128i(b)); // SSE 4.1 + return _MM_MULLO_EPI32 (a16, b16); // SSE 4.1 + #else + __m128i low, hi, a128,b128; + a128 = _pM128i(a); + b128 = _pM128i(b); + low = _mm_mullo_epi16(a128,b128); + hi = _mm_mulhi_epi16(a128,b128); + return _mm_unpacklo_epi16(low,hi); + #endif +} + +int64x2_t vmull_s32(int32x2_t a, int32x2_t b); // VMULL.S32 q0,d0,d0 +_NEON2SSE_INLINE int64x2_t vmull_s32(int32x2_t a, int32x2_t b) // VMULL.S32 q0,d0,d0 +{ + __m128i ab, ba, a128, b128; + a128 = _pM128i(a); + b128 = _pM128i(b); + ab = _mm_unpacklo_epi32 (a128, b128); //a0, b0, a1,b1 + ba = _mm_unpacklo_epi32 (b128, a128); //b0, a0, b1,a1 + return _MM_MUL_EPI32(ab, ba); //uses 1rst and 3rd data lanes, the multiplication gives 64 bit result +} + +uint16x8_t vmull_u8(uint8x8_t a, uint8x8_t b); // VMULL.U8 q0,d0,d0 +_NEON2SSE_INLINE uint16x8_t vmull_u8(uint8x8_t a, uint8x8_t b) // VMULL.U8 q0,d0,d0 +{ + //no 8 bit simd multiply, need to go to 16 bits + __m128i a16, b16; + a16 = _MM_CVTEPU8_EPI16 (_pM128i(a)); // SSE 4.1 + b16 = _MM_CVTEPU8_EPI16 (_pM128i(b)); // SSE 4.1 + return _mm_mullo_epi16 (a16, b16); //should fit into 16 bit +} + +uint32x4_t vmull_u16(uint16x4_t a, uint16x4_t b); // VMULL.s16 q0,d0,d0 +_NEON2SSE_INLINE uint32x4_t vmull_u16(uint16x4_t a, uint16x4_t b) // VMULL.s16 q0,d0,d0 +{ + #ifdef USE_SSE4 + __m128i a16, b16; + a16 = _MM_CVTEPU16_EPI32 (_pM128i(a)); // SSE 4.1 + b16 = _MM_CVTEPU16_EPI32 (_pM128i(b)); // SSE 4.1 + return _MM_MULLO_EPI32 (a16, b16); // SSE 4.1 + #else + __m128i a128,b128,low, hi; + a128 = _pM128i(a); + b128 = _pM128i(b); + low = _mm_mullo_epi16(a128,b128); + hi = _mm_mulhi_epu16(a128,b128); + return _mm_unpacklo_epi16(low,hi); + #endif +} + +uint64x2_t vmull_u32(uint32x2_t a, uint32x2_t b); // VMULL.U32 q0,d0,d0 +_NEON2SSE_INLINE uint64x2_t vmull_u32(uint32x2_t a, uint32x2_t b) // VMULL.U32 q0,d0,d0 +{ + ///may be not optimal compared with serial implementation + __m128i ab, ba, a128, b128; + a128 = _pM128i(a); + b128 = _pM128i(b); + ab = _mm_unpacklo_epi32 (a128, b128); //a0, b0, a1,b1 + ba = _mm_unpacklo_epi32 (b128, a128); //b0, a0, b1,a1 + return _mm_mul_epu32 (ab, ba); //uses 1rst and 3rd data lanes, the multiplication gives 64 bit result +} + +poly16x8_t vmull_p8(poly8x8_t a, poly8x8_t b); // VMULL.P8 q0,d0,d0 +_NEON2SSE_INLINE poly16x8_t vmull_p8(poly8x8_t a, poly8x8_t b) +{ + //may be optimized + __m128i a128,b128, c1, a128_16, 
bmasked_16, res, tmp, bmasked;
+ int i;
+ a128 = _pM128i(a);
+ b128 = _pM128i(b);
+ c1 = _mm_cmpeq_epi8 (a128,a128); //all ones 0xff....
+ c1 = vshrq_n_u8(c1,7); //0x1
+ bmasked = _mm_and_si128(b128, c1); //0x1
+
+ a128_16 = _MM_CVTEPU8_EPI16 (a128); // SSE 4.1
+ bmasked_16 = _MM_CVTEPU8_EPI16 (bmasked); // SSE 4.1
+ res = _mm_mullo_epi16 (a128_16, bmasked_16); //should fit into 16 bit
+ for(i = 1; i<8; i++) {
+ c1 = _mm_slli_epi16(c1,1); //shift mask left by 1, 16 bit shift is OK here
+ bmasked = _mm_and_si128(b128, c1); //0x1
+ bmasked_16 = _MM_CVTEPU8_EPI16 (bmasked); // SSE 4.1
+ tmp = _mm_mullo_epi16 (a128_16, bmasked_16); //should fit into 16 bit, vmull_u8(a, bmasked);
+ res = _mm_xor_si128(res, tmp);
+ }
+ return res;
+}
+
+//****************Vector saturating doubling long multiply **************************
+//*****************************************************************
+int32x4_t vqdmull_s16(int16x4_t a, int16x4_t b); // VQDMULL.S16 q0,d0,d0
+_NEON2SSE_INLINE int32x4_t vqdmull_s16(int16x4_t a, int16x4_t b)
+{
+ //the serial solution may be faster due to saturation
+ __m128i res;
+ res = vmull_s16(a, b);
+ return vqd_s32(res);
+}
+
+int64x2_t vqdmull_s32(int32x2_t a, int32x2_t b); // VQDMULL.S32 q0,d0,d0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqdmull_s32(int32x2_t a, int32x2_t b),_NEON2SSE_REASON_SLOW_SERIAL)
+{
+ //the serial solution may be faster due to saturation
+ __m128i res;
+ res = vmull_s32(a,b);
+ return vqaddq_s64(res,res); //slow serial function!!!!
+}
+
+//********************* Vector multiply accumulate: vmla -> Vr[i] := Va[i] + Vb[i] * Vc[i] ************************
+//******************************************************************************************
+int8x8_t vmla_s8(int8x8_t a, int8x8_t b, int8x8_t c); // VMLA.I8 d0,d0,d0
+_NEON2SSE_INLINE int8x8_t vmla_s8(int8x8_t a, int8x8_t b, int8x8_t c) // VMLA.I8 d0,d0,d0
+{
+ // no 8 bit x86 simd multiply, need to go to 16 bits, and use the low 64 bits
+ int8x8_t res64;
+ __m128i b128, c128, res;
+ _NEON2SSE_ALIGN_16 int8_t mask8_16_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 };
+ b128 = _MM_CVTEPI8_EPI16 (_pM128i(b)); // SSE 4.1 use low 64 bits
+ c128 = _MM_CVTEPI8_EPI16 (_pM128i(c)); // SSE 4.1 use low 64 bits
+ res = _mm_mullo_epi16 (c128, b128);
+ res = _mm_shuffle_epi8 (res, *(__m128i*) mask8_16_even_odd);
+ res = _mm_add_epi8 (res, _pM128i(a)); //use the low 64 bits
+ return64(res);
+}
+
+int16x4_t vmla_s16(int16x4_t a, int16x4_t b, int16x4_t c); // VMLA.I16 d0,d0,d0
+_NEON2SSE_INLINE int16x4_t vmla_s16(int16x4_t a, int16x4_t b, int16x4_t c)
+{
+ int16x4_t res64;
+ return64(vmlaq_s16(_pM128i(a),_pM128i(b), _pM128i(c)));
+}
+
+
+int32x2_t vmla_s32(int32x2_t a, int32x2_t b, int32x2_t c); // VMLA.I32 d0,d0,d0
+_NEON2SSE_INLINE int32x2_t vmla_s32(int32x2_t a, int32x2_t b, int32x2_t c) // VMLA.I32 d0,d0,d0
+{
+ int32x2_t res64;
+ __m128i res;
+ res = _MM_MULLO_EPI32 (_pM128i(b), _pM128i(c)); //SSE4.1
+ res = _mm_add_epi32 (res, _pM128i(a)); //use the low 64 bits
+ return64(res);
+}
+
+float32x2_t vmla_f32(float32x2_t a, float32x2_t b, float32x2_t c); // VMLA.F32 d0,d0,d0
+_NEON2SSE_INLINE float32x2_t vmla_f32(float32x2_t a, float32x2_t b, float32x2_t c)
+{
+ //fma is coming soon, but right now:
+ __m128 res;
+ __m64_128 res64;
+ res = _mm_mul_ps (_pM128(c), _pM128(b));
+ res = _mm_add_ps (_pM128(a), res);
+ _M64f(res64, res);
+ return res64;
+}
+
+uint8x8_t vmla_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c); // VMLA.I8 d0,d0,d0
+_NEON2SSE_INLINE uint8x8_t 
vmla_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c) // VMLA.I8 d0,d0,d0 +{ + // no 8 bit x86 simd multiply, need to go to 16 bits, and use the low 64 bits + uint8x8_t res64; + __m128i mask, b128, c128, res; + mask = _mm_set1_epi16(0xff); + b128 = _MM_CVTEPU8_EPI16 (_pM128i(b)); // SSE 4.1 use low 64 bits + c128 = _MM_CVTEPU8_EPI16 (_pM128i(c)); // SSE 4.1 use low 64 bits + res = _mm_mullo_epi16 (c128, b128); + res = _mm_and_si128(res, mask); //to avoid saturation + res = _mm_packus_epi16 (res, res); + res = _mm_add_epi8 (res, _pM128i(a)); //use the low 64 bits + return64(res); +} + +uint16x4_t vmla_u16(uint16x4_t a, uint16x4_t b, uint16x4_t c); // VMLA.I16 d0,d0,d0 +#define vmla_u16 vmla_s16 + +uint32x2_t vmla_u32(uint32x2_t a, uint32x2_t b, uint32x2_t c); // VMLA.I32 d0,d0,d0 +#define vmla_u32 vmla_s32 + +int8x16_t vmlaq_s8(int8x16_t a, int8x16_t b, int8x16_t c); // VMLA.I8 q0,q0,q0 +_NEON2SSE_INLINE int8x16_t vmlaq_s8(int8x16_t a, int8x16_t b, int8x16_t c) // VMLA.I8 q0,q0,q0 +{ + //solution may be not optimal + // no 8 bit simd multiply, need to go to 16 bits + __m128i b16, c16, r16_1, a_2,r16_2; + _NEON2SSE_ALIGN_16 int8_t mask8_16_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 }; + b16 = _MM_CVTEPI8_EPI16 (b); // SSE 4.1 + c16 = _MM_CVTEPI8_EPI16 (c); // SSE 4.1 + r16_1 = _mm_mullo_epi16 (b16, c16); + r16_1 = _mm_shuffle_epi8 (r16_1, *(__m128i*) mask8_16_even_odd); //return to 8 bits + r16_1 = _mm_add_epi8 (r16_1, a); + //swap hi and low part of a, b and c to process the remaining data + a_2 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32); + b16 = _mm_shuffle_epi32 (b, _SWAP_HI_LOW32); + c16 = _mm_shuffle_epi32 (c, _SWAP_HI_LOW32); + b16 = _MM_CVTEPI8_EPI16 (b16); // SSE 4.1 + c16 = _MM_CVTEPI8_EPI16 (c16); // SSE 4.1 + + r16_2 = _mm_mullo_epi16 (b16, c16); + r16_2 = _mm_shuffle_epi8 (r16_2, *(__m128i*) mask8_16_even_odd); + r16_2 = _mm_add_epi8(r16_2, a_2); + return _mm_unpacklo_epi64(r16_1,r16_2); +} + +int16x8_t vmlaq_s16(int16x8_t a, int16x8_t b, int16x8_t c); // VMLA.I16 q0,q0,q0 +_NEON2SSE_INLINE int16x8_t vmlaq_s16(int16x8_t a, int16x8_t b, int16x8_t c) // VMLA.I16 q0,q0,q0 +{ + __m128i res; + res = _mm_mullo_epi16 (c, b); + return _mm_add_epi16 (res, a); +} + +int32x4_t vmlaq_s32(int32x4_t a, int32x4_t b, int32x4_t c); // VMLA.I32 q0,q0,q0 +_NEON2SSE_INLINE int32x4_t vmlaq_s32(int32x4_t a, int32x4_t b, int32x4_t c) // VMLA.I32 q0,q0,q0 +{ + __m128i res; + res = _MM_MULLO_EPI32 (c, b); //SSE4.1 + return _mm_add_epi32 (res, a); +} + +float32x4_t vmlaq_f32(float32x4_t a, float32x4_t b, float32x4_t c); // VMLA.F32 q0,q0,q0 +_NEON2SSE_INLINE float32x4_t vmlaq_f32(float32x4_t a, float32x4_t b, float32x4_t c) // VMLA.F32 q0,q0,q0 +{ + //fma is coming soon, but right now: + __m128 res; + res = _mm_mul_ps (c, b); + return _mm_add_ps (a, res); +} + +uint8x16_t vmlaq_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c); // VMLA.I8 q0,q0,q0 +_NEON2SSE_INLINE uint8x16_t vmlaq_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c) // VMLA.I8 q0,q0,q0 +{ + //solution may be not optimal + // no 8 bit simd multiply, need to go to 16 bits + __m128i b16, c16, r16_1, a_2, r16_2; + _NEON2SSE_ALIGN_16 int8_t mask8_16_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 }; + b16 = _MM_CVTEPU8_EPI16 (b); // SSE 4.1 + c16 = _MM_CVTEPU8_EPI16 (c); // SSE 4.1 + r16_1 = _mm_mullo_epi16 (b16, c16); + r16_1 = _mm_shuffle_epi8 (r16_1, *(__m128i*) mask8_16_even_odd); //return to 8 bits + r16_1 = _mm_add_epi8 (r16_1, a); + //swap hi and low part of a, b and c to process the remaining data + a_2 = 
_mm_shuffle_epi32 (a, _SWAP_HI_LOW32); + b16 = _mm_shuffle_epi32 (b, _SWAP_HI_LOW32); + c16 = _mm_shuffle_epi32 (c, _SWAP_HI_LOW32); + b16 = _MM_CVTEPU8_EPI16 (b16); // SSE 4.1 + c16 = _MM_CVTEPU8_EPI16 (c16); // SSE 4.1 + + r16_2 = _mm_mullo_epi16 (b16, c16); + r16_2 = _mm_shuffle_epi8 (r16_2, *(__m128i*) mask8_16_even_odd); + r16_2 = _mm_add_epi8(r16_2, a_2); + return _mm_unpacklo_epi64(r16_1,r16_2); +} + +uint16x8_t vmlaq_u16(uint16x8_t a, uint16x8_t b, uint16x8_t c); // VMLA.I16 q0,q0,q0 +#define vmlaq_u16 vmlaq_s16 + +uint32x4_t vmlaq_u32(uint32x4_t a, uint32x4_t b, uint32x4_t c); // VMLA.I32 q0,q0,q0 +#define vmlaq_u32 vmlaq_s32 + +//********************** Vector widening multiply accumulate (long multiply accumulate): +// vmla -> Vr[i] := Va[i] + Vb[i] * Vc[i] ************** +//******************************************************************************************** +int16x8_t vmlal_s8(int16x8_t a, int8x8_t b, int8x8_t c); // VMLAL.S8 q0,d0,d0 +_NEON2SSE_INLINE int16x8_t vmlal_s8(int16x8_t a, int8x8_t b, int8x8_t c) // VMLAL.S8 q0,d0,d0 +{ + int16x8_t res; + res = vmull_s8(b, c); + return _mm_add_epi16 (res, a); +} + +int32x4_t vmlal_s16(int32x4_t a, int16x4_t b, int16x4_t c); // VMLAL.S16 q0,d0,d0 +_NEON2SSE_INLINE int32x4_t vmlal_s16(int32x4_t a, int16x4_t b, int16x4_t c) // VMLAL.S16 q0,d0,d0 +{ + //may be not optimal compared with serial implementation + int32x4_t res; + res = vmull_s16(b, c); + return _mm_add_epi32 (res, a); +} + +int64x2_t vmlal_s32(int64x2_t a, int32x2_t b, int32x2_t c); // VMLAL.S32 q0,d0,d0 +_NEON2SSE_INLINE int64x2_t vmlal_s32(int64x2_t a, int32x2_t b, int32x2_t c) // VMLAL.S32 q0,d0,d0 +{ + //may be not optimal compared with serial implementation + int64x2_t res; + res = vmull_s32( b, c); + return _mm_add_epi64 (res, a); +} + +uint16x8_t vmlal_u8(uint16x8_t a, uint8x8_t b, uint8x8_t c); // VMLAL.U8 q0,d0,d0 +_NEON2SSE_INLINE uint16x8_t vmlal_u8(uint16x8_t a, uint8x8_t b, uint8x8_t c) // VMLAL.U8 q0,d0,d0 +{ + uint16x8_t res; + res = vmull_u8(b, c); + return _mm_add_epi16 (res, a); +} + +uint32x4_t vmlal_u16(uint32x4_t a, uint16x4_t b, uint16x4_t c); // VMLAL.s16 q0,d0,d0 +_NEON2SSE_INLINE uint32x4_t vmlal_u16(uint32x4_t a, uint16x4_t b, uint16x4_t c) // VMLAL.s16 q0,d0,d0 +{ + //may be not optimal compared with serial implementation + uint32x4_t res; + res = vmull_u16(b, c); + return _mm_add_epi32 (res, a); +} + +uint64x2_t vmlal_u32(uint64x2_t a, uint32x2_t b, uint32x2_t c); // VMLAL.U32 q0,d0,d0 +_NEON2SSE_INLINE uint64x2_t vmlal_u32(uint64x2_t a, uint32x2_t b, uint32x2_t c) // VMLAL.U32 q0,d0,d0 +{ + //may be not optimal compared with serial implementation + int64x2_t res; + res = vmull_u32( b,c); + return _mm_add_epi64 (res, a); +} + +//******************** Vector multiply subtract: vmls -> Vr[i] := Va[i] - Vb[i] * Vc[i] *************************************** +//******************************************************************************************** +int8x8_t vmls_s8(int8x8_t a, int8x8_t b, int8x8_t c); // VMLS.I8 d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vmls_s8(int8x8_t a, int8x8_t b, int8x8_t c) // VMLS.I8 d0,d0,d0 +{ + // no 8 bit simd multiply, need to go to 16 bits - and use the low 64 bits + int8x8_t res64; + __m128i res; + res64 = vmul_s8(b,c); + res = _mm_sub_epi8 (_pM128i(a), _pM128i(res64)); + return64(res); +} + +int16x4_t vmls_s16(int16x4_t a, int16x4_t b, int16x4_t c); // VMLS.I16 d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vmls_s16(int16x4_t a, int16x4_t b, int16x4_t c) +{ + int16x4_t res64; + return64(vmlsq_s16(_pM128i(a),_pM128i(b), 
+
+//******************** Vector multiply subtract: vmls -> Vr[i] := Va[i] - Vb[i] * Vc[i] ***************************************
+//********************************************************************************************
+int8x8_t vmls_s8(int8x8_t a, int8x8_t b, int8x8_t c); // VMLS.I8 d0,d0,d0
+_NEON2SSE_INLINE int8x8_t vmls_s8(int8x8_t a, int8x8_t b, int8x8_t c) // VMLS.I8 d0,d0,d0
+{
+ // no 8 bit simd multiply, need to go to 16 bits - and use the low 64 bits
+ int8x8_t res64;
+ __m128i res;
+ res64 = vmul_s8(b,c);
+ res = _mm_sub_epi8 (_pM128i(a), _pM128i(res64));
+ return64(res);
+}
+
+int16x4_t vmls_s16(int16x4_t a, int16x4_t b, int16x4_t c); // VMLS.I16 d0,d0,d0
+_NEON2SSE_INLINE int16x4_t vmls_s16(int16x4_t a, int16x4_t b, int16x4_t c)
+{
+ int16x4_t res64;
+ return64(vmlsq_s16(_pM128i(a), _pM128i(b), _pM128i(c)));
+}
+
+
+int32x2_t vmls_s32(int32x2_t a, int32x2_t b, int32x2_t c); // VMLS.I32 d0,d0,d0
+_NEON2SSE_INLINE int32x2_t vmls_s32(int32x2_t a, int32x2_t b, int32x2_t c) // VMLS.I32 d0,d0,d0
+{
+ int32x2_t res64;
+ __m128i res;
+ res = _MM_MULLO_EPI32 (_pM128i(c),_pM128i( b)); //SSE4.1
+ res = _mm_sub_epi32 (_pM128i(a),res); //use low 64 bits only
+ return64(res);
+}
+
+float32x2_t vmls_f32(float32x2_t a, float32x2_t b, float32x2_t c); // VMLS.F32 d0,d0,d0
+_NEON2SSE_INLINE float32x2_t vmls_f32(float32x2_t a, float32x2_t b, float32x2_t c)
+{
+ __m128 res;
+ __m64_128 res64;
+ res = _mm_mul_ps (_pM128(c), _pM128(b));
+ res = _mm_sub_ps (_pM128(a), res);
+ _M64f(res64, res);
+ return res64;
+}
+
+uint8x8_t vmls_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c); // VMLS.I8 d0,d0,d0
+_NEON2SSE_INLINE uint8x8_t vmls_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c)
+{
+ // no 8 bit simd multiply, need to go to 16 bits - and use the low 64 bits
+ uint8x8_t res64;
+ __m128i res;
+ res64 = vmul_u8(b,c);
+ res = _mm_sub_epi8 (_pM128i(a), _pM128i(res64));
+ return64(res);
+}
+
+uint16x4_t vmls_u16(uint16x4_t a, uint16x4_t b, uint16x4_t c); // VMLS.I16 d0,d0,d0
+#define vmls_u16 vmls_s16
+
+uint32x2_t vmls_u32(uint32x2_t a, uint32x2_t b, uint32x2_t c); // VMLS.I32 d0,d0,d0
+#define vmls_u32 vmls_s32
+
+
+int8x16_t vmlsq_s8(int8x16_t a, int8x16_t b, int8x16_t c); // VMLS.I8 q0,q0,q0
+_NEON2SSE_INLINE int8x16_t vmlsq_s8(int8x16_t a, int8x16_t b, int8x16_t c) // VMLS.I8 q0,q0,q0
+{
+ //this solution may not be optimal
+ // no 8 bit simd multiply, need to go to 16 bits
+ __m128i b16, c16, r16_1, a_2, r16_2;
+ _NEON2SSE_ALIGN_16 int8_t mask8_16_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 };
+ b16 = _MM_CVTEPI8_EPI16 (b); // SSE 4.1
+ c16 = _MM_CVTEPI8_EPI16 (c); // SSE 4.1
+ r16_1 = _mm_mullo_epi16 (b16, c16);
+ r16_1 = _mm_shuffle_epi8 (r16_1, *(__m128i*) mask8_16_even_odd);
+ r16_1 = _mm_sub_epi8 (a, r16_1);
+ //swap hi and low part of a, b, c to process the remaining data
+ a_2 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32);
+ b16 = _mm_shuffle_epi32 (b, _SWAP_HI_LOW32);
+ c16 = _mm_shuffle_epi32 (c, _SWAP_HI_LOW32);
+ b16 = _MM_CVTEPI8_EPI16 (b16); // SSE 4.1
+ c16 = _MM_CVTEPI8_EPI16 (c16); // SSE 4.1
+
+ r16_2 = _mm_mullo_epi16 (b16, c16);
+ r16_2 = _mm_shuffle_epi8 (r16_2, *(__m128i*) mask8_16_even_odd);
+ r16_2 = _mm_sub_epi8 (a_2, r16_2);
+ return _mm_unpacklo_epi64(r16_1,r16_2);
+}
+
+int16x8_t vmlsq_s16(int16x8_t a, int16x8_t b, int16x8_t c); // VMLS.I16 q0,q0,q0
+_NEON2SSE_INLINE int16x8_t vmlsq_s16(int16x8_t a, int16x8_t b, int16x8_t c) // VMLS.I16 q0,q0,q0
+{
+ __m128i res;
+ res = _mm_mullo_epi16 (c, b);
+ return _mm_sub_epi16 (a, res);
+}
+
+int32x4_t vmlsq_s32(int32x4_t a, int32x4_t b, int32x4_t c); // VMLS.I32 q0,q0,q0
+_NEON2SSE_INLINE int32x4_t vmlsq_s32(int32x4_t a, int32x4_t b, int32x4_t c) // VMLS.I32 q0,q0,q0
+{
+ __m128i res;
+ res = _MM_MULLO_EPI32 (c, b); //SSE4.1
+ return _mm_sub_epi32 (a, res);
+}
+
+float32x4_t vmlsq_f32(float32x4_t a, float32x4_t b, float32x4_t c); // VMLS.F32 q0,q0,q0
+_NEON2SSE_INLINE float32x4_t vmlsq_f32(float32x4_t a, float32x4_t b, float32x4_t c) // VMLS.F32 q0,q0,q0
+{
+ __m128 res;
+ res = _mm_mul_ps (c, b);
+ return _mm_sub_ps (a, res);
+}
+uint8x16_t vmlsq_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c); // VMLS.I8 q0,q0,q0
+_NEON2SSE_INLINE uint8x16_t vmlsq_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c) // VMLS.I8 q0,q0,q0
+{
+ //this solution may not be optimal
+ // no 8 bit simd multiply, need to go to 16 bits
+ __m128i b16, c16, r16_1, a_2, r16_2;
+ _NEON2SSE_ALIGN_16 int8_t mask8_16_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 };
+ b16 = _MM_CVTEPU8_EPI16 (b); // SSE 4.1
+ c16 = _MM_CVTEPU8_EPI16 (c); // SSE 4.1
+ r16_1 = _mm_mullo_epi16 (b16, c16);
+ r16_1 = _mm_shuffle_epi8 (r16_1, *(__m128i*) mask8_16_even_odd); //return to 8 bits
+ r16_1 = _mm_sub_epi8 (a, r16_1);
+ //swap hi and low part of a, b and c to process the remaining data
+ a_2 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32);
+ b16 = _mm_shuffle_epi32 (b, _SWAP_HI_LOW32);
+ c16 = _mm_shuffle_epi32 (c, _SWAP_HI_LOW32);
+ b16 = _MM_CVTEPU8_EPI16 (b16); // SSE 4.1
+ c16 = _MM_CVTEPU8_EPI16 (c16); // SSE 4.1
+
+ r16_2 = _mm_mullo_epi16 (b16, c16);
+ r16_2 = _mm_shuffle_epi8 (r16_2, *(__m128i*) mask8_16_even_odd);
+ r16_2 = _mm_sub_epi8(a_2, r16_2);
+ return _mm_unpacklo_epi64(r16_1,r16_2);
+}
+
+uint16x8_t vmlsq_u16(uint16x8_t a, uint16x8_t b, uint16x8_t c); // VMLS.I16 q0,q0,q0
+#define vmlsq_u16 vmlsq_s16
+
+uint32x4_t vmlsq_u32(uint32x4_t a, uint32x4_t b, uint32x4_t c); // VMLS.I32 q0,q0,q0
+#define vmlsq_u32 vmlsq_s32
+
+//******************** Vector multiply subtract long (widening multiply subtract) ************************************
+//*************************************************************************************************************
+int16x8_t vmlsl_s8(int16x8_t a, int8x8_t b, int8x8_t c); // VMLSL.S8 q0,d0,d0
+_NEON2SSE_INLINE int16x8_t vmlsl_s8(int16x8_t a, int8x8_t b, int8x8_t c) // VMLSL.S8 q0,d0,d0
+{
+ int16x8_t res;
+ res = vmull_s8(b, c);
+ return _mm_sub_epi16 (a, res);
+}
+
+int32x4_t vmlsl_s16(int32x4_t a, int16x4_t b, int16x4_t c); // VMLSL.S16 q0,d0,d0
+_NEON2SSE_INLINE int32x4_t vmlsl_s16(int32x4_t a, int16x4_t b, int16x4_t c) // VMLSL.S16 q0,d0,d0
+{
+ //may not be optimal compared with a serial implementation
+ int32x4_t res;
+ res = vmull_s16(b, c);
+ return _mm_sub_epi32 (a, res);
+}
+
+int64x2_t vmlsl_s32(int64x2_t a, int32x2_t b, int32x2_t c); // VMLSL.S32 q0,d0,d0
+_NEON2SSE_INLINE int64x2_t vmlsl_s32(int64x2_t a, int32x2_t b, int32x2_t c) // VMLSL.S32 q0,d0,d0
+{
+ //may not be optimal compared with a serial implementation
+ int64x2_t res;
+ res = vmull_s32( b,c);
+ return _mm_sub_epi64 (a, res);
+}
+
+uint16x8_t vmlsl_u8(uint16x8_t a, uint8x8_t b, uint8x8_t c); // VMLSL.U8 q0,d0,d0
+_NEON2SSE_INLINE uint16x8_t vmlsl_u8(uint16x8_t a, uint8x8_t b, uint8x8_t c) // VMLSL.U8 q0,d0,d0
+{
+ uint16x8_t res;
+ res = vmull_u8(b, c);
+ return _mm_sub_epi16 (a, res);
+}
+
+uint32x4_t vmlsl_u16(uint32x4_t a, uint16x4_t b, uint16x4_t c); // VMLSL.s16 q0,d0,d0
+_NEON2SSE_INLINE uint32x4_t vmlsl_u16(uint32x4_t a, uint16x4_t b, uint16x4_t c) // VMLSL.s16 q0,d0,d0
+{
+ //may not be optimal compared with a serial implementation
+ uint32x4_t res;
+ res = vmull_u16(b, c);
+ return _mm_sub_epi32 (a, res);
+}
+
+uint64x2_t vmlsl_u32(uint64x2_t a, uint32x2_t b, uint32x2_t c); // VMLSL.U32 q0,d0,d0
+_NEON2SSE_INLINE uint64x2_t vmlsl_u32(uint64x2_t a, uint32x2_t b, uint32x2_t c) // VMLSL.U32 q0,d0,d0
+{
+ //may not be optimal compared with a serial implementation
+ uint64x2_t res;
+ res = vmull_u32( b,c);
+ return _mm_sub_epi64 (a, res);
+}
+
+//****** Vector saturating doubling multiply high **********************
+//*************************************************************************
+int16x4_t vqdmulh_s16(int16x4_t a, int16x4_t b); // VQDMULH.S16 d0,d0,d0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int16x4_t vqdmulh_s16(int16x4_t a, int16x4_t b), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+ int16x4_t res;
+ int32_t a32, b32, i;
+ for (i = 0; i<4; i++) {
+ a32 = (int32_t) a.m64_i16[i];
+ b32 = (int32_t) b.m64_i16[i];
+ a32 = (a32 * b32) >> 15;
+ res.m64_i16[i] = (a32 == 0x8000) ? 0x7fff : (int16_t) a32;
+ }
+ return res;
+}
+
+int32x2_t vqdmulh_s32(int32x2_t a, int32x2_t b); // VQDMULH.S32 d0,d0,d0
+_NEON2SSE_INLINE int32x2_t vqdmulh_s32(int32x2_t a, int32x2_t b) // no multiply high 32 bit SIMD in IA32, so some tricks are needed; a serial solution may be faster
+{
+ //may not be optimal compared with a serial solution
+ int32x2_t res64;
+ __m128i mask;
+ _NEON2SSE_ALIGN_16 uint32_t cmask32[] = {0x80000000, 0x80000000, 0x80000000, 0x80000000};
+ int64x2_t mul;
+ mul = vmull_s32(a,b);
+ mul = _mm_slli_epi64(mul,1); //double the result
+ //at this point start treating 2 64-bit numbers as 4 32-bit
+ mul = _mm_shuffle_epi32 (mul, 1 | (3 << 2) | (0 << 4) | (2 << 6)); //shuffle the data to get 2 32-bits
+ mask = _mm_cmpeq_epi32 (mul, *(__m128i*)cmask32);
+ mul = _mm_xor_si128 (mul, mask); //res saturated for 0x80000000
+ return64(mul);
+}
+
+int16x8_t vqdmulhq_s16(int16x8_t a, int16x8_t b); // VQDMULH.S16 q0,q0,q0
+_NEON2SSE_INLINE int16x8_t vqdmulhq_s16(int16x8_t a, int16x8_t b) // VQDMULH.S16 q0,q0,q0
+{
+ __m128i res, res_lo, mask;
+ _NEON2SSE_ALIGN_16 uint16_t cmask[] = {0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000};
+ res = _mm_mulhi_epi16 (a, b);
+ res = _mm_slli_epi16 (res, 1); //double the result, don't care about saturation
+ res_lo = _mm_mullo_epi16 (a, b);
+ res_lo = _mm_srli_epi16(res_lo,15); //take the highest bit
+ res = _mm_add_epi16(res, res_lo); //combine results
+ mask = _mm_cmpeq_epi16 (res, *(__m128i*)cmask);
+ return _mm_xor_si128 (res, mask); //res saturated for 0x8000
+}
+
+int32x4_t vqdmulhq_s32(int32x4_t a, int32x4_t b); // VQDMULH.S32 q0,q0,q0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x4_t vqdmulhq_s32(int32x4_t a, int32x4_t b), _NEON2SSE_REASON_SLOW_UNEFFECTIVE)
+{
+ // no multiply high 32 bit SIMD in IA32; may not be optimal compared with a serial solution for the SSSE3 target
+ __m128i ab, ba, mask, mul, mul1;
+ _NEON2SSE_ALIGN_16 uint32_t cmask32[] = {0x80000000, 0x80000000, 0x80000000, 0x80000000};
+ ab = _mm_unpacklo_epi32 (a, b); //a0, b0, a1,b1
+ ba = _mm_unpacklo_epi32 (b, a); //b0, a0, b1,a1
+ mul = _MM_MUL_EPI32(ab, ba); //uses the 1st and 3rd data lanes, the multiplication gives a 64 bit result
+ mul = _mm_slli_epi64(mul,1); //double the result
+ ab = _mm_unpackhi_epi32 (a, b); //a2, b2, a3,b3
+ ba = _mm_unpackhi_epi32 (b, a); //b2, a2, b3,a3
+ mul1 = _MM_MUL_EPI32(ab, ba); //uses the 1st and 3rd data lanes, the multiplication gives a 64 bit result
+ mul1 = _mm_slli_epi64(mul1,1); //double the result
+ mul = _mm_shuffle_epi32 (mul, 1 | (3 << 2) | (0 << 4) | (2 << 6)); //shuffle the data to get 2 32-bits
+ mul1 = _mm_shuffle_epi32 (mul1, 1 | (3 << 2) | (0 << 4) | (2 << 6)); //shuffle the data to get 2 32-bits
+ mul = _mm_unpacklo_epi64(mul, mul1);
+ mask = _mm_cmpeq_epi32 (mul, *(__m128i*)cmask32);
+ return _mm_xor_si128 (mul, mask); //res saturated for 0x80000000
+}
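// The shuffle immediate 1 | (3 << 2) | (0 << 4) | (2 << 6) used above moves
// the high 32-bit half of each 64-bit product into the two low lanes; a
// sketch of the same encoding spelled with the standard _MM_SHUFFLE macro:
static inline __m128i high_halves_of_two_u64(__m128i x)
{
    // result 32-bit lanes are {x[1], x[3], x[0], x[2]}
    return _mm_shuffle_epi32(x, _MM_SHUFFLE(2, 0, 3, 1));
}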
+
+//********* Vector saturating rounding doubling multiply high ****************
+//****************************************************************************
+//If the _mm_mulhrs_xx functions are used, the result may differ slightly from NEON due to different rounding rules and order
+int16x4_t vqrdmulh_s16(int16x4_t a, int16x4_t b); // VQRDMULH.S16 d0,d0,d0
+_NEON2SSE_INLINE int16x4_t vqrdmulh_s16(int16x4_t a, int16x4_t b)
+{
+ int16x4_t res64;
+ return64(vqrdmulhq_s16(_pM128i(a), _pM128i(b)));
+}
+
+int32x2_t vqrdmulh_s32(int32x2_t a, int32x2_t b); // VQRDMULH.S32 d0,d0,d0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x2_t vqrdmulh_s32(int32x2_t a, int32x2_t b), _NEON2SSE_REASON_SLOW_UNEFFECTIVE)
+{
+ //may not be optimal compared with a serial solution
+ int32x2_t res64;
+ _NEON2SSE_ALIGN_16 uint32_t cmask32[] = {0x80000000, 0x80000000, 0x80000000, 0x80000000};
+ __m128i res_sat, mask, mask1;
+ int64x2_t mul;
+ mul = vmull_s32(a,b);
+ res_sat = _mm_slli_epi64 (mul, 1); //double the result, saturation not considered
+ mask1 = _mm_slli_epi64(res_sat, 32); //shift left then back right to
+ mask1 = _mm_srli_epi64(mask1,31); //get bit 31, 1 or zero
+ mul = _mm_add_epi32 (res_sat, mask1); //actual rounding
+ //at this point start treating 2 64-bit numbers as 4 32-bit
+ mul = _mm_shuffle_epi32 (mul, 1 | (3 << 2) | (0 << 4) | (2 << 6)); //shuffle the data to get 2 32-bits from each 64-bit
+ mask = _mm_cmpeq_epi32 (mul, *(__m128i*)cmask32);
+ mul = _mm_xor_si128 (mul, mask); //res saturated for 0x80000000
+ return64(mul);
+}
+
+int16x8_t vqrdmulhq_s16(int16x8_t a, int16x8_t b); // VQRDMULH.S16 q0,q0,q0
+_NEON2SSE_INLINE int16x8_t vqrdmulhq_s16(int16x8_t a, int16x8_t b) // VQRDMULH.S16 q0,q0,q0
+{
+ __m128i mask, res;
+ _NEON2SSE_ALIGN_16 uint16_t cmask[] = {0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000, 0x8000};
+ res = _mm_mulhrs_epi16 (a, b);
+ mask = _mm_cmpeq_epi16 (res, *(__m128i*)cmask);
+ return _mm_xor_si128 (res, mask); //res saturated for 0x8000
+}
+
+int32x4_t vqrdmulhq_s32(int32x4_t a, int32x4_t b); // VQRDMULH.S32 q0,q0,q0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x4_t vqrdmulhq_s32(int32x4_t a, int32x4_t b), _NEON2SSE_REASON_SLOW_UNEFFECTIVE)
+{
+ // no multiply high 32 bit SIMD in IA32; may not be optimal compared with a serial solution for the SSSE3 target
+ __m128i ab, ba, mask, mul, mul1, mask1;
+ _NEON2SSE_ALIGN_16 uint32_t cmask32[] = {0x80000000, 0x80000000, 0x80000000, 0x80000000};
+ ab = _mm_unpacklo_epi32 (a, b); //a0, b0, a1,b1
+ ba = _mm_unpacklo_epi32 (b, a); //b0, a0, b1,a1
+ mul = _MM_MUL_EPI32(ab, ba); //uses the 1st and 3rd data lanes, the multiplication gives a 64 bit result
+ mul = _mm_slli_epi64 (mul, 1); //double the result, saturation not considered
+ mask1 = _mm_slli_epi64(mul, 32); //shift left then back right to
+ mask1 = _mm_srli_epi64(mask1,31); //get bit 31, 1 or zero
+ mul = _mm_add_epi32 (mul, mask1); //actual rounding
+
+ ab = _mm_unpackhi_epi32 (a, b); //a2, b2, a3,b3
+ ba = _mm_unpackhi_epi32 (b, a); //b2, a2, b3,a3
+ mul1 = _MM_MUL_EPI32(ab, ba); //uses the 1st and 3rd data lanes, the multiplication gives a 64 bit result
+ mul1 = _mm_slli_epi64 (mul1, 1); //double the result, saturation not considered
+ mask1 = _mm_slli_epi64(mul1, 32); //shift left then back right to
+ mask1 = _mm_srli_epi64(mask1,31); //get bit 31, 1 or zero
+ mul1 = _mm_add_epi32 (mul1, mask1); //actual rounding
+ //at this point start treating 2 64-bit numbers as 4 32-bit
+ mul = _mm_shuffle_epi32 (mul, 1 | (3 << 2) | (0 << 4) | (2 << 6)); //shuffle the data to get 2 32-bits from each 64-bit
+ mul1 = _mm_shuffle_epi32 (mul1, 1 | (3 << 2) | (0 << 4) | (2 << 6)); //shuffle the data to get 2 32-bits from each 64-bit
+ mul = _mm_unpacklo_epi64(mul, mul1);
+ mask = _mm_cmpeq_epi32 (mul, *(__m128i*)cmask32);
+ return _mm_xor_si128 (mul, mask); //res saturated for 0x80000000
+}
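// Scalar reference for the rounding doubling multiply-high lanes above
// (a sketch, C99): NEON defines the result as sat((2*a*b + 0x8000) >> 16),
// which is also what _mm_mulhrs_epi16 plus the saturation fix-up computes.
static inline int16_t vqrdmulh_one_s16(int16_t a, int16_t b)
{
    int64_t p = ((int64_t)a * b * 2 + 0x8000) >> 16;
    if (p > 32767) p = 32767;   // saturates only for a == b == -32768
    if (p < -32768) p = -32768; // defensive; cannot trigger for 16-bit inputs
    return (int16_t)p;
}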
+
+//*************Vector widening saturating doubling multiply accumulate (long saturating doubling multiply accumulate) *****
+//*************************************************************************************************************************
+int32x4_t vqdmlal_s16(int32x4_t a, int16x4_t b, int16x4_t c); // VQDMLAL.S16 q0,d0,d0
+_NEON2SSE_INLINE int32x4_t vqdmlal_s16(int32x4_t a, int16x4_t b, int16x4_t c) // VQDMLAL.S16 q0,d0,d0
+{
+ //not an optimal SIMD solution; serial may be faster
+ __m128i res32;
+ res32 = vmull_s16(b, c);
+ res32 = vqd_s32(res32); //doubling & saturation; if no saturation we could use _mm_slli_epi32 (res32, 1)
+ return vqaddq_s32(res32, a); //saturation
+}
+
+int64x2_t vqdmlal_s32(int64x2_t a, int32x2_t b, int32x2_t c); // VQDMLAL.S32 q0,d0,d0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqdmlal_s32(int64x2_t a, int32x2_t b, int32x2_t c),_NEON2SSE_REASON_SLOW_SERIAL)
+{
+ __m128i res64;
+ res64 = vmull_s32(b,c);
+ res64 = vqaddq_s64(res64, res64); //doubling & saturation; if no saturation we could use _mm_slli_epi64 (res64, 1)
+ return vqaddq_s64(res64, a); //saturation
+}
+
+//************************************************************************************
+//****************** Vector subtract ***********************************************
+//************************************************************************************
+int8x8_t vsub_s8(int8x8_t a, int8x8_t b); // VSUB.I8 d0,d0,d0
+_NEON2SSE_INLINE int8x8_t vsub_s8(int8x8_t a, int8x8_t b)
+{
+ int8x8_t res64;
+ return64(_mm_sub_epi8(_pM128i(a),_pM128i(b)));
+}
+
+
+int16x4_t vsub_s16(int16x4_t a, int16x4_t b); // VSUB.I16 d0,d0,d0
+_NEON2SSE_INLINE int16x4_t vsub_s16(int16x4_t a, int16x4_t b)
+{
+ int16x4_t res64;
+ return64(_mm_sub_epi16(_pM128i(a),_pM128i(b)));
+}
+
+
+int32x2_t vsub_s32(int32x2_t a, int32x2_t b); // VSUB.I32 d0,d0,d0
+_NEON2SSE_INLINE int32x2_t vsub_s32(int32x2_t a, int32x2_t b)
+{
+ int32x2_t res64;
+ return64(_mm_sub_epi32(_pM128i(a),_pM128i(b)));
+}
+
+
+int64x1_t vsub_s64(int64x1_t a, int64x1_t b); // VSUB.I64 d0,d0,d0
+_NEON2SSE_INLINE int64x1_t vsub_s64(int64x1_t a, int64x1_t b)
+{
+ int64x1_t res64;
+ res64.m64_i64[0] = a.m64_i64[0] - b.m64_i64[0];
+ return res64;
+}
+
+
+float32x2_t vsub_f32(float32x2_t a, float32x2_t b); // VSUB.F32 d0,d0,d0
+_NEON2SSE_INLINE float32x2_t vsub_f32(float32x2_t a, float32x2_t b)
+{
+ float32x2_t res;
+ res.m64_f32[0] = a.m64_f32[0] - b.m64_f32[0];
+ res.m64_f32[1] = a.m64_f32[1] - b.m64_f32[1];
+ return res;
+}
+
+uint8x8_t vsub_u8(uint8x8_t a, uint8x8_t b); // VSUB.I8 d0,d0,d0
+#define vsub_u8 vsub_s8
+
+uint16x4_t vsub_u16(uint16x4_t a, uint16x4_t b); // VSUB.I16 d0,d0,d0
+#define vsub_u16 vsub_s16
+
+uint32x2_t vsub_u32(uint32x2_t a, uint32x2_t b); // VSUB.I32 d0,d0,d0
+#define vsub_u32 vsub_s32
+
+
+uint64x1_t vsub_u64(uint64x1_t a, uint64x1_t b); // VSUB.I64 d0,d0,d0
+_NEON2SSE_INLINE uint64x1_t vsub_u64(uint64x1_t a, uint64x1_t b)
+{
+ uint64x1_t res64;
+ res64.m64_u64[0] = a.m64_u64[0] - b.m64_u64[0];
+ return res64;
+}
+
+
+int8x16_t vsubq_s8(int8x16_t a, int8x16_t b); // VSUB.I8 q0,q0,q0
+#define vsubq_s8 _mm_sub_epi8
+
+int16x8_t vsubq_s16(int16x8_t a, int16x8_t b); // VSUB.I16 q0,q0,q0
+#define vsubq_s16 _mm_sub_epi16
+
+int32x4_t vsubq_s32(int32x4_t a, int32x4_t b); // VSUB.I32 q0,q0,q0
+#define vsubq_s32 _mm_sub_epi32
+
+int64x2_t vsubq_s64(int64x2_t a, int64x2_t b); // VSUB.I64 q0,q0,q0
+#define vsubq_s64 _mm_sub_epi64
+
+float32x4_t vsubq_f32(float32x4_t a, float32x4_t b); // VSUB.F32 q0,q0,q0
+#define vsubq_f32 _mm_sub_ps
+
+uint8x16_t vsubq_u8(uint8x16_t a, uint8x16_t b); // VSUB.I8 q0,q0,q0
+#define vsubq_u8 _mm_sub_epi8
+
+uint16x8_t
vsubq_u16(uint16x8_t a, uint16x8_t b); // VSUB.I16 q0,q0,q0 +#define vsubq_u16 _mm_sub_epi16 + +uint32x4_t vsubq_u32(uint32x4_t a, uint32x4_t b); // VSUB.I32 q0,q0,q0 +#define vsubq_u32 _mm_sub_epi32 + +uint64x2_t vsubq_u64(uint64x2_t a, uint64x2_t b); // VSUB.I64 q0,q0,q0 +#define vsubq_u64 _mm_sub_epi64 + +//***************Vector long subtract: vsub -> Vr[i]:=Va[i]-Vb[i] ****************** +//*********************************************************************************** +//Va, Vb have equal lane sizes, result is a 128 bit vector of lanes that are twice the width. +int16x8_t vsubl_s8(int8x8_t a, int8x8_t b); // VSUBL.S8 q0,d0,d0 +_NEON2SSE_INLINE int16x8_t vsubl_s8(int8x8_t a, int8x8_t b) // VSUBL.S8 q0,d0,d0 +{ + __m128i a16, b16; + a16 = _MM_CVTEPI8_EPI16 (_pM128i(a)); //SSE4.1, + b16 = _MM_CVTEPI8_EPI16 (_pM128i(b)); //SSE4.1, + return _mm_sub_epi16 (a16, b16); +} + +int32x4_t vsubl_s16(int16x4_t a, int16x4_t b); // VSUBL.S16 q0,d0,d0 +_NEON2SSE_INLINE int32x4_t vsubl_s16(int16x4_t a, int16x4_t b) // VSUBL.S16 q0,d0,d0 +{ + __m128i a32, b32; + a32 = _MM_CVTEPI16_EPI32 (_pM128i(a)); //SSE4.1 + b32 = _MM_CVTEPI16_EPI32 (_pM128i(b)); //SSE4.1, + return _mm_sub_epi32 (a32, b32); +} + +int64x2_t vsubl_s32(int32x2_t a, int32x2_t b); // VSUBL.S32 q0,d0,d0 +_NEON2SSE_INLINE int64x2_t vsubl_s32(int32x2_t a, int32x2_t b) // VSUBL.S32 q0,d0,d0 +{ + //may be not optimal + __m128i a64, b64; + a64 = _MM_CVTEPI32_EPI64 (_pM128i(a)); //SSE4.1 + b64 = _MM_CVTEPI32_EPI64 (_pM128i(b)); //SSE4.1, + return _mm_sub_epi64 (a64, b64); +} + +uint16x8_t vsubl_u8(uint8x8_t a, uint8x8_t b); // VSUBL.U8 q0,d0,d0 +_NEON2SSE_INLINE uint16x8_t vsubl_u8(uint8x8_t a, uint8x8_t b) // VSUBL.U8 q0,d0,d0 +{ + __m128i a16, b16; + a16 = _MM_CVTEPU8_EPI16 (_pM128i(a)); //SSE4.1, + b16 = _MM_CVTEPU8_EPI16 (_pM128i(b)); //SSE4.1, + return _mm_sub_epi16 (a16, b16); +} + +uint32x4_t vsubl_u16(uint16x4_t a, uint16x4_t b); // VSUBL.s16 q0,d0,d0 +_NEON2SSE_INLINE uint32x4_t vsubl_u16(uint16x4_t a, uint16x4_t b) // VSUBL.s16 q0,d0,d0 +{ + __m128i a32, b32; + a32 = _MM_CVTEPU16_EPI32 (_pM128i(a)); //SSE4.1 + b32 = _MM_CVTEPU16_EPI32 (_pM128i(b)); //SSE4.1, + return _mm_sub_epi32 (a32, b32); +} + +uint64x2_t vsubl_u32(uint32x2_t a, uint32x2_t b); // VSUBL.U32 q0,d0,d0 +_NEON2SSE_INLINE uint64x2_t vsubl_u32(uint32x2_t a, uint32x2_t b) // VSUBL.U32 q0,d0,d0 +{ + //may be not optimal + __m128i a64, b64; + a64 = _MM_CVTEPU32_EPI64 (_pM128i(a)); //SSE4.1 + b64 = _MM_CVTEPU32_EPI64 (_pM128i(b)); //SSE4.1, + return _mm_sub_epi64 (a64, b64); +} + +//***************** Vector wide subtract: vsub -> Vr[i]:=Va[i]-Vb[i] ********************************** +//***************************************************************************************************** +int16x8_t vsubw_s8(int16x8_t a, int8x8_t b); // VSUBW.S8 q0,q0,d0 +_NEON2SSE_INLINE int16x8_t vsubw_s8(int16x8_t a, int8x8_t b) // VSUBW.S8 q0,q0,d0 +{ + __m128i b16; + b16 = _MM_CVTEPI8_EPI16 (_pM128i(b)); //SSE4.1, + return _mm_sub_epi16 (a, b16); +} + +int32x4_t vsubw_s16(int32x4_t a, int16x4_t b); // VSUBW.S16 q0,q0,d0 +_NEON2SSE_INLINE int32x4_t vsubw_s16(int32x4_t a, int16x4_t b) // VSUBW.S16 q0,q0,d0 +{ + __m128i b32; + b32 = _MM_CVTEPI16_EPI32 (_pM128i(b)); //SSE4.1, + return _mm_sub_epi32 (a, b32); +} + +int64x2_t vsubw_s32(int64x2_t a, int32x2_t b); // VSUBW.S32 q0,q0,d0 +_NEON2SSE_INLINE int64x2_t vsubw_s32(int64x2_t a, int32x2_t b) // VSUBW.S32 q0,q0,d0 +{ + __m128i b64; + b64 = _MM_CVTEPI32_EPI64 (_pM128i(b)); //SSE4.1 + return _mm_sub_epi64 (a, b64); +} + +uint16x8_t 
vsubw_u8(uint16x8_t a, uint8x8_t b); // VSUBW.U8 q0,q0,d0
+_NEON2SSE_INLINE uint16x8_t vsubw_u8(uint16x8_t a, uint8x8_t b) // VSUBW.U8 q0,q0,d0
+{
+ __m128i b16;
+ b16 = _MM_CVTEPU8_EPI16 (_pM128i(b)); //SSE4.1,
+ return _mm_sub_epi16 (a, b16);
+}
+
+uint32x4_t vsubw_u16(uint32x4_t a, uint16x4_t b); // VSUBW.s16 q0,q0,d0
+_NEON2SSE_INLINE uint32x4_t vsubw_u16(uint32x4_t a, uint16x4_t b) // VSUBW.s16 q0,q0,d0
+{
+ __m128i b32;
+ b32 = _MM_CVTEPU16_EPI32 (_pM128i(b)); //SSE4.1,
+ return _mm_sub_epi32 (a, b32);
+}
+
+uint64x2_t vsubw_u32(uint64x2_t a, uint32x2_t b); // VSUBW.U32 q0,q0,d0
+_NEON2SSE_INLINE uint64x2_t vsubw_u32(uint64x2_t a, uint32x2_t b) // VSUBW.U32 q0,q0,d0
+{
+ __m128i b64;
+ b64 = _MM_CVTEPU32_EPI64 (_pM128i(b)); //SSE4.1
+ return _mm_sub_epi64 (a, b64);
+}
+
+//************************Vector saturating subtract *********************************
+//*************************************************************************************
+int8x8_t vqsub_s8(int8x8_t a, int8x8_t b); // VQSUB.S8 d0,d0,d0
+_NEON2SSE_INLINE int8x8_t vqsub_s8(int8x8_t a, int8x8_t b)
+{
+ int8x8_t res64;
+ return64(_mm_subs_epi8(_pM128i(a),_pM128i(b)));
+}
+
+
+int16x4_t vqsub_s16(int16x4_t a, int16x4_t b); // VQSUB.S16 d0,d0,d0
+_NEON2SSE_INLINE int16x4_t vqsub_s16(int16x4_t a, int16x4_t b)
+{
+ int16x4_t res64;
+ return64(_mm_subs_epi16(_pM128i(a),_pM128i(b)));
+}
+
+
+int32x2_t vqsub_s32(int32x2_t a, int32x2_t b); // VQSUB.S32 d0,d0,d0
+_NEON2SSE_INLINE int32x2_t vqsub_s32(int32x2_t a, int32x2_t b)
+{
+ int32x2_t res64;
+ return64(vqsubq_s32(_pM128i(a), _pM128i(b)));
+}
+
+
+int64x1_t vqsub_s64(int64x1_t a, int64x1_t b); // VQSUB.S64 d0,d0,d0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x1_t vqsub_s64(int64x1_t a, int64x1_t b), _NEON2SSE_REASON_SLOW_SERIAL) //no optimal SIMD solution
+{
+ uint64x1_t res;
+ uint64_t a64,b64;
+ a64 = a.m64_u64[0];
+ b64 = b.m64_u64[0];
+ res.m64_u64[0] = a64 - b64;
+
+ a64 = (a64 >> 63) + (~_SIGNBIT64);
+ if ((int64_t)((a64 ^ b64) & (a64 ^ res.m64_u64[0])) < 0) {
+ res.m64_u64[0] = a64;
+ }
+ return res;
+}
+
+uint8x8_t vqsub_u8(uint8x8_t a, uint8x8_t b); // VQSUB.U8 d0,d0,d0
+_NEON2SSE_INLINE uint8x8_t vqsub_u8(uint8x8_t a, uint8x8_t b)
+{
+ uint8x8_t res64;
+ return64(_mm_subs_epu8(_pM128i(a),_pM128i(b)));
+}
+
+
+uint16x4_t vqsub_u16(uint16x4_t a, uint16x4_t b); // VQSUB.s16 d0,d0,d0
+_NEON2SSE_INLINE uint16x4_t vqsub_u16(uint16x4_t a, uint16x4_t b)
+{
+ uint16x4_t res64;
+ return64(_mm_subs_epu16(_pM128i(a),_pM128i(b)));
+}
+
+
+uint32x2_t vqsub_u32(uint32x2_t a, uint32x2_t b); // VQSUB.U32 d0,d0,d0
+_NEON2SSE_INLINE uint32x2_t vqsub_u32(uint32x2_t a, uint32x2_t b)
+{
+ uint32x2_t res64;
+ return64(vqsubq_u32(_pM128i(a), _pM128i(b)));
+}
+
+
+uint64x1_t vqsub_u64(uint64x1_t a, uint64x1_t b); // VQSUB.U64 d0,d0,d0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x1_t vqsub_u64(uint64x1_t a, uint64x1_t b), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+ uint64x1_t res;
+ uint64_t a64, b64;
+ a64 = _Ui64(a);
+ b64 = _Ui64(b);
+ if (a64 > b64) {
+ res.m64_u64[0] = a64 - b64;
+ } else {
+ res.m64_u64[0] = 0;
+ }
+ return res;
+}
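// Scalar model of signed saturating subtraction for one 32-bit lane (a
// sketch, C99, assumes <stdint.h>). The vector version below detects the
// same overflow condition without 64-bit arithmetic: overflow is possible
// only when a and b have opposite signs and the raw difference has the
// opposite sign to a.
static inline int32_t qsub_one_s32(int32_t a, int32_t b)
{
    int64_t d = (int64_t)a - (int64_t)b;
    if (d > INT32_MAX) d = INT32_MAX;
    if (d < INT32_MIN) d = INT32_MIN;
    return (int32_t)d;
}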
+
+int8x16_t vqsubq_s8(int8x16_t a, int8x16_t b); // VQSUB.S8 q0,q0,q0
+#define vqsubq_s8 _mm_subs_epi8
+
+int16x8_t vqsubq_s16(int16x8_t a, int16x8_t b); // VQSUB.S16 q0,q0,q0
+#define vqsubq_s16 _mm_subs_epi16
+
+int32x4_t vqsubq_s32(int32x4_t a, int32x4_t b); // VQSUB.S32 q0,q0,q0
+_NEON2SSE_INLINE int32x4_t vqsubq_s32(int32x4_t a, int32x4_t b)
+{
+ //no corresponding x86 SIMD solution, special tricks are necessary. Overflow is possible only if a and b have opposite signs and the difference has the opposite sign to a
+ __m128i c7fffffff, res, res_sat, res_xor_a, b_xor_a;
+ c7fffffff = _mm_set1_epi32(0x7fffffff);
+ res = _mm_sub_epi32(a, b);
+ res_sat = _mm_srli_epi32(a, 31);
+ res_sat = _mm_add_epi32(res_sat, c7fffffff);
+ res_xor_a = _mm_xor_si128(res, a);
+ b_xor_a = _mm_xor_si128(b, a);
+ res_xor_a = _mm_and_si128(b_xor_a, res_xor_a);
+ res_xor_a = _mm_srai_epi32(res_xor_a,31); //propagate the sign bit: all ones if overflow, all zeros otherwise
+ res_sat = _mm_and_si128(res_xor_a, res_sat);
+ res = _mm_andnot_si128(res_xor_a, res);
+ return _mm_or_si128(res, res_sat);
+}
+
+int64x2_t vqsubq_s64(int64x2_t a, int64x2_t b); // VQSUB.S64 q0,q0,q0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqsubq_s64(int64x2_t a, int64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) //no optimal SIMD solution
+{
+ _NEON2SSE_ALIGN_16 int64_t atmp[2], btmp[2];
+ _NEON2SSE_ALIGN_16 uint64_t res[2];
+ _mm_store_si128((__m128i*)atmp, a);
+ _mm_store_si128((__m128i*)btmp, b);
+ res[0] = atmp[0] - btmp[0];
+ res[1] = atmp[1] - btmp[1];
+ if (((res[0] ^ atmp[0]) & _SIGNBIT64) && ((atmp[0] ^ btmp[0]) & _SIGNBIT64)) {
+ res[0] = (atmp[0] >> 63) ^ ~_SIGNBIT64;
+ }
+ if (((res[1] ^ atmp[1]) & _SIGNBIT64) && ((atmp[1] ^ btmp[1]) & _SIGNBIT64)) {
+ res[1] = (atmp[1] >> 63) ^ ~_SIGNBIT64;
+ }
+ return _mm_load_si128((__m128i*)res);
+}
+
+uint8x16_t vqsubq_u8(uint8x16_t a, uint8x16_t b); // VQSUB.U8 q0,q0,q0
+#define vqsubq_u8 _mm_subs_epu8
+
+uint16x8_t vqsubq_u16(uint16x8_t a, uint16x8_t b); // VQSUB.s16 q0,q0,q0
+#define vqsubq_u16 _mm_subs_epu16
+
+uint32x4_t vqsubq_u32(uint32x4_t a, uint32x4_t b); // VQSUB.U32 q0,q0,q0
+_NEON2SSE_INLINE uint32x4_t vqsubq_u32(uint32x4_t a, uint32x4_t b) // VQSUB.U32 q0,q0,q0
+{
+ __m128i min, mask, sub;
+ min = _MM_MIN_EPU32(a, b); //SSE4.1
+ mask = _mm_cmpeq_epi32 (min, b);
+ sub = _mm_sub_epi32 (a, b);
+ return _mm_and_si128 ( sub, mask);
+}
+
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x2_t vqsubq_u64(uint64x2_t a, uint64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL); // VQSUB.U64 q0,q0,q0
+#ifdef USE_SSE4
+ _NEON2SSE_INLINE uint64x2_t vqsubq_u64(uint64x2_t a, uint64x2_t b)
+ {
+ __m128i c80000000, subb, suba, cmp, sub;
+ c80000000 = _mm_set_epi32 (0x80000000, 0x0, 0x80000000, 0x0);
+ sub = _mm_sub_epi64 (a, b);
+ suba = _mm_sub_epi64 (a, c80000000);
+ subb = _mm_sub_epi64 (b, c80000000);
+ cmp = _mm_cmpgt_epi64 ( suba, subb); //no unsigned comparison, need to go to signed; SSE4.2!!!
+ return _mm_and_si128 (sub, cmp); //saturation
+ }
+#else
+ _NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x2_t vqsubq_u64(uint64x2_t a, uint64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL)
+ {
+ _NEON2SSE_ALIGN_16 uint64_t atmp[2], btmp[2], res[2];
+ _mm_store_si128((__m128i*)atmp, a);
+ _mm_store_si128((__m128i*)btmp, b);
+ res[0] = (atmp[0] > btmp[0]) ? atmp[0] - btmp[0] : 0;
+ res[1] = (atmp[1] > btmp[1]) ?
atmp[1] - btmp[1] : 0; + return _mm_load_si128((__m128i*)(res)); + } +#endif + +//**********Vector halving subtract Vr[i]:=(Va[i]-Vb[i])>>1 ****************************************************** +//**************************************************************** +int8x8_t vhsub_s8(int8x8_t a, int8x8_t b); // VHSUB.S8 d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vhsub_s8(int8x8_t a, int8x8_t b) // VHSUB.S8 d0,d0,d0 +{ + //no 8 bit shift available, internal overflow is possible, so let's go to 16 bit, + int8x8_t res64; + __m128i r16; + int8x8_t r; + r = vsub_s8 (a, b); + r16 = _MM_CVTEPI8_EPI16 (_pM128i(r)); //SSE 4.1 + r16 = _mm_srai_epi16 (r16, 1); //SSE2 + r16 = _mm_packs_epi16 (r16,r16); //use low 64 bits + return64(r16); +} + +int16x4_t vhsub_s16(int16x4_t a, int16x4_t b); // VHSUB.S16 d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vhsub_s16(int16x4_t a, int16x4_t b) +{ + int16x4_t res64; + return64(vhsubq_s16(_pM128i(a), _pM128i(b))); +} + + + +int32x2_t vhsub_s32(int32x2_t a, int32x2_t b); // VHSUB.S32 d0,d0,d0 +_NEON2SSE_INLINE int32x2_t vhsub_s32(int32x2_t a, int32x2_t b) +{ + int32x2_t res64; + return64(vhsubq_s32(_pM128i(a), _pM128i(b))); +} + + +uint8x8_t vhsub_u8(uint8x8_t a, uint8x8_t b); // VHSUB.U8 d0,d0,d0 +_NEON2SSE_INLINE uint8x8_t vhsub_u8(uint8x8_t a, uint8x8_t b) +{ + uint8x8_t res64; + return64(vhsubq_u8(_pM128i(a), _pM128i(b))); +} + +uint16x4_t vhsub_u16(uint16x4_t a, uint16x4_t b); // VHSUB.s16 d0,d0,d0 +_NEON2SSE_INLINE uint16x4_t vhsub_u16(uint16x4_t a, uint16x4_t b) +{ + uint16x4_t res64; + return64(vhsubq_u16(_pM128i(a), _pM128i(b))); +} + +uint32x2_t vhsub_u32(uint32x2_t a, uint32x2_t b); // VHSUB.U32 d0,d0,d0 +_NEON2SSE_INLINE uint32x2_t vhsub_u32(uint32x2_t a, uint32x2_t b) +{ + uint32x2_t res64; + return64(vhsubq_u32(_pM128i(a), _pM128i(b))); +} + +int8x16_t vhsubq_s8(int8x16_t a, int8x16_t b); // VHSUB.S8 q0,q0,q0 +_NEON2SSE_INLINE int8x16_t vhsubq_s8(int8x16_t a, int8x16_t b) // VHSUB.S8 q0,q0,q0 +{ + // //need to deal with the possibility of internal overflow + __m128i c128, au,bu; + c128 = _mm_set1_epi8 (128); + au = _mm_add_epi8( a, c128); + bu = _mm_add_epi8( b, c128); + return vhsubq_u8(au,bu); +} + +int16x8_t vhsubq_s16(int16x8_t a, int16x8_t b); // VHSUB.S16 q0,q0,q0 +_NEON2SSE_INLINE int16x8_t vhsubq_s16(int16x8_t a, int16x8_t b) // VHSUB.S16 q0,q0,q0 +{ + //need to deal with the possibility of internal overflow + __m128i c8000, au,bu; + c8000 = _mm_set1_epi16(0x8000); + au = _mm_add_epi16( a, c8000); + bu = _mm_add_epi16( b, c8000); + return vhsubq_u16(au,bu); +} + +int32x4_t vhsubq_s32(int32x4_t a, int32x4_t b); // VHSUB.S32 q0,q0,q0 +_NEON2SSE_INLINE int32x4_t vhsubq_s32(int32x4_t a, int32x4_t b) // VHSUB.S32 q0,q0,q0 +{ + //need to deal with the possibility of internal overflow + __m128i a2, b2,r, b_1; + a2 = _mm_srai_epi32 (a,1); + b2 = _mm_srai_epi32 (b,1); + r = _mm_sub_epi32 (a2, b2); + b_1 = _mm_andnot_si128(a, b); //!a and b + b_1 = _mm_slli_epi32 (b_1,31); + b_1 = _mm_srli_epi32 (b_1,31); //0 or 1, last b bit + return _mm_sub_epi32(r,b_1); +} + +uint8x16_t vhsubq_u8(uint8x16_t a, uint8x16_t b); // VHSUB.U8 q0,q0,q0 +_NEON2SSE_INLINE uint8x16_t vhsubq_u8(uint8x16_t a, uint8x16_t b) // VHSUB.U8 q0,q0,q0 +{ + __m128i avg; + avg = _mm_avg_epu8 (a, b); + return _mm_sub_epi8(a, avg); +} + +uint16x8_t vhsubq_u16(uint16x8_t a, uint16x8_t b); // VHSUB.s16 q0,q0,q0 +_NEON2SSE_INLINE uint16x8_t vhsubq_u16(uint16x8_t a, uint16x8_t b) // VHSUB.s16 q0,q0,q0 +{ + __m128i avg; + avg = _mm_avg_epu16 (a, b); + return _mm_sub_epi16(a, avg); +} + +uint32x4_t 
vhsubq_u32(uint32x4_t a, uint32x4_t b); // VHSUB.U32 q0,q0,q0
+_NEON2SSE_INLINE uint32x4_t vhsubq_u32(uint32x4_t a, uint32x4_t b) // VHSUB.U32 q0,q0,q0
+{
+ //need to deal with the possibility of internal overflow
+ __m128i a2, b2,r, b_1;
+ a2 = _mm_srli_epi32 (a,1);
+ b2 = _mm_srli_epi32 (b,1);
+ r = _mm_sub_epi32 (a2, b2);
+ b_1 = _mm_andnot_si128(a, b); //!a and b
+ b_1 = _mm_slli_epi32 (b_1,31);
+ b_1 = _mm_srli_epi32 (b_1,31); //0 or 1, last b bit
+ return _mm_sub_epi32(r,b_1);
+}
+
+//******* Vector subtract high half (truncated) **************
+//************************************************************
+int8x8_t vsubhn_s16(int16x8_t a, int16x8_t b); // VSUBHN.I16 d0,q0,q0
+_NEON2SSE_INLINE int8x8_t vsubhn_s16(int16x8_t a, int16x8_t b) // VSUBHN.I16 d0,q0,q0
+{
+ int8x8_t res64;
+ __m128i sum, sum8;
+ sum = _mm_sub_epi16 (a, b);
+ sum8 = _mm_srai_epi16 (sum, 8);
+ sum8 = _mm_packs_epi16(sum8,sum8);
+ return64(sum8);
+}
+
+int16x4_t vsubhn_s32(int32x4_t a, int32x4_t b); // VSUBHN.I32 d0,q0,q0
+_NEON2SSE_INLINE int16x4_t vsubhn_s32(int32x4_t a, int32x4_t b) // VSUBHN.I32 d0,q0,q0
+{
+ int16x4_t res64;
+ __m128i sum, sum16;
+ sum = _mm_sub_epi32 (a, b);
+ sum16 = _mm_srai_epi32 (sum, 16);
+ sum16 = _mm_packs_epi32(sum16,sum16);
+ return64(sum16);
+}
+
+int32x2_t vsubhn_s64(int64x2_t a, int64x2_t b); // VSUBHN.I64 d0,q0,q0
+_NEON2SSE_INLINE int32x2_t vsubhn_s64(int64x2_t a, int64x2_t b)
+{
+ int32x2_t res64;
+ __m128i sub;
+ sub = _mm_sub_epi64 (a, b);
+ sub = _mm_shuffle_epi32(sub, 1 | (3 << 2) | (0 << 4) | (2 << 6));
+ return64(sub);
+}
+
+uint8x8_t vsubhn_u16(uint16x8_t a, uint16x8_t b); // VSUBHN.I16 d0,q0,q0
+_NEON2SSE_INLINE uint8x8_t vsubhn_u16(uint16x8_t a, uint16x8_t b) // VSUBHN.I16 d0,q0,q0
+{
+ uint8x8_t res64;
+ __m128i sum, sum8;
+ sum = _mm_sub_epi16 (a, b);
+ sum8 = _mm_srli_epi16 (sum, 8);
+ sum8 = _mm_packus_epi16(sum8,sum8);
+ return64(sum8);
+}
+
+uint16x4_t vsubhn_u32(uint32x4_t a, uint32x4_t b); // VSUBHN.I32 d0,q0,q0
+_NEON2SSE_INLINE uint16x4_t vsubhn_u32(uint32x4_t a, uint32x4_t b) // VSUBHN.I32 d0,q0,q0
+{
+ uint16x4_t res64;
+ __m128i sum, sum16;
+ sum = _mm_sub_epi32 (a, b);
+ sum16 = _mm_srli_epi32 (sum, 16);
+ sum16 = _MM_PACKUS1_EPI32(sum16);
+ return64(sum16);
+}
+
+uint32x2_t vsubhn_u64(uint64x2_t a, uint64x2_t b); // VSUBHN.I64 d0,q0,q0
+#define vsubhn_u64 vsubhn_s64
+
+//************ Vector rounding subtract high half *********************
+//*********************************************************************
+int8x8_t vrsubhn_s16(int16x8_t a, int16x8_t b); // VRSUBHN.I16 d0,q0,q0
+_NEON2SSE_INLINE int8x8_t vrsubhn_s16(int16x8_t a, int16x8_t b) // VRSUBHN.I16 d0,q0,q0
+{
+ int8x8_t res64;
+ __m128i sub, mask1;
+ sub = _mm_sub_epi16 (a, b);
+ mask1 = _mm_slli_epi16(sub, 9); //shift left then back right to
+ mask1 = _mm_srli_epi16(mask1, 15); //get bit 7, 1 or zero
+ sub = _mm_srai_epi16 (sub, 8); //get the high half
+ sub = _mm_add_epi16 (sub, mask1); //actual rounding
+ sub = _mm_packs_epi16 (sub, sub);
+ return64(sub);
+}
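// Scalar model of the rounded narrowing in vrsubhn_s16 above (a sketch,
// C99): take the 16-bit difference modulo 2^16, add the rounding constant
// 1 << 7, and keep the high byte.
static inline int8_t vrsubhn_one_s16(int16_t a, int16_t b)
{
    int16_t d = (int16_t)(a - b);          // wraps modulo 2^16, as on NEON
    return (int8_t)((d + (1 << 7)) >> 8);  // round, then take the high half
}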
+
+int16x4_t vrsubhn_s32(int32x4_t a, int32x4_t b); // VRSUBHN.I32 d0,q0,q0
+_NEON2SSE_INLINE int16x4_t vrsubhn_s32(int32x4_t a, int32x4_t b) // VRSUBHN.I32 d0,q0,q0
+{
+ //the SIMD version may not be optimal; serial may be faster
+ int16x4_t res64;
+ __m128i sub, mask1;
+ sub = _mm_sub_epi32 (a, b);
+ mask1 = _mm_slli_epi32(sub, 17); //shift left then back right to
+ mask1 = _mm_srli_epi32(mask1,31); //get bit 15, 1 or zero
+ sub = _mm_srai_epi32 (sub, 16); //get the high half
+ sub = _mm_add_epi32 (sub, mask1); //actual rounding
+ sub = _mm_packs_epi32 (sub, sub);
+ return64(sub);
+}
+
+int32x2_t vrsubhn_s64(int64x2_t a, int64x2_t b); // VRSUBHN.I64 d0,q0,q0
+_NEON2SSE_INLINE int32x2_t vrsubhn_s64(int64x2_t a, int64x2_t b)
+{
+ //the SIMD version may not be optimal; serial may be faster
+ int32x2_t res64;
+ __m128i sub, mask1;
+ sub = _mm_sub_epi64 (a, b);
+ mask1 = _mm_slli_epi64(sub, 33); //shift left then back right to
+ mask1 = _mm_srli_epi64(mask1,32); //get bit 31, 1 or zero
+ sub = _mm_add_epi64 (sub, mask1); //actual high half rounding
+ sub = _mm_shuffle_epi32(sub, 1 | (3 << 2) | (0 << 4) | (2 << 6));
+ return64(sub);
+}
+
+uint8x8_t vrsubhn_u16(uint16x8_t a, uint16x8_t b); // VRSUBHN.I16 d0,q0,q0
+_NEON2SSE_INLINE uint8x8_t vrsubhn_u16(uint16x8_t a, uint16x8_t b) // VRSUBHN.I16 d0,q0,q0
+{
+ uint8x8_t res64;
+ __m128i sub, mask1;
+ sub = _mm_sub_epi16 (a, b);
+ mask1 = _mm_slli_epi16(sub, 9); //shift left then back right to
+ mask1 = _mm_srli_epi16(mask1, 15); //get bit 7, 1 or zero
+ sub = _mm_srai_epi16 (sub, 8); //get the high half
+ sub = _mm_add_epi16 (sub, mask1); //actual rounding
+ sub = _mm_packus_epi16 (sub, sub);
+ return64(sub);
+}
+
+uint16x4_t vrsubhn_u32(uint32x4_t a, uint32x4_t b); // VRSUBHN.I32 d0,q0,q0
+_NEON2SSE_INLINE uint16x4_t vrsubhn_u32(uint32x4_t a, uint32x4_t b) // VRSUBHN.I32 d0,q0,q0
+{
+ //the SIMD version may not be optimal; serial may be faster
+ uint16x4_t res64;
+ __m128i sub, mask1;
+ sub = _mm_sub_epi32 (a, b);
+ mask1 = _mm_slli_epi32(sub, 17); //shift left then back right to
+ mask1 = _mm_srli_epi32(mask1,31); //get bit 15, 1 or zero
+ sub = _mm_srai_epi32 (sub, 16); //get the high half
+ sub = _mm_add_epi32 (sub, mask1); //actual rounding
+ sub = _MM_PACKUS1_EPI32 (sub);
+ return64(sub);
+}
+
+uint32x2_t vrsubhn_u64(uint64x2_t a, uint64x2_t b); // VRSUBHN.I64 d0,q0,q0
+#define vrsubhn_u64 vrsubhn_s64
+
+//*********** Vector saturating doubling multiply subtract long ********************
+//************************************************************************************
+int32x4_t vqdmlsl_s16(int32x4_t a, int16x4_t b, int16x4_t c); // VQDMLSL.S16 q0,d0,d0
+_NEON2SSE_INLINE int32x4_t vqdmlsl_s16(int32x4_t a, int16x4_t b, int16x4_t c)
+{
+ //not an optimal SIMD solution; serial may be faster
+ __m128i res32, mask;
+ int32x4_t res;
+ _NEON2SSE_ALIGN_16 uint32_t cmask[] = {0x80000000, 0x80000000, 0x80000000, 0x80000000};
+ res = vmull_s16(b, c);
+ res32 = _mm_slli_epi32 (res, 1); //double the result, saturation not considered
+ mask = _mm_cmpeq_epi32 (res32, *(__m128i*)cmask);
+ res32 = _mm_xor_si128 (res32, mask); //res32 saturated for 0x80000000
+ return vqsubq_s32(a, res32); //saturation
+}
+
+int64x2_t vqdmlsl_s32(int64x2_t a, int32x2_t b, int32x2_t c); // VQDMLSL.S32 q0,d0,d0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqdmlsl_s32(int64x2_t a, int32x2_t b, int32x2_t c), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+ __m128i res64, mask;
+ int64x2_t res;
+ _NEON2SSE_ALIGN_16 uint64_t cmask[] = {0x8000000000000000, 0x8000000000000000};
+ res = vmull_s32(b, c);
+ res64 = _mm_slli_epi64 (res, 1); //double the result, saturation not considered
+ mask = _MM_CMPEQ_EPI64 (res64, *(__m128i*)cmask);
+ res64 = _mm_xor_si128 (res64, mask); //res64 saturated for 0x8000000000000000
+ return vqsubq_s64(a, res64); //saturation
+}
+
+//****************** COMPARISON ***************************************
+//******************* Vector compare equal *************************************
+//****************************************************************************
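// All comparisons in this section return per-element masks: all ones where
// the condition holds, all zeros where it does not. A usage sketch (plain
// SSE2) selecting between two vectors with such a mask, in the spirit of
// NEON's vbslq intrinsics:
static inline __m128i select_by_mask(__m128i mask, __m128i if_true, __m128i if_false)
{
    return _mm_or_si128(_mm_and_si128(mask, if_true),
                        _mm_andnot_si128(mask, if_false));
}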
uint8x8_t vceq_s8(int8x8_t a, int8x8_t b); // VCEQ.I8 d0, d0, d0
+_NEON2SSE_INLINE uint8x8_t vceq_s8(int8x8_t a, int8x8_t b)
+{
+ uint8x8_t res64;
+ return64(_mm_cmpeq_epi8(_pM128i(a),_pM128i(b)));
+}
+
+
+uint16x4_t vceq_s16(int16x4_t a, int16x4_t b); // VCEQ.I16 d0, d0, d0
+_NEON2SSE_INLINE uint16x4_t vceq_s16(int16x4_t a, int16x4_t b)
+{
+ uint16x4_t res64;
+ return64(_mm_cmpeq_epi16(_pM128i(a),_pM128i(b)));
+}
+
+
+uint32x2_t vceq_s32(int32x2_t a, int32x2_t b); // VCEQ.I32 d0, d0, d0
+_NEON2SSE_INLINE uint32x2_t vceq_s32(int32x2_t a, int32x2_t b)
+{
+ uint32x2_t res64;
+ return64(_mm_cmpeq_epi32(_pM128i(a),_pM128i(b)));
+}
+
+
+uint32x2_t vceq_f32(float32x2_t a, float32x2_t b); // VCEQ.F32 d0, d0, d0
+_NEON2SSE_INLINE uint32x2_t vceq_f32(float32x2_t a, float32x2_t b)
+{
+ uint32x2_t res64;
+ __m128 res;
+ res = _mm_cmpeq_ps(_pM128(a), _pM128(b) );
+ return64f(res);
+}
+
+uint8x8_t vceq_u8(uint8x8_t a, uint8x8_t b); // VCEQ.I8 d0, d0, d0
+_NEON2SSE_INLINE uint8x8_t vceq_u8(uint8x8_t a, uint8x8_t b)
+{
+ uint8x8_t res64;
+ return64(_mm_cmpeq_epi8(_pM128i(a),_pM128i(b)));
+}
+
+
+uint16x4_t vceq_u16(uint16x4_t a, uint16x4_t b); // VCEQ.I16 d0, d0, d0
+_NEON2SSE_INLINE uint16x4_t vceq_u16(uint16x4_t a, uint16x4_t b)
+{
+ uint16x4_t res64;
+ return64(_mm_cmpeq_epi16(_pM128i(a),_pM128i(b)));
+}
+
+
+uint32x2_t vceq_u32(uint32x2_t a, uint32x2_t b); // VCEQ.I32 d0, d0, d0
+_NEON2SSE_INLINE uint32x2_t vceq_u32(uint32x2_t a, uint32x2_t b)
+{
+ uint32x2_t res64;
+ return64(_mm_cmpeq_epi32(_pM128i(a),_pM128i(b)));
+}
+
+
+uint8x8_t vceq_p8(poly8x8_t a, poly8x8_t b); // VCEQ.I8 d0, d0, d0
+#define vceq_p8 vceq_u8
+
+
+uint8x16_t vceqq_s8(int8x16_t a, int8x16_t b); // VCEQ.I8 q0, q0, q0
+#define vceqq_s8 _mm_cmpeq_epi8
+
+uint16x8_t vceqq_s16(int16x8_t a, int16x8_t b); // VCEQ.I16 q0, q0, q0
+#define vceqq_s16 _mm_cmpeq_epi16
+
+uint32x4_t vceqq_s32(int32x4_t a, int32x4_t b); // VCEQ.I32 q0, q0, q0
+#define vceqq_s32 _mm_cmpeq_epi32
+
+uint32x4_t vceqq_f32(float32x4_t a, float32x4_t b); // VCEQ.F32 q0, q0, q0
+_NEON2SSE_INLINE uint32x4_t vceqq_f32(float32x4_t a, float32x4_t b)
+{
+ __m128 res;
+ res = _mm_cmpeq_ps(a,b);
+ return _M128i(res);
+}
+
+uint8x16_t vceqq_u8(uint8x16_t a, uint8x16_t b); // VCEQ.I8 q0, q0, q0
+#define vceqq_u8 _mm_cmpeq_epi8
+
+uint16x8_t vceqq_u16(uint16x8_t a, uint16x8_t b); // VCEQ.I16 q0, q0, q0
+#define vceqq_u16 _mm_cmpeq_epi16
+
+uint32x4_t vceqq_u32(uint32x4_t a, uint32x4_t b); // VCEQ.I32 q0, q0, q0
+#define vceqq_u32 _mm_cmpeq_epi32
+
+uint8x16_t vceqq_p8(poly8x16_t a, poly8x16_t b); // VCEQ.I8 q0, q0, q0
+#define vceqq_p8 _mm_cmpeq_epi8
+
+//******************Vector compare greater-than or equal*************************
+//*******************************************************************************
+//IA SIMD has no greater-than-or-equal comparison for integers;
+// only greater-than is available, so we need the following tricks
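// The trick in one line: a >= b is (a > b) OR (a == b). A minimal SSE2
// sketch for the 32-bit case; the functions below apply the same idea per
// element width:
static inline __m128i cmpge_epi32_sketch(__m128i a, __m128i b)
{
    return _mm_or_si128(_mm_cmpgt_epi32(a, b), _mm_cmpeq_epi32(a, b));
}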
+
+uint8x8_t vcge_s8(int8x8_t a, int8x8_t b); // VCGE.S8 d0, d0, d0
+_NEON2SSE_INLINE uint8x8_t vcge_s8(int8x8_t a, int8x8_t b)
+{
+ uint8x8_t res64;
+ return64(vcgeq_s8(_pM128i(a), _pM128i(b)));
+}
+
+
+uint16x4_t vcge_s16(int16x4_t a, int16x4_t b); // VCGE.S16 d0, d0, d0
+_NEON2SSE_INLINE uint16x4_t vcge_s16(int16x4_t a, int16x4_t b)
+{
+ uint16x4_t res64;
+ return64(vcgeq_s16(_pM128i(a), _pM128i(b)));
+}
+
+
+uint32x2_t vcge_s32(int32x2_t a, int32x2_t b); // VCGE.S32 d0, d0, d0
+_NEON2SSE_INLINE uint32x2_t vcge_s32(int32x2_t a, int32x2_t b)
+{
+ uint32x2_t res64;
+ return64(vcgeq_s32(_pM128i(a), _pM128i(b)));
+}
+
+
+uint32x2_t vcge_f32(float32x2_t a, float32x2_t b); // VCGE.F32 d0, d0, d0
+_NEON2SSE_INLINE uint32x2_t vcge_f32(float32x2_t a, float32x2_t b)
+{
+ uint32x2_t res64;
+ __m128 res;
+ res = _mm_cmpge_ps(_pM128(a),_pM128(b)); //only the lower 2 entries are used
+ return64f(res);
+}
+
+uint8x8_t vcge_u8(uint8x8_t a, uint8x8_t b); // VCGE.U8 d0, d0, d0
+_NEON2SSE_INLINE uint8x8_t vcge_u8(uint8x8_t a, uint8x8_t b)
+{
+ uint8x8_t res64;
+ return64(vcgeq_u8(_pM128i(a), _pM128i(b)));
+}
+
+
+uint16x4_t vcge_u16(uint16x4_t a, uint16x4_t b); // VCGE.s16 d0, d0, d0
+_NEON2SSE_INLINE uint16x4_t vcge_u16(uint16x4_t a, uint16x4_t b)
+{
+ uint16x4_t res64;
+ return64(vcgeq_u16(_pM128i(a), _pM128i(b)));
+}
+
+
+uint32x2_t vcge_u32(uint32x2_t a, uint32x2_t b); // VCGE.U32 d0, d0, d0
+_NEON2SSE_INLINE uint32x2_t vcge_u32(uint32x2_t a, uint32x2_t b)
+{
+ //a serial solution looks faster
+ uint32x2_t res64;
+ return64(vcgeq_u32 (_pM128i(a), _pM128i(b)));
+}
+
+
+
+uint8x16_t vcgeq_s8(int8x16_t a, int8x16_t b); // VCGE.S8 q0, q0, q0
+_NEON2SSE_INLINE uint8x16_t vcgeq_s8(int8x16_t a, int8x16_t b) // VCGE.S8 q0, q0, q0
+{
+ __m128i m1, m2;
+ m1 = _mm_cmpgt_epi8 ( a, b);
+ m2 = _mm_cmpeq_epi8 ( a, b);
+ return _mm_or_si128 ( m1, m2);
+}
+
+uint16x8_t vcgeq_s16(int16x8_t a, int16x8_t b); // VCGE.S16 q0, q0, q0
+_NEON2SSE_INLINE uint16x8_t vcgeq_s16(int16x8_t a, int16x8_t b) // VCGE.S16 q0, q0, q0
+{
+ __m128i m1, m2;
+ m1 = _mm_cmpgt_epi16 ( a, b);
+ m2 = _mm_cmpeq_epi16 ( a, b);
+ return _mm_or_si128 ( m1,m2);
+}
+
+uint32x4_t vcgeq_s32(int32x4_t a, int32x4_t b); // VCGE.S32 q0, q0, q0
+_NEON2SSE_INLINE uint32x4_t vcgeq_s32(int32x4_t a, int32x4_t b) // VCGE.S32 q0, q0, q0
+{
+ __m128i m1, m2;
+ m1 = _mm_cmpgt_epi32 (a, b);
+ m2 = _mm_cmpeq_epi32 (a, b);
+ return _mm_or_si128 (m1, m2);
+}
+
+uint32x4_t vcgeq_f32(float32x4_t a, float32x4_t b); // VCGE.F32 q0, q0, q0
+_NEON2SSE_INLINE uint32x4_t vcgeq_f32(float32x4_t a, float32x4_t b)
+{
+ __m128 res;
+ res = _mm_cmpge_ps(a,b);
+ return *(__m128i*)&res;
+}
+
+uint8x16_t vcgeq_u8(uint8x16_t a, uint8x16_t b); // VCGE.U8 q0, q0, q0
+_NEON2SSE_INLINE uint8x16_t vcgeq_u8(uint8x16_t a, uint8x16_t b) // VCGE.U8 q0, q0, q0
+{
+ //no unsigned chars comparison, only signed is available, so we need the trick
+ #ifdef USE_SSE4
+ __m128i cmp;
+ cmp = _mm_max_epu8(a, b);
+ return _mm_cmpeq_epi8(cmp, a); //a>=b
+ #else
+ __m128i c128, as, bs, m1, m2;
+ c128 = _mm_set1_epi8 (128);
+ as = _mm_sub_epi8( a, c128);
+ bs = _mm_sub_epi8( b, c128);
+ m1 = _mm_cmpgt_epi8( as, bs);
+ m2 = _mm_cmpeq_epi8 (as, bs);
+ return _mm_or_si128 ( m1, m2);
+ #endif
+}
+
+uint16x8_t vcgeq_u16(uint16x8_t a, uint16x8_t b); // VCGE.s16 q0, q0, q0
+_NEON2SSE_INLINE uint16x8_t vcgeq_u16(uint16x8_t a, uint16x8_t b) // VCGE.s16 q0, q0, q0
+{
+ //no unsigned shorts comparison, only signed is available, so we need the trick
+ #ifdef USE_SSE4
+ __m128i cmp;
+ cmp = _mm_max_epu16(a, b);
+ return _mm_cmpeq_epi16(cmp, a); //a>=b
+ #else
+ __m128i c8000, as, bs, m1, m2;
+ c8000 = _mm_set1_epi16 (0x8000);
+ as = _mm_sub_epi16(a,c8000);
+ bs = _mm_sub_epi16(b,c8000);
+ m1 = _mm_cmpgt_epi16(as, bs);
+ m2 = _mm_cmpeq_epi16 (as, bs);
+ return _mm_or_si128 ( m1, m2);
+ #endif
+}
+
+uint32x4_t vcgeq_u32(uint32x4_t a, uint32x4_t b); // VCGE.U32 q0, q0, q0
+_NEON2SSE_INLINE uint32x4_t vcgeq_u32(uint32x4_t a, uint32x4_t b) // VCGE.U32 q0, q0, q0
+{
+ //no unsigned ints comparison, only signed is available, so we need the trick
+ #ifdef USE_SSE4
+ __m128i cmp;
+ cmp = _mm_max_epu32(a, b);
+ return _mm_cmpeq_epi32(cmp, a); //a>=b
+ #else
+ //a serial solution may be faster
+ __m128i c80000000, as, bs, m1, m2;
+ c80000000 = _mm_set1_epi32 (0x80000000);
+ as = _mm_sub_epi32(a,c80000000);
+ bs = _mm_sub_epi32(b,c80000000);
+ m1 = _mm_cmpgt_epi32 (as, bs);
+ m2 = _mm_cmpeq_epi32 (as, bs);
+ return _mm_or_si128 ( m1, m2);
+ #endif
+}
+
+//**********************Vector compare less-than or equal******************************
+//***************************************************************************************
+//IA SIMD has no less-than-or-equal comparison for integers, so we need the tricks
+
+uint8x8_t vcle_s8(int8x8_t a, int8x8_t b); // VCGE.S8 d0, d0, d0
+_NEON2SSE_INLINE uint8x8_t vcle_s8(int8x8_t a, int8x8_t b)
+{
+ uint8x8_t res64;
+ return64(vcleq_s8(_pM128i(a), _pM128i(b)));
+}
+
+
+uint16x4_t vcle_s16(int16x4_t a, int16x4_t b); // VCGE.S16 d0, d0, d0
+_NEON2SSE_INLINE uint16x4_t vcle_s16(int16x4_t a, int16x4_t b)
+{
+ uint16x4_t res64;
+ return64(vcleq_s16(_pM128i(a), _pM128i(b)));
+}
+
+
+uint32x2_t vcle_s32(int32x2_t a, int32x2_t b); // VCGE.S32 d0, d0, d0
+_NEON2SSE_INLINE uint32x2_t vcle_s32(int32x2_t a, int32x2_t b)
+{
+ uint32x2_t res64;
+ return64(vcleq_s32(_pM128i(a), _pM128i(b)));
+}
+
+
+uint32x2_t vcle_f32(float32x2_t a, float32x2_t b); // VCGE.F32 d0, d0, d0
+_NEON2SSE_INLINE uint32x2_t vcle_f32(float32x2_t a, float32x2_t b)
+{
+ uint32x2_t res64;
+ __m128 res;
+ res = _mm_cmple_ps(_pM128(a),_pM128(b));
+ return64f(res);
+}
+
+uint8x8_t vcle_u8(uint8x8_t a, uint8x8_t b); // VCGE.U8 d0, d0, d0
+#define vcle_u8(a,b) vcge_u8(b,a)
+
+
+uint16x4_t vcle_u16(uint16x4_t a, uint16x4_t b); // VCGE.s16 d0, d0, d0
+#define vcle_u16(a,b) vcge_u16(b,a)
+
+
+uint32x2_t vcle_u32(uint32x2_t a, uint32x2_t b); // VCGE.U32 d0, d0, d0
+#define vcle_u32(a,b) vcge_u32(b,a)
+
+uint8x16_t vcleq_s8(int8x16_t a, int8x16_t b); // VCGE.S8 q0, q0, q0
+_NEON2SSE_INLINE uint8x16_t vcleq_s8(int8x16_t a, int8x16_t b) // VCGE.S8 q0, q0, q0
+{
+ __m128i c1, res;
+ c1 = _mm_cmpeq_epi8 (a,a); //all ones 0xff....
+ res = _mm_cmpgt_epi8 ( a, b);
+ return _mm_andnot_si128 (res, c1); //invert the cmpgt result to get less-than-or-equal
+}
+
+uint16x8_t vcleq_s16(int16x8_t a, int16x8_t b); // VCGE.S16 q0, q0, q0
+_NEON2SSE_INLINE uint16x8_t vcleq_s16(int16x8_t a, int16x8_t b) // VCGE.S16 q0, q0, q0
+{
+ __m128i c1, res;
+ c1 = _mm_cmpeq_epi16 (a,a); //all ones 0xff....
+ res = _mm_cmpgt_epi16 ( a, b);
+ return _mm_andnot_si128 (res, c1);
+}
+
+uint32x4_t vcleq_s32(int32x4_t a, int32x4_t b); // VCGE.S32 q0, q0, q0
+_NEON2SSE_INLINE uint32x4_t vcleq_s32(int32x4_t a, int32x4_t b) // VCGE.S32 q0, q0, q0
+{
+ __m128i c1, res;
+ c1 = _mm_cmpeq_epi32 (a,a); //all ones 0xff....
+ res = _mm_cmpgt_epi32 ( a, b);
+ return _mm_andnot_si128 (res, c1);
+}
+
+uint32x4_t vcleq_f32(float32x4_t a, float32x4_t b); // VCGE.F32 q0, q0, q0
+_NEON2SSE_INLINE uint32x4_t vcleq_f32(float32x4_t a, float32x4_t b)
+{
+ __m128 res;
+ res = _mm_cmple_ps(a,b);
+ return *(__m128i*)&res;
+}
+
+uint8x16_t vcleq_u8(uint8x16_t a, uint8x16_t b); // VCGE.U8 q0, q0, q0
+#ifdef USE_SSE4
+ _NEON2SSE_INLINE uint8x16_t vcleq_u8(uint8x16_t a, uint8x16_t b) // VCGE.U8 q0, q0, q0
+ {
+ //no unsigned chars comparison in SSE, only signed is available, so we need the trick
+ __m128i cmp;
+ cmp = _mm_min_epu8(a, b);
+ return _mm_cmpeq_epi8(cmp, a); //a<=b
+ }
+#else
+ #define vcleq_u8(a,b) vcgeq_u8(b,a)
+#endif
+
+
+uint16x8_t vcleq_u16(uint16x8_t a, uint16x8_t b); // VCGE.s16 q0, q0, q0
+#ifdef USE_SSE4
+ _NEON2SSE_INLINE uint16x8_t vcleq_u16(uint16x8_t a, uint16x8_t b) // VCGE.s16 q0, q0, q0
+ {
+ //no unsigned shorts comparison in SSE, only signed is available, so we need the trick
+ __m128i cmp;
+ cmp = _mm_min_epu16(a, b);
+ return _mm_cmpeq_epi16(cmp, a); //a<=b
+ }
+#else
+ #define vcleq_u16(a,b) vcgeq_u16(b,a)
+#endif
+
+
+uint32x4_t vcleq_u32(uint32x4_t a, uint32x4_t b); // VCGE.U32 q0, q0, q0
+#ifdef USE_SSE4
+ _NEON2SSE_INLINE uint32x4_t vcleq_u32(uint32x4_t a, uint32x4_t b) // VCGE.U32 q0, q0, q0
+ {
+ //no unsigned ints comparison in SSE, only signed is available, so we need the trick
+ __m128i cmp;
+ cmp = _mm_min_epu32(a, b);
+ return _mm_cmpeq_epi32(cmp, a); //a<=b
+ }
+#else
+//this solution may not be optimal compared with the serial one
+ #define vcleq_u32(a,b) vcgeq_u32(b,a)
+#endif
+
+
+//****** Vector compare greater-than ******************************************
+//**************************************************************************
+uint8x8_t vcgt_s8(int8x8_t a, int8x8_t b); // VCGT.S8 d0, d0, d0
+_NEON2SSE_INLINE uint8x8_t vcgt_s8(int8x8_t a, int8x8_t b)
+{
+ uint8x8_t res64;
+ return64(_mm_cmpgt_epi8(_pM128i(a),_pM128i(b)));
+}
+
+
+uint16x4_t vcgt_s16(int16x4_t a, int16x4_t b); // VCGT.S16 d0, d0, d0
+_NEON2SSE_INLINE uint16x4_t vcgt_s16(int16x4_t a, int16x4_t b)
+{
+ uint16x4_t res64;
+ return64(_mm_cmpgt_epi16(_pM128i(a),_pM128i(b)));
+}
+
+
+uint32x2_t vcgt_s32(int32x2_t a, int32x2_t b); // VCGT.S32 d0, d0, d0
+_NEON2SSE_INLINE uint32x2_t vcgt_s32(int32x2_t a, int32x2_t b)
+{
+ uint32x2_t res64;
+ return64(_mm_cmpgt_epi32(_pM128i(a),_pM128i(b)));
+}
+
+
+uint32x2_t vcgt_f32(float32x2_t a, float32x2_t b); // VCGT.F32 d0, d0, d0
+_NEON2SSE_INLINE uint32x2_t vcgt_f32(float32x2_t a, float32x2_t b)
+{
+ uint32x2_t res64;
+ __m128 res;
+ res = _mm_cmpgt_ps(_pM128(a),_pM128(b)); //only the lower 2 entries are used
+ return64f(res);
+}
+
+uint8x8_t vcgt_u8(uint8x8_t a, uint8x8_t b); // VCGT.U8 d0, d0, d0
+_NEON2SSE_INLINE uint8x8_t vcgt_u8(uint8x8_t a, uint8x8_t b)
+{
+ uint8x8_t res64;
+ return64(vcgtq_u8(_pM128i(a), _pM128i(b)));
+}
+
+
+uint16x4_t vcgt_u16(uint16x4_t a, uint16x4_t b); // VCGT.s16 d0, d0, d0
+_NEON2SSE_INLINE uint16x4_t vcgt_u16(uint16x4_t a, uint16x4_t b)
+{
+ uint16x4_t res64;
+ return64(vcgtq_u16(_pM128i(a), _pM128i(b)));
+}
+
+
+uint32x2_t vcgt_u32(uint32x2_t a, uint32x2_t b); // VCGT.U32 d0, d0, d0
+_NEON2SSE_INLINE uint32x2_t vcgt_u32(uint32x2_t a, uint32x2_t b)
+{
+ uint32x2_t res64;
+ return64(vcgtq_u32(_pM128i(a), _pM128i(b)));
+}
+
+
+uint8x16_t vcgtq_s8(int8x16_t a, int8x16_t b); // VCGT.S8 q0, q0, q0
+#define vcgtq_s8 _mm_cmpgt_epi8
+
+uint16x8_t vcgtq_s16(int16x8_t a, int16x8_t b); // VCGT.S16 q0, q0, q0
+#define vcgtq_s16 _mm_cmpgt_epi16
+
+uint32x4_t vcgtq_s32(int32x4_t a, int32x4_t b); // VCGT.S32 q0, q0, q0
+#define vcgtq_s32 _mm_cmpgt_epi32
+
+uint32x4_t vcgtq_f32(float32x4_t a, float32x4_t b); // VCGT.F32 q0, q0, q0
+_NEON2SSE_INLINE uint32x4_t vcgtq_f32(float32x4_t a, float32x4_t b)
+{
+ __m128 res;
+ res = _mm_cmpgt_ps(a,b);
+ return *(__m128i*)&res;
+}
+
+uint8x16_t vcgtq_u8(uint8x16_t a, uint8x16_t b); // VCGT.U8 q0, q0, q0
+_NEON2SSE_INLINE uint8x16_t vcgtq_u8(uint8x16_t a, uint8x16_t b) // VCGT.U8 q0, q0, q0
+{
+ //no unsigned chars comparison, only signed is available, so we need the trick
+ __m128i c128, as, bs;
+ c128 = _mm_set1_epi8 (128);
+ as = _mm_sub_epi8(a,c128);
+ bs = _mm_sub_epi8(b,c128);
+ return _mm_cmpgt_epi8 (as, bs);
+}
+
+uint16x8_t vcgtq_u16(uint16x8_t a, uint16x8_t b); // VCGT.s16 q0, q0, q0
+_NEON2SSE_INLINE uint16x8_t vcgtq_u16(uint16x8_t a, uint16x8_t b) // VCGT.s16 q0, q0, q0
+{
+ //no unsigned shorts comparison, only signed is available, so we need the trick
+ __m128i c8000, as, bs;
+ c8000 = _mm_set1_epi16 (0x8000);
+ as = _mm_sub_epi16(a,c8000);
+ bs = _mm_sub_epi16(b,c8000);
+ return _mm_cmpgt_epi16 ( as, bs);
+}
+
+uint32x4_t vcgtq_u32(uint32x4_t a, uint32x4_t b); // VCGT.U32 q0, q0, q0
+_NEON2SSE_INLINE uint32x4_t vcgtq_u32(uint32x4_t a, uint32x4_t b) // VCGT.U32 q0, q0, q0
+{
+ //no unsigned ints comparison, only signed is available, so we need the trick
+ __m128i c80000000, as, bs;
+ c80000000 = _mm_set1_epi32 (0x80000000);
+ as = _mm_sub_epi32(a,c80000000);
+ bs = _mm_sub_epi32(b,c80000000);
+ return _mm_cmpgt_epi32 ( as, bs);
+}
+
+//********************* Vector compare less-than **************************
+//*************************************************************************
+uint8x8_t vclt_s8(int8x8_t a, int8x8_t b); // VCGT.S8 d0, d0, d0
+#define vclt_s8(a,b) vcgt_s8(b,a) //swap the arguments!!
+
+
+uint16x4_t vclt_s16(int16x4_t a, int16x4_t b); // VCGT.S16 d0, d0, d0
+#define vclt_s16(a,b) vcgt_s16(b,a) //swap the arguments!!
+
+
+uint32x2_t vclt_s32(int32x2_t a, int32x2_t b); // VCGT.S32 d0, d0, d0
+#define vclt_s32(a,b) vcgt_s32(b,a) //swap the arguments!!
+
+
+uint32x2_t vclt_f32(float32x2_t a, float32x2_t b); // VCGT.F32 d0, d0, d0
+#define vclt_f32(a,b) vcgt_f32(b, a) //swap the arguments!!
+
+uint8x8_t vclt_u8(uint8x8_t a, uint8x8_t b); // VCGT.U8 d0, d0, d0
+#define vclt_u8(a,b) vcgt_u8(b,a) //swap the arguments!!
+
+uint16x4_t vclt_u16(uint16x4_t a, uint16x4_t b); // VCGT.s16 d0, d0, d0
+#define vclt_u16(a,b) vcgt_u16(b,a) //swap the arguments!!
+
+uint32x2_t vclt_u32(uint32x2_t a, uint32x2_t b); // VCGT.U32 d0, d0, d0
+#define vclt_u32(a,b) vcgt_u32(b,a) //swap the arguments!!
+
+uint8x16_t vcltq_s8(int8x16_t a, int8x16_t b); // VCGT.S8 q0, q0, q0
+#define vcltq_s8(a,b) vcgtq_s8(b, a) //swap the arguments!!
+
+uint16x8_t vcltq_s16(int16x8_t a, int16x8_t b); // VCGT.S16 q0, q0, q0
+#define vcltq_s16(a,b) vcgtq_s16(b, a) //swap the arguments!!
+
+uint32x4_t vcltq_s32(int32x4_t a, int32x4_t b); // VCGT.S32 q0, q0, q0
+#define vcltq_s32(a,b) vcgtq_s32(b, a) //swap the arguments!!
+
+uint32x4_t vcltq_f32(float32x4_t a, float32x4_t b); // VCGT.F32 q0, q0, q0
+#define vcltq_f32(a,b) vcgtq_f32(b, a) //swap the arguments!!
+
+uint8x16_t vcltq_u8(uint8x16_t a, uint8x16_t b); // VCGT.U8 q0, q0, q0
+#define vcltq_u8(a,b) vcgtq_u8(b, a) //swap the arguments!!
+
+uint16x8_t vcltq_u16(uint16x8_t a, uint16x8_t b); // VCGT.s16 q0, q0, q0
+#define vcltq_u16(a,b) vcgtq_u16(b, a) //swap the arguments!!
+ +uint32x4_t vcltq_u32(uint32x4_t a, uint32x4_t b); // VCGT.U32 q0, q0, q0 +#define vcltq_u32(a,b) vcgtq_u32(b, a) //swap the arguments!! + +//*****************Vector compare absolute greater-than or equal ************ +//*************************************************************************** +uint32x2_t vcage_f32(float32x2_t a, float32x2_t b); // VACGE.F32 d0, d0, d0 +_NEON2SSE_INLINE uint32x2_t vcage_f32(float32x2_t a, float32x2_t b) +{ + uint32x2_t res64; + __m128i c7fffffff; + __m128 a0, b0; + c7fffffff = _mm_set1_epi32 (0x7fffffff); + a0 = _mm_and_ps (_pM128(a), *(__m128*)&c7fffffff); + b0 = _mm_and_ps (_pM128(b), *(__m128*)&c7fffffff); + a0 = _mm_cmpge_ps ( a0, b0); + return64f(a0); +} + +uint32x4_t vcageq_f32(float32x4_t a, float32x4_t b); // VACGE.F32 q0, q0, q0 +_NEON2SSE_INLINE uint32x4_t vcageq_f32(float32x4_t a, float32x4_t b) // VACGE.F32 q0, q0, q0 +{ + __m128i c7fffffff; + __m128 a0, b0; + c7fffffff = _mm_set1_epi32 (0x7fffffff); + a0 = _mm_and_ps (a, *(__m128*)&c7fffffff); + b0 = _mm_and_ps (b, *(__m128*)&c7fffffff); + a0 = _mm_cmpge_ps ( a0, b0); + return (*(__m128i*)&a0); +} + +//********Vector compare absolute less-than or equal ****************** +//******************************************************************** +uint32x2_t vcale_f32(float32x2_t a, float32x2_t b); // VACGE.F32 d0, d0, d0 +_NEON2SSE_INLINE uint32x2_t vcale_f32(float32x2_t a, float32x2_t b) +{ + uint32x2_t res64; + __m128i c7fffffff; + __m128 a0, b0; + c7fffffff = _mm_set1_epi32 (0x7fffffff); + a0 = _mm_and_ps (_pM128(a), *(__m128*)&c7fffffff); + b0 = _mm_and_ps (_pM128(b), *(__m128*)&c7fffffff); + a0 = _mm_cmple_ps (a0, b0); + return64f(a0); +} + +uint32x4_t vcaleq_f32(float32x4_t a, float32x4_t b); // VACGE.F32 q0, q0, q0 +_NEON2SSE_INLINE uint32x4_t vcaleq_f32(float32x4_t a, float32x4_t b) // VACGE.F32 q0, q0, q0 +{ + __m128i c7fffffff; + __m128 a0, b0; + c7fffffff = _mm_set1_epi32 (0x7fffffff); + a0 = _mm_and_ps (a, *(__m128*)&c7fffffff); + b0 = _mm_and_ps (b, *(__m128*)&c7fffffff); + a0 = _mm_cmple_ps (a0, b0); + return (*(__m128i*)&a0); +} + +//******** Vector compare absolute greater-than ****************** +//****************************************************************** +uint32x2_t vcagt_f32(float32x2_t a, float32x2_t b); // VACGT.F32 d0, d0, d0 +_NEON2SSE_INLINE uint32x2_t vcagt_f32(float32x2_t a, float32x2_t b) +{ + uint32x2_t res64; + __m128i c7fffffff; + __m128 a0, b0; + c7fffffff = _mm_set1_epi32 (0x7fffffff); + a0 = _mm_and_ps (_pM128(a), *(__m128*)&c7fffffff); + b0 = _mm_and_ps (_pM128(b), *(__m128*)&c7fffffff); + a0 = _mm_cmpgt_ps (a0, b0); + return64f(a0); +} + +uint32x4_t vcagtq_f32(float32x4_t a, float32x4_t b); // VACGT.F32 q0, q0, q0 +_NEON2SSE_INLINE uint32x4_t vcagtq_f32(float32x4_t a, float32x4_t b) // VACGT.F32 q0, q0, q0 +{ + __m128i c7fffffff; + __m128 a0, b0; + c7fffffff = _mm_set1_epi32 (0x7fffffff); + a0 = _mm_and_ps (a, *(__m128*)&c7fffffff); + b0 = _mm_and_ps (b, *(__m128*)&c7fffffff); + a0 = _mm_cmpgt_ps (a0, b0); + return (*(__m128i*)&a0); +} + +//***************Vector compare absolute less-than *********************** +//************************************************************************* +uint32x2_t vcalt_f32(float32x2_t a, float32x2_t b); // VACGT.F32 d0, d0, d0 +_NEON2SSE_INLINE uint32x2_t vcalt_f32(float32x2_t a, float32x2_t b) +{ + uint32x2_t res64; + __m128i c7fffffff; + __m128 a0, b0; + c7fffffff = _mm_set1_epi32 (0x7fffffff); + a0 = _mm_and_ps (_pM128(a), *(__m128*)&c7fffffff); + b0 = _mm_and_ps (_pM128(b), *(__m128*)&c7fffffff); + a0 = 
_mm_cmplt_ps (a0, b0); + return64f(a0); +} + +uint32x4_t vcaltq_f32(float32x4_t a, float32x4_t b); // VACGT.F32 q0, q0, q0 +_NEON2SSE_INLINE uint32x4_t vcaltq_f32(float32x4_t a, float32x4_t b) // VACGT.F32 q0, q0, q0 +{ + __m128i c7fffffff; + __m128 a0, b0; + c7fffffff = _mm_set1_epi32 (0x7fffffff); + a0 = _mm_and_ps (a, *(__m128*)&c7fffffff); + b0 = _mm_and_ps (b, *(__m128*)&c7fffffff); + a0 = _mm_cmplt_ps (a0, b0); + return (*(__m128i*)&a0); +} + +//*************************Vector test bits************************************ +//***************************************************************************** +/*VTST (Vector Test Bits) takes each element in a vector, and bitwise logical ANDs them +with the corresponding element of a second vector. If the result is not zero, the +corresponding element in the destination vector is set to all ones. Otherwise, it is set to +all zeros. */ + +uint8x8_t vtst_s8(int8x8_t a, int8x8_t b); // VTST.8 d0, d0, d0 +_NEON2SSE_INLINE uint8x8_t vtst_s8(int8x8_t a, int8x8_t b) +{ + int8x8_t res64; + return64(vtstq_s8(_pM128i(a), _pM128i(b))); +} + + +uint16x4_t vtst_s16(int16x4_t a, int16x4_t b); // VTST.16 d0, d0, d0 +_NEON2SSE_INLINE uint16x4_t vtst_s16(int16x4_t a, int16x4_t b) +{ + int16x4_t res64; + return64(vtstq_s16(_pM128i(a), _pM128i(b))); +} + + +uint32x2_t vtst_s32(int32x2_t a, int32x2_t b); // VTST.32 d0, d0, d0 +_NEON2SSE_INLINE uint32x2_t vtst_s32(int32x2_t a, int32x2_t b) +{ + int32x2_t res64; + return64(vtstq_s32(_pM128i(a), _pM128i(b))); +} + + +uint8x8_t vtst_u8(uint8x8_t a, uint8x8_t b); // VTST.8 d0, d0, d0 +#define vtst_u8 vtst_s8 + +uint16x4_t vtst_u16(uint16x4_t a, uint16x4_t b); // VTST.16 d0, d0, d0 +#define vtst_u16 vtst_s16 + +uint32x2_t vtst_u32(uint32x2_t a, uint32x2_t b); // VTST.32 d0, d0, d0 +#define vtst_u32 vtst_s32 + + +uint8x8_t vtst_p8(poly8x8_t a, poly8x8_t b); // VTST.8 d0, d0, d0 +#define vtst_p8 vtst_u8 + +uint8x16_t vtstq_s8(int8x16_t a, int8x16_t b); // VTST.8 q0, q0, q0 +_NEON2SSE_INLINE uint8x16_t vtstq_s8(int8x16_t a, int8x16_t b) // VTST.8 q0, q0, q0 +{ + __m128i zero, one, res; + zero = _mm_setzero_si128 (); + one = _mm_cmpeq_epi8(zero,zero); //0xfff..ffff + res = _mm_and_si128 (a, b); + res = _mm_cmpeq_epi8 (res, zero); + return _mm_xor_si128(res, one); //invert result +} + +uint16x8_t vtstq_s16(int16x8_t a, int16x8_t b); // VTST.16 q0, q0, q0 +_NEON2SSE_INLINE uint16x8_t vtstq_s16(int16x8_t a, int16x8_t b) // VTST.16 q0, q0, q0 +{ + __m128i zero, one, res; + zero = _mm_setzero_si128 (); + one = _mm_cmpeq_epi8(zero,zero); //0xfff..ffff + res = _mm_and_si128 (a, b); + res = _mm_cmpeq_epi16 (res, zero); + return _mm_xor_si128(res, one); //invert result +} + +uint32x4_t vtstq_s32(int32x4_t a, int32x4_t b); // VTST.32 q0, q0, q0 +_NEON2SSE_INLINE uint32x4_t vtstq_s32(int32x4_t a, int32x4_t b) // VTST.32 q0, q0, q0 +{ + __m128i zero, one, res; + zero = _mm_setzero_si128 (); + one = _mm_cmpeq_epi8(zero,zero); //0xfff..ffff + res = _mm_and_si128 (a, b); + res = _mm_cmpeq_epi32 (res, zero); + return _mm_xor_si128(res, one); //invert result +} + +uint8x16_t vtstq_u8(uint8x16_t a, uint8x16_t b); // VTST.8 q0, q0, q0 +#define vtstq_u8 vtstq_s8 + +uint16x8_t vtstq_u16(uint16x8_t a, uint16x8_t b); // VTST.16 q0, q0, q0 +#define vtstq_u16 vtstq_s16 + +uint32x4_t vtstq_u32(uint32x4_t a, uint32x4_t b); // VTST.32 q0, q0, q0 +#define vtstq_u32 vtstq_s32 + +uint8x16_t vtstq_p8(poly8x16_t a, poly8x16_t b); // VTST.8 q0, q0, q0 +#define vtstq_p8 vtstq_u8 + +//****************** Absolute difference ******************** +//*** 
Absolute difference between the arguments: Vr[i] = | Va[i] - Vb[i] |***** +//************************************************************ +int8x8_t vabd_s8(int8x8_t a, int8x8_t b); // VABD.S8 d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vabd_s8(int8x8_t a, int8x8_t b) +{ + int8x8_t res64; + return64(vabdq_s8(_pM128i(a), _pM128i(b))); +} + +int16x4_t vabd_s16(int16x4_t a, int16x4_t b); // VABD.S16 d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vabd_s16(int16x4_t a, int16x4_t b) +{ + int16x4_t res64; + return64(vabdq_s16(_pM128i(a), _pM128i(b))); +} + +int32x2_t vabd_s32(int32x2_t a, int32x2_t b); // VABD.S32 d0,d0,d0 +_NEON2SSE_INLINE int32x2_t vabd_s32(int32x2_t a, int32x2_t b) +{ + int32x2_t res64; + return64(vabdq_s32(_pM128i(a), _pM128i(b))); +} + +uint8x8_t vabd_u8(uint8x8_t a, uint8x8_t b); // VABD.U8 d0,d0,d0 +_NEON2SSE_INLINE uint8x8_t vabd_u8(uint8x8_t a, uint8x8_t b) +{ + uint8x8_t res64; + return64(vabdq_u8(_pM128i(a), _pM128i(b))); +} + +uint16x4_t vabd_u16(uint16x4_t a, uint16x4_t b); // VABD.s16 d0,d0,d0 +_NEON2SSE_INLINE uint16x4_t vabd_u16(uint16x4_t a, uint16x4_t b) +{ + uint16x4_t res64; + return64(vabdq_u16(_pM128i(a), _pM128i(b))); +} + +uint32x2_t vabd_u32(uint32x2_t a, uint32x2_t b); // VABD.U32 d0,d0,d0 +_NEON2SSE_INLINE uint32x2_t vabd_u32(uint32x2_t a, uint32x2_t b) +{ + uint32x2_t res64; + return64(vabdq_u32(_pM128i(a), _pM128i(b))); +} + +float32x2_t vabd_f32(float32x2_t a, float32x2_t b); // VABD.F32 d0,d0,d0 +_NEON2SSE_INLINE float32x2_t vabd_f32(float32x2_t a, float32x2_t b) +{ + float32x4_t res; + __m64_128 res64; + res = vabdq_f32(_pM128(a), _pM128(b)); + _M64f(res64, res); + return res64; +} + +int8x16_t vabdq_s8(int8x16_t a, int8x16_t b); // VABD.S8 q0,q0,q0 +_NEON2SSE_INLINE int8x16_t vabdq_s8(int8x16_t a, int8x16_t b) // VABD.S8 q0,q0,q0 +{ + __m128i res; + res = _mm_sub_epi8 (a, b); + return _mm_abs_epi8 (res); +} + +int16x8_t vabdq_s16(int16x8_t a, int16x8_t b); // VABD.S16 q0,q0,q0 +_NEON2SSE_INLINE int16x8_t vabdq_s16(int16x8_t a, int16x8_t b) // VABD.S16 q0,q0,q0 +{ + __m128i res; + res = _mm_sub_epi16 (a,b); + return _mm_abs_epi16 (res); +} + +int32x4_t vabdq_s32(int32x4_t a, int32x4_t b); // VABD.S32 q0,q0,q0 +_NEON2SSE_INLINE int32x4_t vabdq_s32(int32x4_t a, int32x4_t b) // VABD.S32 q0,q0,q0 +{ + __m128i res; + res = _mm_sub_epi32 (a,b); + return _mm_abs_epi32 (res); +} + +uint8x16_t vabdq_u8(uint8x16_t a, uint8x16_t b); // VABD.U8 q0,q0,q0 +_NEON2SSE_INLINE uint8x16_t vabdq_u8(uint8x16_t a, uint8x16_t b) //no abs for unsigned +{ + __m128i cmp, difab, difba; + cmp = vcgtq_u8(a,b); + difab = _mm_sub_epi8(a,b); + difba = _mm_sub_epi8 (b,a); + difab = _mm_and_si128(cmp, difab); + difba = _mm_andnot_si128(cmp, difba); + return _mm_or_si128(difab, difba); +} + +uint16x8_t vabdq_u16(uint16x8_t a, uint16x8_t b); // VABD.s16 q0,q0,q0 +_NEON2SSE_INLINE uint16x8_t vabdq_u16(uint16x8_t a, uint16x8_t b) +{ + __m128i cmp, difab, difba; + cmp = vcgtq_u16(a,b); + difab = _mm_sub_epi16(a,b); + difba = _mm_sub_epi16 (b,a); + difab = _mm_and_si128(cmp, difab); + difba = _mm_andnot_si128(cmp, difba); + return _mm_or_si128(difab, difba); +} + +uint32x4_t vabdq_u32(uint32x4_t a, uint32x4_t b); // VABD.U32 q0,q0,q0 +_NEON2SSE_INLINE uint32x4_t vabdq_u32(uint32x4_t a, uint32x4_t b) +{ + __m128i cmp, difab, difba; + cmp = vcgtq_u32(a,b); + difab = _mm_sub_epi32(a,b); + difba = _mm_sub_epi32 (b,a); + difab = _mm_and_si128(cmp, difab); + difba = _mm_andnot_si128(cmp, difba); + return _mm_or_si128(difab, difba); +} + +float32x4_t vabdq_f32(float32x4_t a, float32x4_t b); // VABD.F32 q0,q0,q0 
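+// Implementation note: the unsigned q-forms above have no packed "absolute value" available,
+// so they compute both a-b and b-a and keep the non-negative one via the vcgtq_u* mask
+// (and / andnot / or blend).  The float variant below instead clears the IEEE-754 sign bit
+// of (a - b): ANDing with 0x7fffffff is |x|.  A scalar sketch of the same idea
+// (illustrative only, not part of the original header):
+//   float d = a - b; uint32_t u;
+//   memcpy(&u, &d, sizeof u);  u &= 0x7fffffffu;  memcpy(&d, &u, sizeof u);  // d == |a - b|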
+_NEON2SSE_INLINE float32x4_t vabdq_f32(float32x4_t a, float32x4_t b) // VABD.F32 q0,q0,q0 +{ + __m128i c1; + __m128 res; + c1 = _mm_set1_epi32(0x7fffffff); + res = _mm_sub_ps (a, b); + return _mm_and_ps (res, *(__m128*)&c1); +} + +//************ Absolute difference - long ************************** +//******************************************************************** +int16x8_t vabdl_s8(int8x8_t a, int8x8_t b); // VABDL.S8 q0,d0,d0 +_NEON2SSE_INLINE int16x8_t vabdl_s8(int8x8_t a, int8x8_t b) // VABDL.S8 q0,d0,d0 +{ + __m128i a16, b16; + a16 = _MM_CVTEPI8_EPI16 (_pM128i(a)); //SSE4.1, + b16 = _MM_CVTEPI8_EPI16 (_pM128i(b)); //SSE4.1, + return vabdq_s16(a16, b16); + +} + +int32x4_t vabdl_s16(int16x4_t a, int16x4_t b); // VABDL.S16 q0,d0,d0 +_NEON2SSE_INLINE int32x4_t vabdl_s16(int16x4_t a, int16x4_t b) // VABDL.S16 q0,d0,d0 +{ + __m128i a32, b32; + a32 = _MM_CVTEPI16_EPI32 (_pM128i(a)); //SSE4.1 + b32 = _MM_CVTEPI16_EPI32 (_pM128i(b)); //SSE4.1, + return vabdq_s32(a32, b32); +} + +int64x2_t vabdl_s32(int32x2_t a, int32x2_t b); // VABDL.S32 q0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING (int64x2_t vabdl_s32(int32x2_t a, int32x2_t b),_NEON2SSE_REASON_SLOW_SERIAL) +{ + //no optimal SIMD solution, serial looks faster + _NEON2SSE_ALIGN_16 int64_t res[2]; + if(a.m64_i32[0] > b.m64_i32[0]) res[0] = ( int64_t) a.m64_i32[0] - ( int64_t) b.m64_i32[0]; + else res[0] = ( int64_t) b.m64_i32[0] - ( int64_t) a.m64_i32[0]; + if(a.m64_i32[1] > b.m64_i32[1]) res[1] = ( int64_t) a.m64_i32[1] - ( int64_t) b.m64_i32[1]; + else res[1] = ( int64_t) b.m64_i32[1] - ( int64_t) a.m64_i32[1]; + return _mm_load_si128((__m128i*)res); +} + +uint16x8_t vabdl_u8(uint8x8_t a, uint8x8_t b); // VABDL.U8 q0,d0,d0 +_NEON2SSE_INLINE uint16x8_t vabdl_u8(uint8x8_t a, uint8x8_t b) +{ + __m128i res; + res = vsubl_u8(a,b); + return _mm_abs_epi16(res); +} + +uint32x4_t vabdl_u16(uint16x4_t a, uint16x4_t b); // VABDL.s16 q0,d0,d0 +_NEON2SSE_INLINE uint32x4_t vabdl_u16(uint16x4_t a, uint16x4_t b) +{ + __m128i res; + res = vsubl_u16(a,b); + return _mm_abs_epi32(res); +} + +uint64x2_t vabdl_u32(uint32x2_t a, uint32x2_t b); // VABDL.U32 q0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING (uint64x2_t vabdl_u32(uint32x2_t a, uint32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + _NEON2SSE_ALIGN_16 uint64_t res[2]; + if(a.m64_u32[0] > b.m64_u32[0]) res[0] = ( uint64_t) a.m64_u32[0] - ( uint64_t) b.m64_u32[0]; + else res[0] = ( uint64_t) b.m64_u32[0] - ( uint64_t) a.m64_u32[0]; + if(a.m64_u32[1] > b.m64_u32[1]) res[1] = ( uint64_t) a.m64_u32[1] - ( uint64_t) b.m64_u32[1]; + else res[1] = ( uint64_t) b.m64_u32[1] - ( uint64_t) a.m64_u32[1]; + return _mm_load_si128((__m128i*)res); +} + +//**********Absolute difference and accumulate: Vr[i] = Va[i] + | Vb[i] - Vc[i] | ************* +//********************************************************************************************* +int8x8_t vaba_s8(int8x8_t a, int8x8_t b, int8x8_t c); // VABA.S8 d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vaba_s8(int8x8_t a, int8x8_t b, int8x8_t c) +{ + int8x8_t res64; + return64(vabaq_s8(_pM128i(a),_pM128i(b), _pM128i(c))); +} + +int16x4_t vaba_s16(int16x4_t a, int16x4_t b, int16x4_t c); // VABA.S16 d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vaba_s16(int16x4_t a, int16x4_t b, int16x4_t c) +{ + int16x4_t res64; + return64(vabaq_s16(_pM128i(a), _pM128i(b), _pM128i(c))); +} + +int32x2_t vaba_s32(int32x2_t a, int32x2_t b, int32x2_t c); // VABA.S32 d0,d0,d0 +_NEON2SSE_INLINE int32x2_t vaba_s32(int32x2_t a, int32x2_t b, int32x2_t c) +{ + int32x2_t res64; + 
return64(vabaq_s32(_pM128i(a), _pM128i(b), _pM128i(c))); +} + +uint8x8_t vaba_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c); // VABA.U8 d0,d0,d0 +#define vaba_u8 vaba_s8 + + +uint16x4_t vaba_u16(uint16x4_t a, uint16x4_t b, uint16x4_t c); // VABA.s16 d0,d0,d0 +#define vaba_u16 vaba_s16 + +uint32x2_t vaba_u32(uint32x2_t a, uint32x2_t b, uint32x2_t c); // VABA.U32 d0,d0,d0 +_NEON2SSE_INLINE uint32x2_t vaba_u32(uint32x2_t a, uint32x2_t b, uint32x2_t c) +{ + uint32x2_t res64; + return64(vabaq_u32(_pM128i(a), _pM128i(b), _pM128i(c))); +} + +int8x16_t vabaq_s8(int8x16_t a, int8x16_t b, int8x16_t c); // VABA.S8 q0,q0,q0 +_NEON2SSE_INLINE int8x16_t vabaq_s8(int8x16_t a, int8x16_t b, int8x16_t c) // VABA.S8 q0,q0,q0 +{ + int8x16_t sub; + sub = vabdq_s8(b, c); + return vaddq_s8( a, sub); +} + +int16x8_t vabaq_s16(int16x8_t a, int16x8_t b, int16x8_t c); // VABA.S16 q0,q0,q0 +_NEON2SSE_INLINE int16x8_t vabaq_s16(int16x8_t a, int16x8_t b, int16x8_t c) // VABA.S16 q0,q0,q0 +{ + int16x8_t sub; + sub = vabdq_s16(b, c); + return vaddq_s16( a, sub); +} + +int32x4_t vabaq_s32(int32x4_t a, int32x4_t b, int32x4_t c); // VABA.S32 q0,q0,q0 +_NEON2SSE_INLINE int32x4_t vabaq_s32(int32x4_t a, int32x4_t b, int32x4_t c) // VABA.S32 q0,q0,q0 +{ + int32x4_t sub; + sub = vabdq_s32(b, c); + return vaddq_s32( a, sub); +} + +uint8x16_t vabaq_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c); // VABA.U8 q0,q0,q0 +_NEON2SSE_INLINE uint8x16_t vabaq_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c) +{ + uint8x16_t sub; + sub = vabdq_u8(b, c); + return vaddq_u8( a, sub); +} + +uint16x8_t vabaq_u16(uint16x8_t a, uint16x8_t b, uint16x8_t c); // VABA.s16 q0,q0,q0 +_NEON2SSE_INLINE uint16x8_t vabaq_u16(uint16x8_t a, uint16x8_t b, uint16x8_t c) +{ + uint16x8_t sub; + sub = vabdq_u16(b, c); + return vaddq_u16( a, sub); +} + +uint32x4_t vabaq_u32(uint32x4_t a, uint32x4_t b, uint32x4_t c); // VABA.U32 q0,q0,q0 +_NEON2SSE_INLINE uint32x4_t vabaq_u32(uint32x4_t a, uint32x4_t b, uint32x4_t c) +{ + uint32x4_t sub; + sub = vabdq_u32(b, c); + return vaddq_u32( a, sub); +} + +//************** Absolute difference and accumulate - long ******************************** +//************************************************************************************* +int16x8_t vabal_s8(int16x8_t a, int8x8_t b, int8x8_t c); // VABAL.S8 q0,d0,d0 +_NEON2SSE_INLINE int16x8_t vabal_s8(int16x8_t a, int8x8_t b, int8x8_t c) // VABAL.S8 q0,d0,d0 +{ + __m128i b16, c16, res; + b16 = _MM_CVTEPI8_EPI16 (_pM128i(b)); //SSE4.1, + c16 = _MM_CVTEPI8_EPI16 (_pM128i(c)); //SSE4.1, + res = _mm_abs_epi16 (_mm_sub_epi16 (b16, c16) ); + return _mm_add_epi16 (a, res); +} + +int32x4_t vabal_s16(int32x4_t a, int16x4_t b, int16x4_t c); // VABAL.S16 q0,d0,d0 +_NEON2SSE_INLINE int32x4_t vabal_s16(int32x4_t a, int16x4_t b, int16x4_t c) // VABAL.S16 q0,d0,d0 +{ + __m128i b32, c32, res; + b32 = _MM_CVTEPI16_EPI32(_pM128i(b)); //SSE4.1 + c32 = _MM_CVTEPI16_EPI32(_pM128i(c)); //SSE4.1 + res = _mm_abs_epi32 (_mm_sub_epi32 (b32, c32) ); + return _mm_add_epi32 (a, res); +} + +int64x2_t vabal_s32(int64x2_t a, int32x2_t b, int32x2_t c); // VABAL.S32 q0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING (int64x2_t vabal_s32(int64x2_t a, int32x2_t b, int32x2_t c), _NEON2SSE_REASON_SLOW_SERIAL) +{ + __m128i res; + res = vabdl_s32(b,c); + return _mm_add_epi64(a, res); +} + +uint16x8_t vabal_u8(uint16x8_t a, uint8x8_t b, uint8x8_t c); // VABAL.U8 q0,d0,d0 +_NEON2SSE_INLINE uint16x8_t vabal_u8(uint16x8_t a, uint8x8_t b, uint8x8_t c) +{ + __m128i b16, c16, res; + b16 = _MM_CVTEPU8_EPI16 (_pM128i(b)); //SSE4.1, + 
c16 = _MM_CVTEPU8_EPI16 (_pM128i(c)); //SSE4.1, + res = _mm_abs_epi16 (_mm_sub_epi16 (b16, c16) ); + return _mm_add_epi16 (a, res); +} + +uint32x4_t vabal_u16(uint32x4_t a, uint16x4_t b, uint16x4_t c); // VABAL.s16 q0,d0,d0 +_NEON2SSE_INLINE uint32x4_t vabal_u16(uint32x4_t a, uint16x4_t b, uint16x4_t c) +{ + __m128i b32, c32, res; + b32 = _MM_CVTEPU16_EPI32(_pM128i(b)); //SSE4.1 + c32 = _MM_CVTEPU16_EPI32(_pM128i(c)); //SSE4.1 + res = _mm_abs_epi32 (_mm_sub_epi32 (b32, c32) ); + return _mm_add_epi32 (a, res); +} + +uint64x2_t vabal_u32(uint64x2_t a, uint32x2_t b, uint32x2_t c); // VABAL.U32 q0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING (uint64x2_t vabal_u32(uint64x2_t a, uint32x2_t b, uint32x2_t c), _NEON2SSE_REASON_SLOW_SERIAL) +{ + __m128i res; + res = vabdl_u32(b,c); + return _mm_add_epi64(a, res); +} + +//*********************************************************************************** +//**************** Maximum and minimum operations ********************************** +//*********************************************************************************** +//************* Maximum: vmax -> Vr[i] := (Va[i] >= Vb[i]) ? Va[i] : Vb[i] ******* +//*********************************************************************************** +int8x8_t vmax_s8(int8x8_t a, int8x8_t b); // VMAX.S8 d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vmax_s8(int8x8_t a, int8x8_t b) +{ + int8x8_t res64; + __m128i res; + res = _MM_MAX_EPI8(_pM128i(a),_pM128i(b)); //SSE4.1, use only lower 64 bits + return64(res); +} + +int16x4_t vmax_s16(int16x4_t a, int16x4_t b); // VMAX.S16 d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vmax_s16(int16x4_t a, int16x4_t b) +{ + int16x4_t res64; + return64(_mm_max_epi16(_pM128i(a),_pM128i(b))); +} + +int32x2_t vmax_s32(int32x2_t a, int32x2_t b); // VMAX.S32 d0,d0,d0 +_NEON2SSE_INLINE int32x2_t vmax_s32(int32x2_t a, int32x2_t b) +{ + int32x2_t res64; + __m128i res; + res = _MM_MAX_EPI32(_pM128i(a),_pM128i(b)); //SSE4.1, use only lower 64 bits + return64(res); +} + +uint8x8_t vmax_u8(uint8x8_t a, uint8x8_t b); // VMAX.U8 d0,d0,d0 +_NEON2SSE_INLINE uint8x8_t vmax_u8(uint8x8_t a, uint8x8_t b) +{ + uint8x8_t res64; + return64(_mm_max_epu8(_pM128i(a),_pM128i(b))); +} + + +uint16x4_t vmax_u16(uint16x4_t a, uint16x4_t b); // VMAX.s16 d0,d0,d0 +_NEON2SSE_INLINE uint16x4_t vmax_u16(uint16x4_t a, uint16x4_t b) +{ + uint16x4_t res64; + return64(_MM_MAX_EPU16(_pM128i(a),_pM128i(b))); +} + + +uint32x2_t vmax_u32(uint32x2_t a, uint32x2_t b); // VMAX.U32 d0,d0,d0 +_NEON2SSE_INLINE uint32x2_t vmax_u32(uint32x2_t a, uint32x2_t b) +{ + uint32x2_t res64; + __m128i res; + res = _MM_MAX_EPU32(_pM128i(a),_pM128i(b)); //SSE4.1, use only lower 64 bits, may be not effective compared with serial + return64(res); +} + +float32x2_t vmax_f32(float32x2_t a, float32x2_t b); // VMAX.F32 d0,d0,d0 +_NEON2SSE_INLINE float32x2_t vmax_f32(float32x2_t a, float32x2_t b) +{ + //serial solution looks faster than SIMD one + float32x2_t res; + res.m64_f32[0] = (a.m64_f32[0] > b.m64_f32[0]) ? a.m64_f32[0] : b.m64_f32[0]; + res.m64_f32[1] = (a.m64_f32[1] > b.m64_f32[1]) ? 
a.m64_f32[1] : b.m64_f32[1]; + return res; +} + +int8x16_t vmaxq_s8(int8x16_t a, int8x16_t b); // VMAX.S8 q0,q0,q0 +#define vmaxq_s8 _MM_MAX_EPI8 //SSE4.1 + +int16x8_t vmaxq_s16(int16x8_t a, int16x8_t b); // VMAX.S16 q0,q0,q0 +#define vmaxq_s16 _mm_max_epi16 + +int32x4_t vmaxq_s32(int32x4_t a, int32x4_t b); // VMAX.S32 q0,q0,q0 +#define vmaxq_s32 _MM_MAX_EPI32 //SSE4.1 + +uint8x16_t vmaxq_u8(uint8x16_t a, uint8x16_t b); // VMAX.U8 q0,q0,q0 +#define vmaxq_u8 _mm_max_epu8 + +uint16x8_t vmaxq_u16(uint16x8_t a, uint16x8_t b); // VMAX.s16 q0,q0,q0 +#define vmaxq_u16 _MM_MAX_EPU16 //SSE4.1 + +uint32x4_t vmaxq_u32(uint32x4_t a, uint32x4_t b); // VMAX.U32 q0,q0,q0 +#define vmaxq_u32 _MM_MAX_EPU32 //SSE4.1 + + +float32x4_t vmaxq_f32(float32x4_t a, float32x4_t b); // VMAX.F32 q0,q0,q0 +#define vmaxq_f32 _mm_max_ps + +//*************** Minimum: vmin -> Vr[i] := (Va[i] >= Vb[i]) ? Vb[i] : Va[i] ******************************** +//*********************************************************************************************************** +int8x8_t vmin_s8(int8x8_t a, int8x8_t b); // VMIN.S8 d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vmin_s8(int8x8_t a, int8x8_t b) +{ + int8x8_t res64; + __m128i res; + res = _MM_MIN_EPI8(_pM128i(a),_pM128i(b)); //SSE4.1, use only lower 64 bits + return64(res); +} + +int16x4_t vmin_s16(int16x4_t a, int16x4_t b); // VMIN.S16 d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vmin_s16(int16x4_t a, int16x4_t b) +{ + int16x4_t res64; + return64(_mm_min_epi16(_pM128i(a),_pM128i(b))); +} + + +int32x2_t vmin_s32(int32x2_t a, int32x2_t b); // VMIN.S32 d0,d0,d0 +_NEON2SSE_INLINE int32x2_t vmin_s32(int32x2_t a, int32x2_t b) +{ + int32x2_t res64; + __m128i res; + res = _MM_MIN_EPI32(_pM128i(a),_pM128i(b)); //SSE4.1, use only lower 64 bits + return64(res); +} + +uint8x8_t vmin_u8(uint8x8_t a, uint8x8_t b); // VMIN.U8 d0,d0,d0 +_NEON2SSE_INLINE uint8x8_t vmin_u8(uint8x8_t a, uint8x8_t b) +{ + uint8x8_t res64; + return64(_mm_min_epu8(_pM128i(a),_pM128i(b))); +} + + +uint16x4_t vmin_u16(uint16x4_t a, uint16x4_t b); // VMIN.s16 d0,d0,d0 +_NEON2SSE_INLINE uint16x4_t vmin_u16(uint16x4_t a, uint16x4_t b) +{ + uint16x4_t res64; + return64(_MM_MIN_EPU16(_pM128i(a),_pM128i(b))); +} + + +uint32x2_t vmin_u32(uint32x2_t a, uint32x2_t b); // VMIN.U32 d0,d0,d0 +_NEON2SSE_INLINE uint32x2_t vmin_u32(uint32x2_t a, uint32x2_t b) +{ + uint32x2_t res64; + __m128i res; + res = _MM_MIN_EPU32(_pM128i(a),_pM128i(b)); //SSE4.1, use only lower 64 bits, may be not effective compared with serial + return64(res); +} + +float32x2_t vmin_f32(float32x2_t a, float32x2_t b); // VMIN.F32 d0,d0,d0 +_NEON2SSE_INLINE float32x2_t vmin_f32(float32x2_t a, float32x2_t b) +{ + //serial solution looks faster than SIMD one + float32x2_t res; + res.m64_f32[0] = (a.m64_f32[0] < b.m64_f32[0]) ? a.m64_f32[0] : b.m64_f32[0]; + res.m64_f32[1] = (a.m64_f32[1] < b.m64_f32[1]) ? 
a.m64_f32[1] : b.m64_f32[1]; + return res; +} + +int8x16_t vminq_s8(int8x16_t a, int8x16_t b); // VMIN.S8 q0,q0,q0 +#define vminq_s8 _MM_MIN_EPI8 //SSE4.1 + +int16x8_t vminq_s16(int16x8_t a, int16x8_t b); // VMIN.S16 q0,q0,q0 +#define vminq_s16 _mm_min_epi16 + +int32x4_t vminq_s32(int32x4_t a, int32x4_t b); // VMIN.S32 q0,q0,q0 +#define vminq_s32 _MM_MIN_EPI32 //SSE4.1 + +uint8x16_t vminq_u8(uint8x16_t a, uint8x16_t b); // VMIN.U8 q0,q0,q0 +#define vminq_u8 _mm_min_epu8 + +uint16x8_t vminq_u16(uint16x8_t a, uint16x8_t b); // VMIN.s16 q0,q0,q0 +#define vminq_u16 _MM_MIN_EPU16 //SSE4.1 + +uint32x4_t vminq_u32(uint32x4_t a, uint32x4_t b); // VMIN.U32 q0,q0,q0 +#define vminq_u32 _MM_MIN_EPU32 //SSE4.1 + +float32x4_t vminq_f32(float32x4_t a, float32x4_t b); // VMIN.F32 q0,q0,q0 +#define vminq_f32 _mm_min_ps + +//************* Pairwise addition operations. ************************************** +//************************************************************************************ +//Pairwise add - adds adjacent pairs of elements of two vectors, and places the results in the destination vector +int8x8_t vpadd_s8(int8x8_t a, int8x8_t b); // VPADD.I8 d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vpadd_s8(int8x8_t a, int8x8_t b) // VPADD.I8 d0,d0,d0 +{ + //no 8 bit hadd in IA32, need to go to 16 bit and then pack + int8x8_t res64; + __m128i a16, b16, res; + _NEON2SSE_ALIGN_16 int8_t mask8_16_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 }; + a16 = _MM_CVTEPI8_EPI16 (_pM128i(a)); // SSE 4.1 + b16 = _MM_CVTEPI8_EPI16 (_pM128i(b)); // SSE 4.1 + res = _mm_hadd_epi16 (a16, b16); + res = _mm_shuffle_epi8 (res, *(__m128i*) mask8_16_even_odd); //return to 8 bit, use low 64 bits + return64(res); +} + +int16x4_t vpadd_s16(int16x4_t a, int16x4_t b); // VPADD.I16 d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vpadd_s16(int16x4_t a, int16x4_t b) +{ + int16x4_t res64; + __m128i hadd128; + hadd128 = _mm_hadd_epi16 (_pM128i(a), _pM128i(b)); + hadd128 = _mm_shuffle_epi32 (hadd128, 0 | (2 << 2) | (1 << 4) | (3 << 6)); + return64(hadd128); +} + + +int32x2_t vpadd_s32(int32x2_t a, int32x2_t b); // VPADD.I32 d0,d0,d0 +_NEON2SSE_INLINE int32x2_t vpadd_s32(int32x2_t a, int32x2_t b) +{ + int32x2_t res64; + __m128i hadd128; + hadd128 = _mm_hadd_epi32 (_pM128i(a), _pM128i(b)); + hadd128 = _mm_shuffle_epi32 (hadd128, 0 | (2 << 2) | (1 << 4) | (3 << 6)); + return64(hadd128); +} + + +uint8x8_t vpadd_u8(uint8x8_t a, uint8x8_t b); // VPADD.I8 d0,d0,d0 +_NEON2SSE_INLINE uint8x8_t vpadd_u8(uint8x8_t a, uint8x8_t b) // VPADD.I8 d0,d0,d0 +{ + // no 8 bit hadd in IA32, need to go to 16 bit and then pack + uint8x8_t res64; +// no unsigned _mm_hadd_ functions in IA32, but 8 unsigned is less then 16 signed, so it should work + __m128i mask8, a16, b16, res; + mask8 = _mm_set1_epi16(0xff); + a16 = _MM_CVTEPU8_EPI16 (_pM128i(a)); // SSE 4.1 + b16 = _MM_CVTEPU8_EPI16 (_pM128i(b)); // SSE 4.1 + res = _mm_hadd_epi16 (a16, b16); + res = _mm_and_si128(res, mask8); //to avoid saturation + res = _mm_packus_epi16 (res,res); //use low 64 bits + return64(res); +} + +uint16x4_t vpadd_u16(uint16x4_t a, uint16x4_t b); // VPADD.I16 d0,d0,d0 +_NEON2SSE_INLINE uint16x4_t vpadd_u16(uint16x4_t a, uint16x4_t b) // VPADD.I16 d0,d0,d0 +{ + // solution may be not optimal, serial execution may be faster + // no unsigned _mm_hadd_ functions in IA32, need to move from unsigned to signed + uint16x4_t res64; + __m128i c32767, cfffe, as, bs, res; + c32767 = _mm_set1_epi16 (32767); + cfffe = _mm_set1_epi16 (0xfffe); + as = _mm_sub_epi16 (_pM128i(a), c32767); + bs 
= _mm_sub_epi16 (_pM128i(b), c32767); + res = _mm_hadd_epi16 (as, bs); + res = _mm_add_epi16 (res, cfffe); + res = _mm_shuffle_epi32 (res, 0 | (2 << 2) | (1 << 4) | (3 << 6)); + return64(res); +} + +uint32x2_t vpadd_u32(uint32x2_t a, uint32x2_t b); // VPADD.I32 d0,d0,d0 +_NEON2SSE_INLINE uint32x2_t vpadd_u32(uint32x2_t a, uint32x2_t b) //serial may be faster +{ + //hadd doesn't work for unsigned values + uint32x2_t res64; + __m128i ab, ab_sh, res; + ab = _mm_unpacklo_epi64 ( _pM128i(a), _pM128i(b)); //a0 a1 b0 b1 + ab_sh = _mm_shuffle_epi32(ab, 1 | (0 << 2) | (3 << 4) | (2 << 6)); //a1, a0, b1, b0 + res = _mm_add_epi32(ab, ab_sh); + res = _mm_shuffle_epi32(res, 0 | (2 << 2) | (1 << 4) | (3 << 6)); + return64(res); +} + +float32x2_t vpadd_f32(float32x2_t a, float32x2_t b); // VPADD.F32 d0,d0,d0 +_NEON2SSE_INLINE float32x2_t vpadd_f32(float32x2_t a, float32x2_t b) +{ + __m128 hadd128; + __m64_128 res64; + hadd128 = _mm_hadd_ps (_pM128(a), _pM128(b)); + hadd128 = _mm_shuffle_ps (hadd128, hadd128, _MM_SHUFFLE(3,1, 2, 0)); //use low 64 bits + _M64f(res64, hadd128); + return res64; +} + + +//************************** Long pairwise add ********************************** +//********************************************************************************* +//Adds adjacent pairs of elements of a vector,sign or zero extends the results to twice their original width, +// and places the final results in the destination vector. + +int16x4_t vpaddl_s8(int8x8_t a); // VPADDL.S8 d0,d0 +_NEON2SSE_INLINE int16x4_t vpaddl_s8(int8x8_t a) // VPADDL.S8 d0,d0 +{ + //no 8 bit hadd in IA32, need to go to 16 bit anyway + __m128i a16; + int16x4_t res64; + a16 = _MM_CVTEPI8_EPI16 (_pM128i(a)); // SSE 4.1 + a16 = _mm_hadd_epi16 (a16, a16); //use low 64 bits + return64(a16); +} + +int32x2_t vpaddl_s16(int16x4_t a); // VPADDL.S16 d0,d0 +_NEON2SSE_INLINE int32x2_t vpaddl_s16(int16x4_t a) // VPADDL.S16 d0,d0 +{ + // solution may be not optimal, serial execution may be faster + int32x2_t res64; + __m128i r32_1; + r32_1 = _MM_CVTEPI16_EPI32 (_pM128i(a)); + r32_1 = _mm_hadd_epi32(r32_1, r32_1); //use low 64 bits + return64(r32_1); +} + +int64x1_t vpaddl_s32(int32x2_t a); // VPADDL.S32 d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x1_t vpaddl_s32(int32x2_t a), _NEON2SSE_REASON_SLOW_SERIAL) //serial solution looks faster +{ + int64x1_t res; + res.m64_i64[0] = (int64_t)a.m64_i32[0] + (int64_t)a.m64_i32[1]; + return res; +} + +uint16x4_t vpaddl_u8(uint8x8_t a); // VPADDL.U8 d0,d0 +_NEON2SSE_INLINE uint16x4_t vpaddl_u8(uint8x8_t a) // VPADDL.U8 d0,d0 +{ + // no 8 bit hadd in IA32, need to go to 16 bit +// no unsigned _mm_hadd_ functions in IA32, but 8 unsigned is less then 16 signed, so it should work + uint16x4_t res64; + __m128i a16; + a16 = _MM_CVTEPU8_EPI16 (_pM128i(a)); // SSE 4.1 use low 64 bits + a16 = _mm_hadd_epi16 (a16, a16); //use low 64 bits + return64(a16); +} + +uint32x2_t vpaddl_u16(uint16x4_t a); // VPADDL.s16 d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vpaddl_u16(uint16x4_t a), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //serial solution looks faster than a SIMD one + uint32x2_t res; + res.m64_u32[0] = (uint32_t)a.m64_u16[0] + (uint32_t)a.m64_u16[1]; + res.m64_u32[1] = (uint32_t)a.m64_u16[2] + (uint32_t)a.m64_u16[3]; + return res; +} + +uint64x1_t vpaddl_u32(uint32x2_t a); // VPADDL.U32 d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x1_t vpaddl_u32(uint32x2_t a), _NEON2SSE_REASON_SLOW_SERIAL) //serial solution looks faster +{ + uint64x1_t res; + res.m64_u64[0] = 
(uint64_t)a.m64_u32[0] + (uint64_t)a.m64_u32[1]; + return res; +} + +int16x8_t vpaddlq_s8(int8x16_t a); // VPADDL.S8 q0,q0 +_NEON2SSE_INLINE int16x8_t vpaddlq_s8(int8x16_t a) // VPADDL.S8 q0,q0 +{ + //no 8 bit hadd in IA32, need to go to 16 bit + __m128i r16_1, r16_2; + r16_1 = _MM_CVTEPI8_EPI16 (a); // SSE 4.1 + //swap hi and low part of r to process the remaining data + r16_2 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32); + r16_2 = _MM_CVTEPI8_EPI16 (r16_2); + return _mm_hadd_epi16 (r16_1, r16_2); +} + +int32x4_t vpaddlq_s16(int16x8_t a); // VPADDL.S16 q0,q0 +_NEON2SSE_INLINE int32x4_t vpaddlq_s16(int16x8_t a) // VPADDL.S16 q0,q0 +{ + //no 8 bit hadd in IA32, need to go to 16 bit + __m128i r32_1, r32_2; + r32_1 = _MM_CVTEPI16_EPI32(a); + //swap hi and low part of r to process the remaining data + r32_2 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32); + r32_2 = _MM_CVTEPI16_EPI32 (r32_2); + return _mm_hadd_epi32 (r32_1, r32_2); +} + +int64x2_t vpaddlq_s32(int32x4_t a); // VPADDL.S32 q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vpaddlq_s32(int32x4_t a), _NEON2SSE_REASON_SLOW_SERIAL) // VPADDL.S32 q0,q0 +{ + _NEON2SSE_ALIGN_16 int32_t atmp[4]; + _NEON2SSE_ALIGN_16 int64_t res[2]; + _mm_store_si128((__m128i*)atmp, a); + res[0] = (int64_t)atmp[0] + (int64_t)atmp[1]; + res[1] = (int64_t)atmp[2] + (int64_t)atmp[3]; + return _mm_load_si128((__m128i*)res); +} + +uint16x8_t vpaddlq_u8(uint8x16_t a); // VPADDL.U8 q0,q0 +_NEON2SSE_INLINE uint16x8_t vpaddlq_u8(uint8x16_t a) // VPADDL.U8 q0,q0 +{ + //no 8 bit hadd in IA32, need to go to 16 bit + __m128i r16_1, r16_2; + r16_1 = _MM_CVTEPU8_EPI16(a); + //swap hi and low part of r to process the remaining data + r16_2 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32); + r16_2 = _MM_CVTEPU8_EPI16 (r16_2); + return _mm_hadd_epi16 (r16_1, r16_2); +} + +uint32x4_t vpaddlq_u16(uint16x8_t a); // VPADDL.s16 q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x4_t vpaddlq_u16(uint16x8_t a), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //serial solution looks faster than a SIMD one + _NEON2SSE_ALIGN_16 uint16_t atmp[8]; + _NEON2SSE_ALIGN_16 uint32_t res[4]; + _mm_store_si128((__m128i*)atmp, a); + res[0] = (uint32_t)atmp[0] + (uint32_t)atmp[1]; + res[1] = (uint32_t)atmp[2] + (uint32_t)atmp[3]; + res[2] = (uint32_t)atmp[4] + (uint32_t)atmp[5]; + res[3] = (uint32_t)atmp[6] + (uint32_t)atmp[7]; + return _mm_load_si128((__m128i*)res); +} + +uint64x2_t vpaddlq_u32(uint32x4_t a); // VPADDL.U32 q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x2_t vpaddlq_u32(uint32x4_t a), _NEON2SSE_REASON_SLOW_SERIAL) +{ + _NEON2SSE_ALIGN_16 uint32_t atmp[4]; + _NEON2SSE_ALIGN_16 uint64_t res[2]; + _mm_store_si128((__m128i*)atmp, a); + res[0] = (uint64_t)atmp[0] + (uint64_t)atmp[1]; + res[1] = (uint64_t)atmp[2] + (uint64_t)atmp[3]; + return _mm_load_si128((__m128i*)res); +} + +//************************ Long pairwise add and accumulate ************************** +//**************************************************************************************** +//VPADAL (Vector Pairwise Add and Accumulate Long) adds adjacent pairs of elements of a vector, +// and accumulates the values of the results into the elements of the destination (wide) vector +int16x4_t vpadal_s8(int16x4_t a, int8x8_t b); // VPADAL.S8 d0,d0 +_NEON2SSE_INLINE int16x4_t vpadal_s8(int16x4_t a, int8x8_t b) +{ + int16x4_t res64; + return64(vpadalq_s8(_pM128i(a), _pM128i(b))); +} + +int32x2_t vpadal_s16(int32x2_t a, int16x4_t b); // VPADAL.S16 d0,d0 +_NEON2SSE_INLINE int32x2_t vpadal_s16(int32x2_t a, int16x4_t b) +{ + 
int32x2_t res64; + return64(vpadalq_s16(_pM128i(a), _pM128i(b))); +} + + +int64x1_t vpadal_s32(int64x1_t a, int32x2_t b); // VPADAL.S32 d0,d0 +_NEON2SSE_INLINE int64x1_t vpadal_s32(int64x1_t a, int32x2_t b) +{ + int64x1_t res; + res.m64_i64[0] = (int64_t)b.m64_i32[0] + (int64_t)b.m64_i32[1] + a.m64_i64[0]; + return res; +} + +uint16x4_t vpadal_u8(uint16x4_t a, uint8x8_t b); // VPADAL.U8 d0,d0 +_NEON2SSE_INLINE uint16x4_t vpadal_u8(uint16x4_t a, uint8x8_t b) +{ + uint16x4_t res64; + return64(vpadalq_u8(_pM128i(a), _pM128i(b))); +} + + +uint32x2_t vpadal_u16(uint32x2_t a, uint16x4_t b); // VPADAL.s16 d0,d0 +_NEON2SSE_INLINE uint32x2_t vpadal_u16(uint32x2_t a, uint16x4_t b) +{ + uint32x2_t res64; + return64(vpadalq_u16(_pM128i(a), _pM128i(b))); +} + +uint64x1_t vpadal_u32(uint64x1_t a, uint32x2_t b); // VPADAL.U32 d0,d0 +_NEON2SSE_INLINE uint64x1_t vpadal_u32(uint64x1_t a, uint32x2_t b) +{ + uint64x1_t res; + res.m64_u64[0] = (uint64_t)b.m64_u32[0] + (uint64_t)b.m64_u32[1] + a.m64_u64[0]; + return res; +} + +int16x8_t vpadalq_s8(int16x8_t a, int8x16_t b); // VPADAL.S8 q0,q0 +_NEON2SSE_INLINE int16x8_t vpadalq_s8(int16x8_t a, int8x16_t b) // VPADAL.S8 q0,q0 +{ + int16x8_t pad; + pad = vpaddlq_s8(b); + return _mm_add_epi16 (a, pad); +} + +int32x4_t vpadalq_s16(int32x4_t a, int16x8_t b); // VPADAL.S16 q0,q0 +_NEON2SSE_INLINE int32x4_t vpadalq_s16(int32x4_t a, int16x8_t b) // VPADAL.S16 q0,q0 +{ + int32x4_t pad; + pad = vpaddlq_s16(b); + return _mm_add_epi32(a, pad); +} + +int64x2_t vpadalq_s32(int64x2_t a, int32x4_t b); // VPADAL.S32 q0,q0 +_NEON2SSE_INLINE int64x2_t vpadalq_s32(int64x2_t a, int32x4_t b) +{ + int64x2_t pad; + pad = vpaddlq_s32(b); + return _mm_add_epi64 (a, pad); +} + +uint16x8_t vpadalq_u8(uint16x8_t a, uint8x16_t b); // VPADAL.U8 q0,q0 +_NEON2SSE_INLINE uint16x8_t vpadalq_u8(uint16x8_t a, uint8x16_t b) // VPADAL.U8 q0,q0 +{ + uint16x8_t pad; + pad = vpaddlq_u8(b); + return _mm_add_epi16 (a, pad); +} + +uint32x4_t vpadalq_u16(uint32x4_t a, uint16x8_t b); // VPADAL.s16 q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x4_t vpadalq_u16(uint32x4_t a, uint16x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + uint32x4_t pad; + pad = vpaddlq_u16(b); + return _mm_add_epi32(a, pad); +} //no optimal SIMD solution, serial is faster + +uint64x2_t vpadalq_u32(uint64x2_t a, uint32x4_t b); // VPADAL.U32 q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x2_t vpadalq_u32(uint64x2_t a, uint32x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //no optimal SIMD solution, serial is faster + uint64x2_t pad; + pad = vpaddlq_u32(b); + return _mm_add_epi64(a, pad); +} //no optimal SIMD solution, serial is faster + +//********** Folding maximum ************************************* +//******************************************************************* +//VPMAX (Vector Pairwise Maximum) compares adjacent pairs of elements in two vectors, +//and copies the larger of each pair into the corresponding element in the destination +// no corresponding functionality in IA32 SIMD, so we need to do the vertical comparison +int8x8_t vpmax_s8(int8x8_t a, int8x8_t b); // VPMAX.S8 d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vpmax_s8(int8x8_t a, int8x8_t b) // VPMAX.S8 d0,d0,d0 +{ + int8x8_t res64; + __m128i ab, ab1, max; + _NEON2SSE_ALIGN_16 uint8_t mask8_sab[16] = { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14}; + _NEON2SSE_ALIGN_16 uint8_t mask8_odd[16] = { 1, 3, 5, 7, 9, 11, 13, 15, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; + ab = _mm_unpacklo_epi64 ( _pM128i(a), _pM128i(b)); //ab + ab1 = 
_mm_shuffle_epi8 (ab, *(__m128i*) mask8_sab); //horisontal pairs swap for vertical max finding + max = _MM_MAX_EPI8 (ab, ab1); // SSE4.1 + max = _mm_shuffle_epi8 (max, *(__m128i*) mask8_odd); //remove repetitive data + return64(max); //we need 64 bits only +} + +int16x4_t vpmax_s16(int16x4_t a, int16x4_t b); // VPMAX.S16 d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vpmax_s16(int16x4_t a, int16x4_t b) // VPMAX.S16 d0,d0,d0 +{ + //solution may be not optimal compared with the serial one + int16x4_t res64; + __m128i ab, ab1, max; + _NEON2SSE_ALIGN_16 int8_t mask16_sab[16] = { 2, 3, 0, 1, 6, 7, 4, 5, 10, 11, 8, 9, 14, 15, 12, 13}; //each chars pair is considerd to be 16 bit number + _NEON2SSE_ALIGN_16 int8_t mask16_odd[16] = { 0,1, 4,5, 8,9, 12,13, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; + ab = _mm_unpacklo_epi64 ( _pM128i(a), _pM128i(b)); //ab + ab1 = _mm_shuffle_epi8 (ab, *(__m128i*) mask16_sab); //horisontal pairs swap for vertical max finding, use 8bit fn and the corresponding mask + max = _mm_max_epi16 (ab, ab1); + max = _mm_shuffle_epi8 (max, *(__m128i*) mask16_odd); //remove repetitive data, use 8bit fn and the corresponding mask + return64(max); +} + +int32x2_t vpmax_s32(int32x2_t a, int32x2_t b); // VPMAX.S32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x2_t vpmax_s32(int32x2_t a, int32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //serial solution looks faster than SIMD one + int32x2_t res; + res.m64_i32[0] = (a.m64_i32[0] < a.m64_i32[1]) ? a.m64_i32[1] : a.m64_i32[0]; + res.m64_i32[1] = (b.m64_i32[0] < b.m64_i32[1]) ? b.m64_i32[1] : b.m64_i32[0]; + return res; +} + +uint8x8_t vpmax_u8(uint8x8_t a, uint8x8_t b); // VPMAX.U8 d0,d0,d0 +_NEON2SSE_INLINE uint8x8_t vpmax_u8(uint8x8_t a, uint8x8_t b) // VPMAX.U8 d0,d0,d0 +{ + uint8x8_t res64; + __m128i ab, ab1, max; + _NEON2SSE_ALIGN_16 int8_t mask8_sab[16] = { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14}; + _NEON2SSE_ALIGN_16 uint8_t mask8_odd[16] = { 1, 3, 5, 7, 9, 11, 13, 15, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; + ab = _mm_unpacklo_epi64 (_pM128i(a), _pM128i(b)); //ab + ab1 = _mm_shuffle_epi8 (ab, *(__m128i*) mask8_sab); //horisontal pairs swap for vertical max finding + max = _mm_max_epu8 (ab, ab1); // SSE4.1 + max = _mm_shuffle_epi8 (max, *(__m128i*) mask8_odd); //remove repetitive data + return64(max); +} + +uint16x4_t vpmax_u16(uint16x4_t a, uint16x4_t b); // VPMAX.s16 d0,d0,d0 +_NEON2SSE_INLINE uint16x4_t vpmax_u16(uint16x4_t a, uint16x4_t b) // VPMAX.s16 d0,d0,d0 +{ + //solution may be not optimal compared with the serial one + uint16x4_t res64; + __m128i ab, ab1, max; + _NEON2SSE_ALIGN_16 uint8_t mask16_sab[16] = { 2, 3, 0, 1, 6, 7, 4, 5, 10, 11, 8, 9, 14, 15, 12, 13}; //each chars pair is considerd to be 16 bit number + _NEON2SSE_ALIGN_16 uint8_t mask16_odd[16] = { 0,1, 4,5, 8,9, 12,13, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; + ab = _mm_unpacklo_epi64 ( _pM128i(a), _pM128i(b)); //ab + ab1 = _mm_shuffle_epi8 (ab, *(__m128i*) mask16_sab); //horisontal pairs swap for vertical max finding, use 8bit fn and the corresponding mask + max = _MM_MAX_EPU16 (ab, ab1); + max = _mm_shuffle_epi8 (max, *(__m128i*) mask16_odd); //remove repetitive data, use 8bit fn and the corresponding mask + return64(max); +} + +uint32x2_t vpmax_u32(uint32x2_t a, uint32x2_t b); // VPMAX.U32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vpmax_u32(uint32x2_t a, uint32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //serial solution looks faster than SIMD one + uint32x2_t res; + res.m64_i32[0] = 
(a.m64_i32[0] < a.m64_i32[1]) ? a.m64_i32[1] : a.m64_i32[0]; + res.m64_i32[1] = (b.m64_i32[0] < b.m64_i32[1]) ? b.m64_i32[1] : b.m64_i32[0]; + return res; +} //serial solution looks faster than a SIMD one + +float32x2_t vpmax_f32(float32x2_t a, float32x2_t b); // VPMAX.F32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(float32x2_t vpmax_f32(float32x2_t a, float32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //serial solution looks faster than SIMD one + float32x2_t res; + res.m64_f32[0] = (a.m64_f32[0] < a.m64_f32[1]) ? a.m64_f32[1] : a.m64_f32[0]; + res.m64_f32[1] = (b.m64_f32[0] < b.m64_f32[1]) ? b.m64_f32[1] : b.m64_f32[0]; + return res; +} + +// ***************** Folding minimum **************************** +// ************************************************************** +//vpmin -> takes minimum of adjacent pairs +int8x8_t vpmin_s8(int8x8_t a, int8x8_t b); // VPMIN.S8 d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vpmin_s8(int8x8_t a, int8x8_t b) // VPMIN.S8 d0,d0,d0 +{ + int8x8_t res64; + __m128i ab, ab1, min; + _NEON2SSE_ALIGN_16 uint8_t mask8_sab[16] = { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14}; + _NEON2SSE_ALIGN_16 uint8_t mask8_odd[16] = { 1, 3, 5, 7, 9, 11, 13, 15, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; + ab = _mm_unpacklo_epi64 ( _pM128i(a), _pM128i(b)); //ab + ab1 = _mm_shuffle_epi8 (ab, *(__m128i*) mask8_sab); //horisontal pairs swap for vertical min finding + min = _MM_MIN_EPI8 (ab, ab1); // SSE4.1 + min = _mm_shuffle_epi8 (min, *(__m128i*) mask8_odd); //remove repetitive data + return64(min); +} + +int16x4_t vpmin_s16(int16x4_t a, int16x4_t b); // VPMIN.S16 d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vpmin_s16(int16x4_t a, int16x4_t b) // VPMIN.S16 d0,d0,d0 +{ + //solution may be not optimal compared with the serial one + int16x4_t res64; + __m128i ab, ab1, min; + _NEON2SSE_ALIGN_16 int8_t mask16_sab[16] = { 2, 3, 0, 1, 6, 7, 4, 5, 10, 11, 8, 9, 14, 15, 12, 13}; //each chars pair is considerd to be 16 bit number + _NEON2SSE_ALIGN_16 int8_t mask16_odd[16] = { 0,1, 4,5, 8,9, 12,13, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; + ab = _mm_unpacklo_epi64 ( _pM128i(a), _pM128i(b)); //ab + ab1 = _mm_shuffle_epi8 (ab, *(__m128i*) mask16_sab); //horisontal pairs swap for vertical max finding, use 8bit fn and the corresponding mask + min = _mm_min_epi16 (ab, ab1); + min = _mm_shuffle_epi8 (min, *(__m128i*) mask16_odd); //remove repetitive data, use 8bit fn and the corresponding mask + return64(min); +} + +int32x2_t vpmin_s32(int32x2_t a, int32x2_t b); // VPMIN.S32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x2_t vpmin_s32(int32x2_t a, int32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //serial solution looks faster than SIMD one + int32x2_t res; + res.m64_i32[0] = (a.m64_i32[0] > a.m64_i32[1]) ? a.m64_i32[1] : a.m64_i32[0]; + res.m64_i32[1] = (b.m64_i32[0] > b.m64_i32[1]) ? 
b.m64_i32[1] : b.m64_i32[0]; + return res; +} + +uint8x8_t vpmin_u8(uint8x8_t a, uint8x8_t b); // VPMIN.U8 d0,d0,d0 +_NEON2SSE_INLINE uint8x8_t vpmin_u8(uint8x8_t a, uint8x8_t b) // VPMIN.U8 d0,d0,d0 +{ + uint8x8_t res64; + __m128i ab, ab1, min; + _NEON2SSE_ALIGN_16 uint8_t mask8_sab[16] = { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14}; + _NEON2SSE_ALIGN_16 uint8_t mask8_odd[16] = { 1, 3, 5, 7, 9, 11, 13, 15, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; + ab = _mm_unpacklo_epi64 ( _pM128i(a), _pM128i(b)); //ab + ab1 = _mm_shuffle_epi8 (ab, *(__m128i*) mask8_sab); //horisontal pairs swap for vertical max finding + min = _mm_min_epu8 (ab, ab1); // SSE4.1 + min = _mm_shuffle_epi8 (min, *(__m128i*) mask8_odd); //remove repetitive data + return64(min); +} + +uint16x4_t vpmin_u16(uint16x4_t a, uint16x4_t b); // VPMIN.s16 d0,d0,d0 +_NEON2SSE_INLINE uint16x4_t vpmin_u16(uint16x4_t a, uint16x4_t b) // VPMIN.s16 d0,d0,d0 +{ + //solution may be not optimal compared with the serial one + uint16x4_t res64; + __m128i ab, ab1, min; + _NEON2SSE_ALIGN_16 uint8_t mask16_sab[16] = { 2, 3, 0, 1, 6, 7, 4, 5, 10, 11, 8, 9, 14, 15, 12, 13}; //each chars pair is considerd to be 16 bit number + _NEON2SSE_ALIGN_16 uint8_t mask16_odd[16] = { 0,1, 4,5, 8,9, 12,13, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}; + ab = _mm_unpacklo_epi64 ( _pM128i(a), _pM128i(b)); //ab + ab1 = _mm_shuffle_epi8 (ab, *(__m128i*) mask16_sab); //horisontal pairs swap for vertical min finding, use 8bit fn and the corresponding mask + min = _MM_MIN_EPU16 (ab, ab1); + min = _mm_shuffle_epi8 (min, *(__m128i*) mask16_odd); //remove repetitive data, use 8bit fn and the corresponding mask + return64(min); +} + +uint32x2_t vpmin_u32(uint32x2_t a, uint32x2_t b); // VPMIN.U32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vpmin_u32(uint32x2_t a, uint32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //serial solution looks faster than SIMD one + uint32x2_t res; + res.m64_u32[0] = (a.m64_u32[0] > a.m64_u32[1]) ? a.m64_u32[1] : a.m64_u32[0]; + res.m64_u32[1] = (b.m64_u32[0] > b.m64_u32[1]) ? b.m64_u32[1] : b.m64_u32[0]; + return res; +} + +float32x2_t vpmin_f32(float32x2_t a, float32x2_t b); // VPMIN.F32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(float32x2_t vpmin_f32(float32x2_t a, float32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //serial solution looks faster than SIMD one + float32x2_t res; + res.m64_f32[0] = (a.m64_f32[0] > a.m64_f32[1]) ? a.m64_f32[1] : a.m64_f32[0]; + res.m64_f32[1] = (b.m64_f32[0] > b.m64_f32[1]) ? b.m64_f32[1] : b.m64_f32[0]; + return res; +} + +//*************************************************************** +//*********** Reciprocal/Sqrt ************************************ +//*************************************************************** +//****************** Reciprocal estimate ******************************* +//the ARM NEON and x86 SIMD results may be slightly different +float32x2_t vrecpe_f32(float32x2_t a); // VRECPE.F32 d0,d0 +_NEON2SSE_INLINE float32x2_t vrecpe_f32(float32x2_t a) //use low 64 bits +{ + float32x4_t res; + __m64_128 res64; + res = _mm_rcp_ps(_pM128(a)); + _M64f(res64, res); + return res64; +} + +uint32x2_t vrecpe_u32(uint32x2_t a); // VRECPE.U32 d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vrecpe_u32(uint32x2_t a), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //Input is fixed point number!!! 
No reciprocal for ints in IA32 available + uint32x2_t res; + float resf, r; + int i, q, s; + for (i =0; i<2; i++){ + if((a.m64_u32[i] & 0x80000000) == 0) { + res.m64_u32[i] = 0xffffffff; + }else{ + resf = (float) (a.m64_u32[i] * (0.5f / (uint32_t)(1 << 31))); + q = (int)(resf * 512.0); /* a in units of 1/512 rounded down */ + r = 1.0 / (((float)q + 0.5) / 512.0); /* reciprocal r */ + s = (int)(256.0 * r + 0.5); /* r in units of 1/256 rounded to nearest */ + r = (float)s / 256.0; + res.m64_u32[i] = r * (uint32_t)(1 << 31); + } + } + return res; +} + +float32x4_t vrecpeq_f32(float32x4_t a); // VRECPE.F32 q0,q0 +#define vrecpeq_f32 _mm_rcp_ps + + +uint32x4_t vrecpeq_u32(uint32x4_t a); // VRECPE.U32 q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x4_t vrecpeq_u32(uint32x4_t a), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //Input is fixed point number!!! + //We implement the recip_estimate function as described in ARMv7 reference manual (VRECPE instruction) but use float instead of double + _NEON2SSE_ALIGN_16 uint32_t atmp[4]; + _NEON2SSE_ALIGN_16 uint32_t res[4]; + _NEON2SSE_ALIGN_16 int c80000000[4] = {0x80000000,0x80000000, 0x80000000,0x80000000}; + float resf, r; + int i, q, s; + __m128i res128, mask, zero; + _mm_store_si128((__m128i*)atmp, a); + zero = _mm_setzero_si128(); + for (i =0; i<4; i++){ + resf = (atmp[i] * (0.5f / (uint32_t) (1 << 31))); // 2.3283064365386963E-10 ~(0.5f / (uint32_t) (1 << 31)) + q = (int)(resf * 512.0); /* a in units of 1/512 rounded down */ + r = 1.0 / (((float)q + 0.5) / 512.0); /* reciprocal r */ + s = (int)(256.0 * r + 0.5); /* r in units of 1/256 rounded to nearest */ + r = (float)s / 256.0; + res[i] = (uint32_t) (r * (((uint32_t)1) << 31) ); + } + res128 = _mm_load_si128((__m128i*)res); + mask = _mm_and_si128(a, *(__m128i*)c80000000); + mask = _mm_cmpeq_epi32(zero, mask); //0xffffffff if atmp[i] <= 0x7fffffff + return _mm_or_si128(res128, mask); +} + +//**********Reciprocal square root estimate **************** +//********************************************************** +//no reciprocal square root for ints in IA32 available, neither for unsigned int to float4 lanes conversion, so a serial solution looks faster +//but the particular implementation for vrsqrte_u32 may vary for various ARM compilers +////the ARM NEON and x86 SIMD results may be slightly different +float32x2_t vrsqrte_f32(float32x2_t a); // VRSQRTE.F32 d0,d0 +_NEON2SSE_INLINE float32x2_t vrsqrte_f32(float32x2_t a) //use low 64 bits +{ + float32x4_t res; + __m64_128 res64; + res = _mm_rsqrt_ps(_pM128(a)); + _M64f(res64, res); + return res64; +} + +uint32x2_t vrsqrte_u32(uint32x2_t a); // VRSQRTE.U32 d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vrsqrte_u32(uint32x2_t a), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //Input is fixed point number!!! + //We implement the recip_sqrt_estimate function as described in ARMv7 reference manual (VRSQRTE instruction) but use float instead of double + uint32x2_t res; + __m128 tmp; + float r, resf, coeff; + int i,q0, q1, s;; + for (i =0; i<2; i++){ + if((a.m64_u32[i] & 0xc0000000) == 0) { //a <=0x3fffffff + res.m64_u32[i] = 0xffffffff; + }else{ + resf = (float) (a.m64_u32[i] * (0.5f / (uint32_t)(1 << 31))); + coeff = (resf < 0.5)? 
512.0 : 256.0 ; /* range 0.25 <= resf < 0.5 or range 0.5 <= resf < 1.0*/ + q0 = (int)(resf * coeff); /* a in units of 1/512 rounded down */ + r = ((float)q0 + 0.5) / coeff; + tmp = _mm_rsqrt_ss(_mm_load_ss( &r));/* reciprocal root r */ + _mm_store_ss(&r, tmp); + s = (int)(256.0 * r + 0.5); /* r in units of 1/256 rounded to nearest */ + r = (float)s / 256.0; + res.m64_u32[i] = r * (((uint32_t)1) << 31); + } + } + return res; +} + +float32x4_t vrsqrteq_f32(float32x4_t a); // VRSQRTE.F32 q0,q0 +#define vrsqrteq_f32 _mm_rsqrt_ps + +uint32x4_t vrsqrteq_u32(uint32x4_t a); // VRSQRTE.U32 q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x4_t vrsqrteq_u32(uint32x4_t a), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //Input is fixed point number!!! + //We implement the recip_sqrt_estimate function as described in ARMv7 reference manual (VRSQRTE instruction) but use float instead of double + _NEON2SSE_ALIGN_16 uint32_t atmp[4], res[4]; + _NEON2SSE_ALIGN_16 float c1_31[4] = {(float)(((uint32_t)1) << 31), (float)(((uint32_t)1) << 31),(float)(((uint32_t)1) << 31), (float)(((uint32_t)1) << 31)}; + _NEON2SSE_ALIGN_16 int c_c0000000[4] = {0xc0000000,0xc0000000, 0xc0000000,0xc0000000}; + __m128 tmp; + __m128i res128, mask, zero; + float r, resf, coeff; + int i,q0, q1, s; + _mm_store_si128((__m128i*)atmp, a); + zero = _mm_setzero_si128(); + for (i =0; i<4; i++){ + resf = (float) (atmp[i] * (0.5f / (uint32_t)(1 << 31))); + coeff = (resf < 0.5)? 512.0 : 256.0 ; /* range 0.25 <= resf < 0.5 or range 0.5 <= resf < 1.0*/ + q0 = (int)(resf * coeff); /* a in units of 1/512 rounded down */ + r = ((float)q0 + 0.5) / coeff; + tmp = _mm_rsqrt_ss(_mm_load_ss( &r));/* reciprocal root r */ + _mm_store_ss(&r, tmp); + s = (int)(256.0 * r + 0.5); /* r in units of 1/256 rounded to nearest */ + r = (float)s / 256.0; + res[i] = (uint32_t) (r * (((uint32_t)1) << 31) ); + } + res128 = _mm_load_si128((__m128i*)res); + mask = _mm_and_si128(a, *(__m128i*)c_c0000000); + mask = _mm_cmpeq_epi32(zero, mask); //0xffffffff if atmp[i] <= 0x3fffffff + return _mm_or_si128(res128, mask); +} +//************ Reciprocal estimate/step and 1/sqrt estimate/step *************************** +//****************************************************************************************** +//******VRECPS (Vector Reciprocal Step) *************************************************** +//multiplies the elements of one vector by the corresponding elements of another vector, +//subtracts each of the results from 2, and places the final results into the elements of the destination vector. + +float32x2_t vrecps_f32(float32x2_t a, float32x2_t b); // VRECPS.F32 d0, d0, d0 +_NEON2SSE_INLINE float32x2_t vrecps_f32(float32x2_t a, float32x2_t b) +{ + float32x4_t res; + __m64_128 res64; + res = vrecpsq_f32(_pM128(a), _pM128(b)); + _M64f(res64, res); + return res64; +} + +float32x4_t vrecpsq_f32(float32x4_t a, float32x4_t b); // VRECPS.F32 q0, q0, q0 +_NEON2SSE_INLINE float32x4_t vrecpsq_f32(float32x4_t a, float32x4_t b) // VRECPS.F32 q0, q0, q0 +{ + __m128 f2, mul; + f2 = _mm_set1_ps(2.); + mul = _mm_mul_ps(a,b); + return _mm_sub_ps(f2,mul); +} + +//*****************VRSQRTS (Vector Reciprocal Square Root Step) ***************************** +//multiplies the elements of one vector by the corresponding elements of another vector, +//subtracts each of the results from 3, divides these results by two, and places the final results into the elements of the destination vector. 
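+// Typical use of the estimate/step pairs in this section is Newton-Raphson refinement.
+// A usage sketch (not part of the original header; assumes the standard NEON vmulq_f32,
+// provided elsewhere in this file):
+//   float32x4_t r = vrecpeq_f32(a);                      // coarse 1/a estimate
+//   r = vmulq_f32(r, vrecpsq_f32(a, r));                 // r *= (2 - a*r), one NR step
+//   r = vmulq_f32(r, vrecpsq_f32(a, r));                 // optional second step
+// and analogously for 1/sqrt(a):
+//   float32x4_t s = vrsqrteq_f32(a);
+//   s = vmulq_f32(s, vrsqrtsq_f32(vmulq_f32(a, s), s));  // s *= (3 - a*s*s)/2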
+ +float32x2_t vrsqrts_f32(float32x2_t a, float32x2_t b); // VRSQRTS.F32 d0, d0, d0 +_NEON2SSE_INLINE float32x2_t vrsqrts_f32(float32x2_t a, float32x2_t b) +{ + float32x2_t res; + res.m64_f32[0] = (3 - a.m64_f32[0] * b.m64_f32[0]) / 2; + res.m64_f32[1] = (3 - a.m64_f32[1] * b.m64_f32[1]) / 2; + return res; +} + +float32x4_t vrsqrtsq_f32(float32x4_t a, float32x4_t b); // VRSQRTS.F32 q0, q0, q0 +_NEON2SSE_INLINE float32x4_t vrsqrtsq_f32(float32x4_t a, float32x4_t b) // VRSQRTS.F32 q0, q0, q0 +{ + __m128 f3, f05, mul; + f3 = _mm_set1_ps(3.); + f05 = _mm_set1_ps(0.5); + mul = _mm_mul_ps(a,b); + f3 = _mm_sub_ps(f3,mul); + return _mm_mul_ps (f3, f05); +} +//******************************************************************************************** +//***************************** Shifts by signed variable *********************************** +//******************************************************************************************** +//***** Vector shift left: Vr[i] := Va[i] << Vb[i] (negative values shift right) *********************** +//******************************************************************************************** +//No such operations in IA32 SIMD unfortunately, constant shift only available, so need to do the serial solution +//helper macro. It matches ARM implementation for big shifts +#define SERIAL_SHIFT(TYPE, INTERNAL_TYPE, LENMAX, LEN) \ + _NEON2SSE_ALIGN_16 TYPE atmp[LENMAX], res[LENMAX]; _NEON2SSE_ALIGN_16 INTERNAL_TYPE btmp[LENMAX]; int i, lanesize = sizeof(INTERNAL_TYPE) << 3; \ + _mm_store_si128((__m128i*)atmp, a); _mm_store_si128((__m128i*)btmp, b); \ + for (i = 0; i= lanesize)||(btmp[i] <= -lanesize) ) res[i] = 0; \ + else res[i] = (btmp[i] >=0) ? atmp[i] << btmp[i] : atmp[i] >> (-btmp[i]); } \ + return _mm_load_si128((__m128i*)res); + +#define SERIAL_SHIFT_64(TYPE, SIGN, LEN) \ + int ## TYPE ## x ## LEN ## _t res; int i, lanesize = sizeof(int ## TYPE ## _t) << 3; \ + for (i = 0; i= lanesize)||(b.m64_i ## TYPE[i] <= -lanesize) ) res.m64_ ## SIGN ## TYPE[i] = 0; \ + else res.m64_ ## SIGN ## TYPE[i] = (b.m64_i ## TYPE[i] >=0) ? 
a.m64_ ## SIGN ## TYPE[i] << b.m64_i ## TYPE[i] : a.m64_ ## SIGN ## TYPE[i] >> (-b.m64_i ## TYPE[i]); } \ + return res; + +int8x8_t vshl_s8(int8x8_t a, int8x8_t b); // VSHL.S8 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int8x8_t vshl_s8(int8x8_t a, int8x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT_64(8, i, 8) +} + +int16x4_t vshl_s16(int16x4_t a, int16x4_t b); // VSHL.S16 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int16x4_t vshl_s16(int16x4_t a, int16x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT_64(16, i, 4) +} + +int32x2_t vshl_s32(int32x2_t a, int32x2_t b); // VSHL.S32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x2_t vshl_s32(int32x2_t a, int32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT_64(32, i, 2) +} + +int64x1_t vshl_s64(int64x1_t a, int64x1_t b); // VSHL.S64 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x1_t vshl_s64(int64x1_t a, int64x1_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT_64(64, i, 1) +} + +uint8x8_t vshl_u8(uint8x8_t a, int8x8_t b); // VSHL.U8 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint8x8_t vshl_u8(uint8x8_t a, int8x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT_64(8, u, 8) +} + +uint16x4_t vshl_u16(uint16x4_t a, int16x4_t b); // VSHL.s16 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint16x4_t vshl_u16(uint16x4_t a, int16x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT_64(16, u, 4) +} + +uint32x2_t vshl_u32(uint32x2_t a, int32x2_t b); // VSHL.U32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vshl_u32(uint32x2_t a, int32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT_64(32, u, 2) +} + +uint64x1_t vshl_u64(uint64x1_t a, int64x1_t b); // VSHL.U64 d0,d0,d0 +_NEON2SSE_INLINE uint64x1_t vshl_u64(uint64x1_t a, int64x1_t b) //if we use the SERIAL_SHIFT macro need to have the special processing for large numbers +{ + SERIAL_SHIFT_64(64, u, 1) +} + +int8x16_t vshlq_s8(int8x16_t a, int8x16_t b); // VSHL.S8 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int8x16_t vshlq_s8(int8x16_t a, int8x16_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT(int8_t, int8_t, 16, 16) +} + +int16x8_t vshlq_s16(int16x8_t a, int16x8_t b); // VSHL.S16 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int16x8_t vshlq_s16(int16x8_t a, int16x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT(int16_t, int16_t, 8, 8) +} + +int32x4_t vshlq_s32(int32x4_t a, int32x4_t b); // VSHL.S32 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x4_t vshlq_s32(int32x4_t a, int32x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT(int32_t, int32_t, 4, 4) +} + +int64x2_t vshlq_s64(int64x2_t a, int64x2_t b); // VSHL.S64 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vshlq_s64(int64x2_t a, int64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT(int64_t, int64_t, 2, 2) +} + +uint8x16_t vshlq_u8(uint8x16_t a, int8x16_t b); // VSHL.U8 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint8x16_t vshlq_u8(uint8x16_t a, int8x16_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT(uint8_t, int8_t, 16, 16) +} + +uint16x8_t vshlq_u16(uint16x8_t a, int16x8_t b); // VSHL.s16 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint16x8_t vshlq_u16(uint16x8_t a, int16x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT(uint16_t, int16_t, 8, 8) +} + +uint32x4_t vshlq_u32(uint32x4_t a, int32x4_t b); // VSHL.U32 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x4_t 
vshlq_u32(uint32x4_t a, int32x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT(uint32_t, int32_t, 4, 4) +} + +uint64x2_t vshlq_u64(uint64x2_t a, int64x2_t b); // VSHL.U64 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING( uint64x2_t vshlq_u64(uint64x2_t a, int64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SHIFT(uint64_t, int64_t, 2, 2) +} + + +//*********** Vector saturating shift left: (negative values shift right) ********************** +//******************************************************************************************** +//No such operations in IA32 SIMD available yet, constant shift only available, so need to do the serial solution +#define SERIAL_SATURATING_SHIFT_SIGNED(TYPE, LENMAX, LEN) \ + _NEON2SSE_ALIGN_16 TYPE atmp[LENMAX], res[LENMAX], btmp[LENMAX]; TYPE limit; int i; \ + int lanesize_1 = (sizeof(TYPE) << 3) - 1; \ + _mm_store_si128((__m128i*)atmp, a); _mm_store_si128((__m128i*)btmp, b); \ + for (i = 0; i> (-btmp[i]); \ + else{ \ + if (btmp[i]>lanesize_1) { \ + res[i] = ((_UNSIGNED_T(TYPE))atmp[i] >> lanesize_1 ) + ((TYPE)1 << lanesize_1) - 1; \ + }else{ \ + limit = (TYPE)1 << (lanesize_1 - btmp[i]); \ + if((atmp[i] >= limit)||(atmp[i] <= -limit)) \ + res[i] = ((_UNSIGNED_T(TYPE))atmp[i] >> lanesize_1 ) + ((TYPE)1 << lanesize_1) - 1; \ + else res[i] = atmp[i] << btmp[i]; }}}} \ + return _mm_load_si128((__m128i*)res); + +#define SERIAL_SATURATING_SHIFT_UNSIGNED(TYPE, LENMAX, LEN) \ + _NEON2SSE_ALIGN_16 _UNSIGNED_T(TYPE) atmp[LENMAX], res[LENMAX]; _NEON2SSE_ALIGN_16 TYPE btmp[LENMAX]; _UNSIGNED_T(TYPE) limit; int i; \ + TYPE lanesize = (sizeof(TYPE) << 3); \ + _mm_store_si128((__m128i*)atmp, a); _mm_store_si128((__m128i*)btmp, b); \ + for (i = 0; i> (-btmp[i]); \ + else{ \ + if (btmp[i]>lanesize) res[i] = ~((TYPE)0); \ + else{ \ + limit = (TYPE) 1 << (lanesize - btmp[i]); \ + res[i] = ( atmp[i] >= limit) ? res[i] = ~((TYPE)0) : atmp[i] << btmp[i]; }}}} \ + return _mm_load_si128((__m128i*)res); + +#define SERIAL_SATURATING_SHIFT_SIGNED_64(TYPE, LEN) \ + int ## TYPE ## x ## LEN ## _t res; int ## TYPE ## _t limit; int i; \ + int lanesize_1 = (sizeof( int ## TYPE ## _t) << 3) - 1; \ + for (i = 0; i> (-(b.m64_i ## TYPE[i])); \ + else{ \ + if (b.m64_i ## TYPE[i]>lanesize_1) { \ + res.m64_i ## TYPE[i] = ((_UNSIGNED_T(int ## TYPE ## _t))a.m64_i ## TYPE[i] >> lanesize_1 ) + ((int ## TYPE ## _t) 1 << lanesize_1) - 1; \ + }else{ \ + limit = (int ## TYPE ## _t) 1 << (lanesize_1 - b.m64_i ## TYPE[i]); \ + if((a.m64_i ## TYPE[i] >= limit)||(a.m64_i ## TYPE[i] <= -limit)) \ + res.m64_i ## TYPE[i] = ((_UNSIGNED_T(int ## TYPE ## _t))a.m64_i ## TYPE[i] >> lanesize_1 ) + ((int ## TYPE ## _t) 1 << lanesize_1) - 1; \ + else res.m64_i ## TYPE[i] = a.m64_i ## TYPE[i] << b.m64_i ## TYPE[i]; }}}} \ + return res; + +#define SERIAL_SATURATING_SHIFT_UNSIGNED_64(TYPE, LEN) \ + int ## TYPE ## x ## LEN ## _t res; _UNSIGNED_T(int ## TYPE ## _t) limit; int i; \ + int ## TYPE ## _t lanesize = (sizeof(int ## TYPE ## _t) << 3); \ + for (i = 0; i> (-(b.m64_i ## TYPE[i])); \ + else{ \ + if (b.m64_i ## TYPE[i]>lanesize) res.m64_u ## TYPE[i] = ~((int ## TYPE ## _t) 0); \ + else{ \ + limit = (int ## TYPE ## _t) 1 << (lanesize - b.m64_i ## TYPE[i]); \ + res.m64_u ## TYPE[i] = ( a.m64_u ## TYPE[i] >= limit) ? 
res.m64_u ## TYPE[i] = ~((int ## TYPE ## _t) 0) : a.m64_u ## TYPE[i] << b.m64_u ## TYPE[i]; }}}} \ + return res; + +int8x8_t vqshl_s8(int8x8_t a, int8x8_t b); // VQSHL.S8 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int8x8_t vqshl_s8(int8x8_t a, int8x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_SIGNED_64(8,8) +} + +int16x4_t vqshl_s16(int16x4_t a, int16x4_t b); // VQSHL.S16 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int16x4_t vqshl_s16(int16x4_t a, int16x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_SIGNED_64(16,4) +} + +int32x2_t vqshl_s32(int32x2_t a, int32x2_t b); // VQSHL.S32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x2_t vqshl_s32(int32x2_t a, int32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_SIGNED_64(32,2) +} + +int64x1_t vqshl_s64(int64x1_t a, int64x1_t b); // VQSHL.S64 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x1_t vqshl_s64(int64x1_t a, int64x1_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_SIGNED_64(64,1) +} + +uint8x8_t vqshl_u8(uint8x8_t a, int8x8_t b); // VQSHL.U8 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint8x8_t vqshl_u8(uint8x8_t a, int8x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_UNSIGNED_64(8,8) +} + +uint16x4_t vqshl_u16(uint16x4_t a, int16x4_t b); // VQSHL.s16 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint16x4_t vqshl_u16(uint16x4_t a, int16x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_UNSIGNED_64(16,4) +} + +uint32x2_t vqshl_u32(uint32x2_t a, int32x2_t b); // VQSHL.U32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vqshl_u32(uint32x2_t a, int32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_UNSIGNED_64(32,2) +} + +uint64x1_t vqshl_u64(uint64x1_t a, int64x1_t b); // VQSHL.U64 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x1_t vqshl_u64(uint64x1_t a, int64x1_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_UNSIGNED_64(64,1) +} + +int8x16_t vqshlq_s8(int8x16_t a, int8x16_t b); // VQSHL.S8 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int8x16_t vqshlq_s8(int8x16_t a, int8x16_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_SIGNED(int8_t, 16, 16) +} + +int16x8_t vqshlq_s16(int16x8_t a, int16x8_t b); // VQSHL.S16 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int16x8_t vqshlq_s16(int16x8_t a, int16x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_SIGNED(int16_t, 8, 8) +} + +int32x4_t vqshlq_s32(int32x4_t a, int32x4_t b); // VQSHL.S32 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x4_t vqshlq_s32(int32x4_t a, int32x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_SIGNED(int32_t, 4, 4) +} + +int64x2_t vqshlq_s64(int64x2_t a, int64x2_t b); // VQSHL.S64 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqshlq_s64(int64x2_t a, int64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_SIGNED(int64_t, 2, 2) +} + +uint8x16_t vqshlq_u8(uint8x16_t a, int8x16_t b); // VQSHL.U8 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint8x16_t vqshlq_u8(uint8x16_t a, int8x16_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_UNSIGNED(int8_t, 16, 16) +} + +uint16x8_t vqshlq_u16(uint16x8_t a, int16x8_t b); // VQSHL.s16 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint16x8_t vqshlq_u16(uint16x8_t a, int16x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + 
SERIAL_SATURATING_SHIFT_UNSIGNED(int16_t, 8, 8) +} + +uint32x4_t vqshlq_u32(uint32x4_t a, int32x4_t b); // VQSHL.U32 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x4_t vqshlq_u32(uint32x4_t a, int32x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_UNSIGNED(int32_t, 4, 4) +} + +uint64x2_t vqshlq_u64(uint64x2_t a, int64x2_t b); // VQSHL.U64 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x2_t vqshlq_u64(uint64x2_t a, int64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_SHIFT_UNSIGNED(int64_t, 2, 2) +} + + +//******** Vector rounding shift left: (negative values shift right) ********** +//**************************************************************************** +//No such operations in IA32 SIMD available yet, constant shift only available, so need to do the serial solution +//rounding makes sense for right shifts only. +#define SERIAL_ROUNDING_SHIFT(TYPE, INTERNAL_TYPE, LENMAX, LEN) \ + _NEON2SSE_ALIGN_16 TYPE atmp[LENMAX], res[LENMAX]; _NEON2SSE_ALIGN_16 INTERNAL_TYPE btmp[LENMAX]; INTERNAL_TYPE i, lanesize = sizeof(INTERNAL_TYPE) << 3; \ + _mm_store_si128((__m128i*)atmp, a); _mm_store_si128((__m128i*)btmp, b); \ + for (i = 0; i<LEN; i++) { \ + if( btmp[i] >= 0) { \ + if(btmp[i] >= lanesize) res[i] = 0; \ + else res[i] = (atmp[i] << btmp[i]); \ + }else{ \ + res[i] = (btmp[i] < -lanesize) ? res[i] = 0 : \ + (btmp[i] == -lanesize) ? (atmp[i] & ((INTERNAL_TYPE)1 << (-btmp[i] - 1))) >> (-btmp[i] - 1) : \ + (atmp[i] >> (-btmp[i])) + ( (atmp[i] & ((INTERNAL_TYPE)1 << (-btmp[i] - 1))) >> (-btmp[i] - 1) ); }} \ + return _mm_load_si128((__m128i*)res); + + +#define SERIAL_ROUNDING_SHIFT_64(TYPE, SIGN, LEN) \ + int ## TYPE ## x ## LEN ## _t res; int i; int lanesize = sizeof(int ## TYPE ## _t) << 3; \ + for (i = 0; i<LEN; i++) { \ + if( b.m64_i ## TYPE[i] >= 0) { \ + if(b.m64_i ## TYPE[i] >= lanesize) res.m64_ ## SIGN ## TYPE[i] = 0; \ + else res.m64_ ## SIGN ## TYPE[i] = (a.m64_ ## SIGN ## TYPE[i] << b.m64_i ## TYPE[i]); \ + }else{ \ + res.m64_ ## SIGN ## TYPE[i] = (b.m64_i ## TYPE[i] < -lanesize) ? res.m64_ ## SIGN ## TYPE[i] = 0 : \ + (b.m64_i ## TYPE[i] == -lanesize) ?
(a.m64_ ## SIGN ## TYPE[i] & ((int ## TYPE ## _t) 1 << (-(b.m64_i ## TYPE[i]) - 1))) >> (-(b.m64_i ## TYPE[i]) - 1) : \ + (a.m64_ ## SIGN ## TYPE[i] >> (-(b.m64_i ## TYPE[i]))) + ( (a.m64_ ## SIGN ## TYPE[i] & ((int ## TYPE ## _t) 1 << (-(b.m64_i ## TYPE[i]) - 1))) >> (-(b.m64_i ## TYPE[i]) - 1) ); }} \ + return res; + + +int8x8_t vrshl_s8(int8x8_t a, int8x8_t b); // VRSHL.S8 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int8x8_t vrshl_s8(int8x8_t a, int8x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT_64(8,i,8) +} + +int16x4_t vrshl_s16(int16x4_t a, int16x4_t b); // VRSHL.S16 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int16x4_t vrshl_s16(int16x4_t a, int16x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT_64(16,i,4) +} + +int32x2_t vrshl_s32(int32x2_t a, int32x2_t b); // VRSHL.S32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x2_t vrshl_s32(int32x2_t a, int32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT_64(32,i,2) +} + +int64x1_t vrshl_s64(int64x1_t a, int64x1_t b); // VRSHL.S64 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x1_t vrshl_s64(int64x1_t a, int64x1_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT_64(64,i,1) +} + +uint8x8_t vrshl_u8(uint8x8_t a, int8x8_t b); // VRSHL.U8 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint8x8_t vrshl_u8(uint8x8_t a, int8x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT_64(8,u,8) +} + +uint16x4_t vrshl_u16(uint16x4_t a, int16x4_t b); // VRSHL.s16 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint16x4_t vrshl_u16(uint16x4_t a, int16x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT_64(16,u,4) +} + +uint32x2_t vrshl_u32(uint32x2_t a, int32x2_t b); // VRSHL.U32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vrshl_u32(uint32x2_t a, int32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT_64(32,u,2) +} + +uint64x1_t vrshl_u64(uint64x1_t a, int64x1_t b); // VRSHL.U64 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x1_t vrshl_u64(uint64x1_t a, int64x1_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT_64(64,u,1) +} + +int8x16_t vrshlq_s8(int8x16_t a, int8x16_t b); // VRSHL.S8 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int8x16_t vrshlq_s8(int8x16_t a, int8x16_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT(int8_t, int8_t, 16, 16) +} + +int16x8_t vrshlq_s16(int16x8_t a, int16x8_t b); // VRSHL.S16 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int16x8_t vrshlq_s16(int16x8_t a, int16x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT(int16_t, int16_t, 8, 8) +} + +int32x4_t vrshlq_s32(int32x4_t a, int32x4_t b); // VRSHL.S32 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x4_t vrshlq_s32(int32x4_t a, int32x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT(int32_t, int32_t, 4, 4) +} + +int64x2_t vrshlq_s64(int64x2_t a, int64x2_t b); // VRSHL.S64 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vrshlq_s64(int64x2_t a, int64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT(int64_t, int64_t, 2, 2) +} + +uint8x16_t vrshlq_u8(uint8x16_t a, int8x16_t b); // VRSHL.U8 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint8x16_t vrshlq_u8(uint8x16_t a, int8x16_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT(uint8_t, int8_t, 16, 16) +} + +uint16x8_t vrshlq_u16(uint16x8_t a, int16x8_t b); // VRSHL.s16 
q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint16x8_t vrshlq_u16(uint16x8_t a, int16x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT(uint16_t, int16_t, 8, 8) +} + +uint32x4_t vrshlq_u32(uint32x4_t a, int32x4_t b); // VRSHL.U32 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x4_t vrshlq_u32(uint32x4_t a, int32x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT(uint32_t, int32_t, 4, 4) +} + +uint64x2_t vrshlq_u64(uint64x2_t a, int64x2_t b); // VRSHL.U64 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x2_t vrshlq_u64(uint64x2_t a, int64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_ROUNDING_SHIFT(uint64_t, int64_t, 2, 2) +} + + +//********** Vector saturating rounding shift left: (negative values shift right) **************** +//************************************************************************************************* +//No such operations in IA32 SIMD unfortunately, constant shift only available, so need to do the serial solution +//Saturation happens for left shifts only while rounding makes sense for right shifts only. +#define SERIAL_SATURATING_ROUNDING_SHIFT_SIGNED(TYPE, LENMAX, LEN) \ + _NEON2SSE_ALIGN_16 TYPE atmp[LENMAX], res[LENMAX], btmp[LENMAX]; TYPE limit; int i; \ + int lanesize_1 = (sizeof(TYPE) << 3) - 1; \ + _mm_store_si128((__m128i*)atmp, a); _mm_store_si128((__m128i*)btmp, b); \ + for (i = 0; i<LEN; i++) { \ + if (atmp[i] == 0) res[i] = 0; \ + else{ \ + if(btmp[i] < 0) res[i] = (atmp[i] >> (-btmp[i])) + ( (atmp[i] & ((TYPE)1 << (-btmp[i] - 1))) >> (-btmp[i] - 1) ); \ + else{ \ + if (btmp[i]>lanesize_1) { \ + res[i] = ((_UNSIGNED_T(TYPE))atmp[i] >> lanesize_1 ) + ((TYPE)1 << lanesize_1) - 1; \ + }else{ \ + limit = (TYPE)1 << (lanesize_1 - btmp[i]); \ + if((atmp[i] >= limit)||(atmp[i] <= -limit)) \ + res[i] = ((_UNSIGNED_T(TYPE))atmp[i] >> lanesize_1 ) + ((TYPE)1 << lanesize_1) - 1; \ + else res[i] = atmp[i] << btmp[i]; }}}} \ + return _mm_load_si128((__m128i*)res); + +#define SERIAL_SATURATING_ROUNDING_SHIFT_UNSIGNED(TYPE, LENMAX, LEN) \ + _NEON2SSE_ALIGN_16 _UNSIGNED_T(TYPE) atmp[LENMAX], res[LENMAX]; _NEON2SSE_ALIGN_16 TYPE btmp[LENMAX]; _UNSIGNED_T(TYPE) limit; int i; \ + int lanesize = (sizeof(TYPE) << 3); \ + _mm_store_si128((__m128i*)atmp, a); _mm_store_si128((__m128i*)btmp, b); \ + for (i = 0; i<LEN; i++) { \ + if (atmp[i] == 0) res[i] = 0; \ + else{ \ + if(btmp[i] < 0) res[i] = (atmp[i] >> (-btmp[i])) + ( (atmp[i] & ((TYPE)1 << (-btmp[i] - 1))) >> (-btmp[i] - 1) ); \ + else{ \ + if (btmp[i]>lanesize) res[i] = ~((TYPE)0); \ + else{ \ + limit = (TYPE) 1 << (lanesize - btmp[i]); \ + res[i] = ( atmp[i] >= limit) ?
res[i] = ~((TYPE)0) : atmp[i] << btmp[i]; }}}} \ + return _mm_load_si128((__m128i*)res); + +#define SERIAL_SATURATING_ROUNDING_SHIFT_SIGNED_64(TYPE, LEN) \ + __m64_128 res; int ## TYPE ## _t limit; int i; \ + int lanesize_1 = (sizeof(int ## TYPE ## _t ) << 3) - 1; \ + for (i = 0; i<LEN; i++) { \ + if (a.m64_i ## TYPE[i] == 0) res.m64_i ## TYPE[i] = 0; \ + else{ \ + if(b.m64_i ## TYPE[i] < 0) res.m64_i ## TYPE[i] = (a.m64_i ## TYPE[i] >> (-(b.m64_i ## TYPE[i]))) + ( (a.m64_i ## TYPE[i] & ((int ## TYPE ## _t ) 1 << (-(b.m64_i ## TYPE[i]) - 1))) >> (-(b.m64_i ## TYPE[i]) - 1) ); \ + else{ \ + if (b.m64_i ## TYPE[i]>lanesize_1) { \ + res.m64_i ## TYPE[i] = ((_UNSIGNED_T(int ## TYPE ## _t ))a.m64_i ## TYPE[i] >> lanesize_1 ) + ((int ## TYPE ## _t ) 1 << lanesize_1) - 1; \ + }else{ \ + limit = (int ## TYPE ## _t ) 1 << (lanesize_1 - b.m64_i ## TYPE[i]); \ + if((a.m64_i ## TYPE[i] >= limit)||(a.m64_i ## TYPE[i] <= -limit)) \ + res.m64_i ## TYPE[i] = ((_UNSIGNED_T(int ## TYPE ## _t ))a.m64_i ## TYPE[i] >> lanesize_1 ) + ((int ## TYPE ## _t ) 1 << lanesize_1) - 1; \ + else res.m64_i ## TYPE[i] = a.m64_i ## TYPE[i] << b.m64_i ## TYPE[i]; }}}} \ + return res; + +#define SERIAL_SATURATING_ROUNDING_SHIFT_UNSIGNED_64(TYPE, LEN) \ + __m64_128 res; _UNSIGNED_T(int ## TYPE ## _t) limit; int i; \ + int lanesize = (sizeof(int ## TYPE ## _t) << 3); \ + for (i = 0; i<LEN; i++) { \ + if (a.m64_u ## TYPE[i] == 0) res.m64_u ## TYPE[i] = 0; \ + else{ \ + if(b.m64_i ## TYPE[i] < 0) res.m64_u ## TYPE[i] = (a.m64_u ## TYPE[i] >> (-(b.m64_i ## TYPE[i]))) + ( (a.m64_u ## TYPE[i] & ((int ## TYPE ## _t) 1 << (-(b.m64_i ## TYPE[i]) - 1))) >> (-(b.m64_i ## TYPE[i]) - 1) ); \ + else{ \ + if (b.m64_i ## TYPE[i]>lanesize) res.m64_u ## TYPE[i] = ~((int ## TYPE ## _t) 0); \ + else{ \ + limit = (int ## TYPE ## _t) 1 << (lanesize - b.m64_i ## TYPE[i]); \ + res.m64_u ## TYPE[i] = ( a.m64_u ## TYPE[i] >= limit) ? res.m64_u ## TYPE[i] = ~((int ## TYPE ## _t) 0) : a.m64_u ## TYPE[i] << b.m64_i ## TYPE[i]; }}}} \ + return res; + +int8x8_t vqrshl_s8(int8x8_t a, int8x8_t b); // VQRSHL.S8 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int8x8_t vqrshl_s8(int8x8_t a, int8x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_SIGNED_64(8,8) +} + +int16x4_t vqrshl_s16(int16x4_t a, int16x4_t b); // VQRSHL.S16 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int16x4_t vqrshl_s16(int16x4_t a, int16x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_SIGNED_64(16,4) +} + +int32x2_t vqrshl_s32(int32x2_t a, int32x2_t b); // VQRSHL.S32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x2_t vqrshl_s32(int32x2_t a, int32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_SIGNED_64(32,2) +} + +int64x1_t vqrshl_s64(int64x1_t a, int64x1_t b); // VQRSHL.S64 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x1_t vqrshl_s64(int64x1_t a, int64x1_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_SIGNED_64(64,1) +} + +uint8x8_t vqrshl_u8(uint8x8_t a, int8x8_t b); // VQRSHL.U8 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint8x8_t vqrshl_u8(uint8x8_t a, int8x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_UNSIGNED_64(8,8) +} + +uint16x4_t vqrshl_u16(uint16x4_t a, int16x4_t b); // VQRSHL.s16 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint16x4_t vqrshl_u16(uint16x4_t a, int16x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_UNSIGNED_64(16,4) +} + +uint32x2_t vqrshl_u32(uint32x2_t a, int32x2_t b); // VQRSHL.U32 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vqrshl_u32(uint32x2_t a, int32x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_UNSIGNED_64(32,2) +} + +uint64x1_t vqrshl_u64(uint64x1_t a, int64x1_t b);
// VQRSHL.U64 d0,d0,d0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x1_t vqrshl_u64(uint64x1_t a, int64x1_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_UNSIGNED_64(64,1) +} + +int8x16_t vqrshlq_s8(int8x16_t a, int8x16_t b); // VQRSHL.S8 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int8x16_t vqrshlq_s8(int8x16_t a, int8x16_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_SIGNED(int8_t, 16, 16) +} + +int16x8_t vqrshlq_s16(int16x8_t a, int16x8_t b); // VQRSHL.S16 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int16x8_t vqrshlq_s16(int16x8_t a, int16x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_SIGNED(int16_t, 8, 8) +} + +int32x4_t vqrshlq_s32(int32x4_t a, int32x4_t b); // VQRSHL.S32 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x4_t vqrshlq_s32(int32x4_t a, int32x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_SIGNED(int32_t, 4, 4) +} + +int64x2_t vqrshlq_s64(int64x2_t a, int64x2_t b); // VQRSHL.S64 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqrshlq_s64(int64x2_t a, int64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_SIGNED(int64_t, 2, 2) +} + +uint8x16_t vqrshlq_u8(uint8x16_t a, int8x16_t b); // VQRSHL.U8 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint8x16_t vqrshlq_u8(uint8x16_t a, int8x16_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_UNSIGNED(int8_t, 16, 16) +} + +uint16x8_t vqrshlq_u16(uint16x8_t a, int16x8_t b); // VQRSHL.s16 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint16x8_t vqrshlq_u16(uint16x8_t a, int16x8_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_UNSIGNED(int16_t, 8, 8) +} + +uint32x4_t vqrshlq_u32(uint32x4_t a, int32x4_t b); // VQRSHL.U32 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x4_t vqrshlq_u32(uint32x4_t a, int32x4_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_UNSIGNED(int32_t, 4, 4) +} + +uint64x2_t vqrshlq_u64(uint64x2_t a, int64x2_t b); // VQRSHL.U64 q0,q0,q0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x2_t vqrshlq_u64(uint64x2_t a, int64x2_t b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + SERIAL_SATURATING_ROUNDING_SHIFT_UNSIGNED(int64_t, 2, 2) +} + +// ********************************************************************************* +// ***************************** Shifts by a constant ***************************** +// ********************************************************************************* +//**************** Vector shift right by constant************************************* +//************************************************************************************ +int8x8_t vshr_n_s8(int8x8_t a, __constrange(1,8) int b); // VSHR.S8 d0,d0,#8 +_NEON2SSE_INLINE int8x8_t vshr_n_s8(int8x8_t a, __constrange(1,8) int b) // VSHR.S8 d0,d0,#8 +{ + //no 8 bit shift available, go to 16 bit + int8x8_t res64; + __m128i r; + r = _MM_CVTEPI8_EPI16 (_pM128i(a)); //SSE 4.1 + r = _mm_srai_epi16 (r, b); //SSE2 + r = _mm_packs_epi16 (r,r); //we need 64 bits only + return64(r); +} + +int16x4_t vshr_n_s16(int16x4_t a, __constrange(1,16) int b); // VSHR.S16 d0,d0,#16 +_NEON2SSE_INLINE int16x4_t vshr_n_s16(int16x4_t a, __constrange(1,16) int b) +{ + int16x4_t res64; + return64(_mm_srai_epi16(_pM128i(a), b)); +} + + +int32x2_t vshr_n_s32(int32x2_t a, __constrange(1,32) int b); // VSHR.S32 d0,d0,#32 +_NEON2SSE_INLINE int32x2_t 
vshr_n_s32(int32x2_t a, __constrange(1,32) int b) +{ + int32x2_t res64; + return64(_mm_srai_epi32(_pM128i(a), b)); +} + +int64x1_t vshr_n_s64(int64x1_t a, __constrange(1,64) int b); // VSHR.S64 d0,d0,#64 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x1_t vshr_n_s64(int64x1_t a, __constrange(1,64) int b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //no arithmetic shift for 64bit values, serial solution used + int64x1_t res; + if(b>=64) res.m64_i64[0] = 0; + else res.m64_i64[0] = (*(int64_t*)&a) >> b; + return res; +} + +uint8x8_t vshr_n_u8(uint8x8_t a, __constrange(1,8) int b); // VSHR.U8 d0,d0,#8 +_NEON2SSE_INLINE uint8x8_t vshr_n_u8(uint8x8_t a, __constrange(1,8) int b) // VSHR.U8 d0,d0,#8 +{ + //no 8 bit shift available, go to 16 bit + uint8x8_t res64; + __m128i r; + r = _MM_CVTEPU8_EPI16 (_pM128i(a)); //SSE 4.1 + r = _mm_srli_epi16 (r, b); //for unsigned variables we use the logical shift not arithmetical one + r = _mm_packus_epi16 (r,r); //we need 64 bits only + return64(r); +} + +uint16x4_t vshr_n_u16(uint16x4_t a, __constrange(1,16) int b); // VSHR.s16 d0,d0,#16 +_NEON2SSE_INLINE uint16x4_t vshr_n_u16(uint16x4_t a, __constrange(1,16) int b) +{ + uint16x4_t res64; + return64(_mm_srli_epi16(_pM128i(a), b)); +} + + +uint32x2_t vshr_n_u32(uint32x2_t a, __constrange(1,32) int b); // VSHR.U32 d0,d0,#32 +_NEON2SSE_INLINE uint32x2_t vshr_n_u32(uint32x2_t a, __constrange(1,32) int b) +{ + uint32x2_t res64; + return64(_mm_srli_epi32(_pM128i(a), b)); +} + + +uint64x1_t vshr_n_u64(uint64x1_t a, __constrange(1,64) int b); // VSHR.U64 d0,d0,#64 +_NEON2SSE_INLINE uint64x1_t vshr_n_u64(uint64x1_t a, __constrange(1,64) int b) +{ + uint64x1_t res64; + return64(_mm_srli_epi64(_pM128i(a), b)); +} + + +int8x16_t vshrq_n_s8(int8x16_t a, __constrange(1,8) int b); // VSHR.S8 q0,q0,#8 +_NEON2SSE_INLINE int8x16_t vshrq_n_s8(int8x16_t a, __constrange(1,8) int b) // VSHR.S8 q0,q0,#8 +{ + //no 8 bit shift available, go to 16 bit trick + __m128i zero, mask0, a_sign, r, a_sign_mask; + _NEON2SSE_ALIGN_16 int16_t mask0_16[9] = {0x0000, 0x0080, 0x00c0, 0x00e0, 0x00f0, 0x00f8, 0x00fc, 0x00fe, 0x00ff}; + zero = _mm_setzero_si128(); + mask0 = _mm_set1_epi16(mask0_16[b]); //to mask the bits to be "spoiled" by 16 bit shift + a_sign = _mm_cmpgt_epi8 (zero, a); //ff if a<0 or zero if a>0 + r = _mm_srai_epi16 (a, b); + a_sign_mask = _mm_and_si128 (mask0, a_sign); + r = _mm_andnot_si128 (mask0, r); + return _mm_or_si128 (r, a_sign_mask); +} + +int16x8_t vshrq_n_s16(int16x8_t a, __constrange(1,16) int b); // VSHR.S16 q0,q0,#16 +#define vshrq_n_s16 _mm_srai_epi16 + +int32x4_t vshrq_n_s32(int32x4_t a, __constrange(1,32) int b); // VSHR.S32 q0,q0,#32 +#define vshrq_n_s32 _mm_srai_epi32 + +int64x2_t vshrq_n_s64(int64x2_t a, __constrange(1,64) int b); // VSHR.S64 q0,q0,#64 +_NEON2SSE_INLINE int64x2_t vshrq_n_s64(int64x2_t a, __constrange(1,64) int b) +{ + //SIMD implementation may be not optimal due to 64 bit arithmetic shift absense in x86 SIMD + __m128i c1, signmask,a0, res64; + _NEON2SSE_ALIGN_16 uint64_t mask[] = {0x8000000000000000, 0x8000000000000000}; + c1 = _mm_cmpeq_epi32(a,a); //0xffffffffffffffff + signmask = _mm_slli_epi64 (c1, (64 - b)); + a0 = _mm_or_si128(a, *(__m128i*)mask); //get the first bit + a0 = _MM_CMPEQ_EPI64 (a, a0); + signmask = _mm_and_si128(a0, signmask); + res64 = _mm_srli_epi64 (a, b); + return _mm_or_si128(res64, signmask); +} + +uint8x16_t vshrq_n_u8(uint8x16_t a, __constrange(1,8) int b); // VSHR.U8 q0,q0,#8 +_NEON2SSE_INLINE uint8x16_t vshrq_n_u8(uint8x16_t a, __constrange(1,8) int b) // VSHR.U8 
q0,q0,#8 +{ + //no 8 bit shift available, need the special trick + __m128i mask0, r; + _NEON2SSE_ALIGN_16 uint16_t mask10_16[9] = {0xffff, 0xff7f, 0xff3f, 0xff1f, 0xff0f, 0xff07, 0xff03, 0xff01, 0xff00}; + mask0 = _mm_set1_epi16(mask10_16[b]); //to mask the bits to be "spoiled" by 16 bit shift + r = _mm_srli_epi16 ( a, b); + return _mm_and_si128 (r, mask0); +} + +uint16x8_t vshrq_n_u16(uint16x8_t a, __constrange(1,16) int b); // VSHR.s16 q0,q0,#16 +#define vshrq_n_u16 _mm_srli_epi16 + +uint32x4_t vshrq_n_u32(uint32x4_t a, __constrange(1,32) int b); // VSHR.U32 q0,q0,#32 +#define vshrq_n_u32 _mm_srli_epi32 + +uint64x2_t vshrq_n_u64(uint64x2_t a, __constrange(1,64) int b); // VSHR.U64 q0,q0,#64 +#define vshrq_n_u64 _mm_srli_epi64 + +//*************************** Vector shift left by constant ************************* +//********************************************************************************* +int8x8_t vshl_n_s8(int8x8_t a, __constrange(0,7) int b); // VSHL.I8 d0,d0,#0 +_NEON2SSE_INLINE int8x8_t vshl_n_s8(int8x8_t a, __constrange(0,7) int b) // VSHL.I8 d0,d0,#0 +{ + //no 8 bit shift available, go to 16 bit + int8x8_t res64; + __m128i r; + _NEON2SSE_ALIGN_16 int8_t mask8_16_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 }; + r = _MM_CVTEPI8_EPI16 (_pM128i(a)); //SSE 4.1 + r = _mm_slli_epi16 (r, b); //SSE2 + r = _mm_shuffle_epi8 (r, *(__m128i*) mask8_16_even_odd); //return to 8 bit, we need 64 bits only + return64(r); +} + +int16x4_t vshl_n_s16(int16x4_t a, __constrange(0,15) int b); // VSHL.I16 d0,d0,#0 +_NEON2SSE_INLINE int16x4_t vshl_n_s16(int16x4_t a, __constrange(0,15) int b) +{ + int16x4_t res64; + return64(_mm_slli_epi16(_pM128i(a), b)); +} + + +int32x2_t vshl_n_s32(int32x2_t a, __constrange(0,31) int b); // VSHL.I32 d0,d0,#0 +_NEON2SSE_INLINE int32x2_t vshl_n_s32(int32x2_t a, __constrange(0,31) int b) +{ + int32x2_t res64; + return64(_mm_slli_epi32(_pM128i(a), b)); +} + + +int64x1_t vshl_n_s64(int64x1_t a, __constrange(0,63) int b); // VSHL.I64 d0,d0,#0 +_NEON2SSE_INLINE int64x1_t vshl_n_s64(int64x1_t a, __constrange(0,63) int b) +{ + int64x1_t res64; + return64(_mm_slli_epi64(_pM128i(a), b)); +} + + +uint8x8_t vshl_n_u8(uint8x8_t a, __constrange(0,7) int b); // VSHL.I8 d0,d0,#0 +_NEON2SSE_INLINE uint8x8_t vshl_n_u8(uint8x8_t a, __constrange(0,7) int b) +{ + //no 8 bit shift available, go to 16 bit + uint8x8_t res64; + __m128i mask8; + __m128i r; + mask8 = _mm_set1_epi16(0xff); + r = _MM_CVTEPU8_EPI16 (_pM128i(a)); //SSE 4.1 + r = _mm_slli_epi16 (r, b); //SSE2 + r = _mm_and_si128(r, mask8); //to avoid saturation + r = _mm_packus_epi16 (r,r); //we need 64 bits only + return64(r); +} + +uint16x4_t vshl_n_u16(uint16x4_t a, __constrange(0,15) int b); // VSHL.I16 d0,d0,#0 +#define vshl_n_u16 vshl_n_s16 + + +uint32x2_t vshl_n_u32(uint32x2_t a, __constrange(0,31) int b); // VSHL.I32 d0,d0,#0 +#define vshl_n_u32 vshl_n_s32 + +uint64x1_t vshl_n_u64(uint64x1_t a, __constrange(0,63) int b); // VSHL.I64 d0,d0,#0 +#define vshl_n_u64 vshl_n_s64 + +int8x16_t vshlq_n_s8(int8x16_t a, __constrange(0,7) int b); // VSHL.I8 q0,q0,#0 +#define vshlq_n_s8 vshlq_n_u8 + +int16x8_t vshlq_n_s16(int16x8_t a, __constrange(0,15) int b); // VSHL.I16 q0,q0,#0 +#define vshlq_n_s16 _mm_slli_epi16 + +int32x4_t vshlq_n_s32(int32x4_t a, __constrange(0,31) int b); // VSHL.I32 q0,q0,#0 +#define vshlq_n_s32 _mm_slli_epi32 + +int64x2_t vshlq_n_s64(int64x2_t a, __constrange(0,63) int b); // VSHL.I64 q0,q0,#0 +#define vshlq_n_s64 _mm_slli_epi64 + +uint8x16_t vshlq_n_u8(uint8x16_t a, 
__constrange(0,7) int b); // VSHL.I8 q0,q0,#0 +_NEON2SSE_INLINE uint8x16_t vshlq_n_u8(uint8x16_t a, __constrange(0,7) int b) +{ + //no 8 bit shift available, need the special trick + __m128i mask0, r; + _NEON2SSE_ALIGN_16 uint16_t mask10_16[9] = {0xffff, 0xfeff, 0xfcff, 0xf8ff, 0xf0ff, 0xe0ff, 0xc0ff, 0x80ff, 0xff}; + mask0 = _mm_set1_epi16(mask10_16[b]); //to mask the bits to be "spoiled" by 16 bit shift + r = _mm_slli_epi16 ( a, b); + return _mm_and_si128 (r, mask0); +} + +uint16x8_t vshlq_n_u16(uint16x8_t a, __constrange(0,15) int b); // VSHL.I16 q0,q0,#0 +#define vshlq_n_u16 vshlq_n_s16 + +uint32x4_t vshlq_n_u32(uint32x4_t a, __constrange(0,31) int b); // VSHL.I32 q0,q0,#0 +#define vshlq_n_u32 vshlq_n_s32 + +uint64x2_t vshlq_n_u64(uint64x2_t a, __constrange(0,63) int b); // VSHL.I64 q0,q0,#0 +#define vshlq_n_u64 vshlq_n_s64 + +//************* Vector rounding shift right by constant ****************** +//************************************************************************* +//No corresponding x86 intrinsics exist, need to do some tricks +int8x8_t vrshr_n_s8(int8x8_t a, __constrange(1,8) int b); // VRSHR.S8 d0,d0,#8 +_NEON2SSE_INLINE int8x8_t vrshr_n_s8(int8x8_t a, __constrange(1,8) int b) // VRSHR.S8 d0,d0,#8 +{ + //no 8 bit shift available, go to 16 bit + int8x8_t res64; + __m128i r, maskb; + r = _MM_CVTEPI8_EPI16 (_pM128i(a)); //SSE 4.1 + maskb = _mm_slli_epi16 (r, (16 - b)); //to get rounding (b-1)th bit + maskb = _mm_srli_epi16 (maskb, 15); //1 or 0 + r = _mm_srai_epi16 (r, b); + r = _mm_add_epi16 (r, maskb); //actual rounding + r = _mm_packs_epi16 (r,r); ////we need 64 bits only + return64(r); +} + +int16x4_t vrshr_n_s16(int16x4_t a, __constrange(1,16) int b); // VRSHR.S16 d0,d0,#16 +_NEON2SSE_INLINE int16x4_t vrshr_n_s16(int16x4_t a, __constrange(1,16) int b) +{ + int16x4_t res64; + return64(vrshrq_n_s16(_pM128i(a), b)); +} + + +int32x2_t vrshr_n_s32(int32x2_t a, __constrange(1,32) int b); // VRSHR.S32 d0,d0,#32 +_NEON2SSE_INLINE int32x2_t vrshr_n_s32(int32x2_t a, __constrange(1,32) int b) +{ + int32x2_t res64; + return64(vrshrq_n_s32(_pM128i(a), b)); +} + + +int64x1_t vrshr_n_s64(int64x1_t a, __constrange(1,64) int b); // VRSHR.S64 d0,d0,#64 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x1_t vrshr_n_s64(int64x1_t a, __constrange(1,64) int b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + //serial solution is faster + int64x1_t res; + int64_t a_i64 = *( int64_t*)&a; + if(b==64) { + res.m64_i64[0] = 0; //for some compilers rounding happens and we need to use(a_i64 & _SIGNBIT64)>>63; + } else { + int64_t maskb = a_i64 & (( int64_t)1 << (b - 1)); + res.m64_i64[0] = (a_i64 >> b) + (maskb >> (b - 1)); + } + return res; +} + +uint8x8_t vrshr_n_u8(uint8x8_t a, __constrange(1,8) int b); // VRSHR.U8 d0,d0,#8 +_NEON2SSE_INLINE uint8x8_t vrshr_n_u8(uint8x8_t a, __constrange(1,8) int b) // VRSHR.U8 d0,d0,#8 +{ + //no 8 bit shift available, go to 16 bit, solution may be not optimal compared with the serial one + uint8x8_t res64; + __m128i r, maskb; + r = _MM_CVTEPU8_EPI16 (_pM128i(a)); //SSE 4.1 + maskb = _mm_slli_epi16 (r, (16 - b)); //to get rounding (b-1)th bit + maskb = _mm_srli_epi16 (maskb, 15); //1 or 0 + r = _mm_srli_epi16 (r, b); + r = _mm_add_epi16 (r, maskb); //actual rounding + r = _mm_packus_epi16 (r,r); ////we need 64 bits only + return64(r); +} + +uint16x4_t vrshr_n_u16(uint16x4_t a, __constrange(1,16) int b); // VRSHR.s16 d0,d0,#16 +_NEON2SSE_INLINE uint16x4_t vrshr_n_u16(uint16x4_t a, __constrange(1,16) int b) +{ + uint16x4_t res64; + return64(vrshrq_n_u16(_pM128i(a), b)); +} 
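//Editorial aside, not part of the original header diff: every vrshr_n_*/vrshrq_n_* emulation above
//implements the same per-lane rule - shift right by 'b' and add back the last bit shifted out, so the
//result is rounded to nearest rather than truncated. A minimal standalone scalar sketch of that rule
//for one unsigned 16-bit lane follows; the helper name is hypothetical and it assumes 1 <= b <= 16.
#include <stdint.h>                                   /* only for uint16_t/uint32_t in this sketch */
static uint16_t rounding_shift_right_u16(uint16_t x, int b)
{
    uint32_t wide = (uint32_t)x;                      /* widen so adding the rounding bit cannot overflow */
    uint32_t round_bit = (wide >> (b - 1)) & 1;       /* the (b-1)-th bit, i.e. the last bit shifted out */
    return (uint16_t)((wide >> b) + round_bit);       /* truncating shift plus rounding correction */
}
//For example, rounding_shift_right_u16(7, 1) yields 4 (7/2 rounded to nearest), which is what
//vrshr_n_u16 computes per lane, while the plain vshr_n_u16 would give the truncated value 3.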
+ + +uint32x2_t vrshr_n_u32(uint32x2_t a, __constrange(1,32) int b); // VRSHR.U32 d0,d0,#32 +_NEON2SSE_INLINE uint32x2_t vrshr_n_u32(uint32x2_t a, __constrange(1,32) int b) +{ + uint32x2_t res64; + return64(vrshrq_n_u32(_pM128i(a), b)); +} + + +uint64x1_t vrshr_n_u64(uint64x1_t a, __constrange(1,64) int b); // VRSHR.U64 d0,d0,#64 +_NEON2SSE_INLINE uint64x1_t vrshr_n_u64(uint64x1_t a, __constrange(1,64) int b) +{ + uint64x1_t res64; + return64(vrshrq_n_u64(_pM128i(a), b)); +} + +int8x16_t vrshrq_n_s8(int8x16_t a, __constrange(1,8) int b); // VRSHR.S8 q0,q0,#8 +_NEON2SSE_INLINE int8x16_t vrshrq_n_s8(int8x16_t a, __constrange(1,8) int b) // VRSHR.S8 q0,q0,#8 +{ + //no 8 bit shift available, go to 16 bit trick + __m128i r, mask1, maskb; + _NEON2SSE_ALIGN_16 uint16_t mask2b[9] = {0x0000, 0x0101, 0x0202, 0x0404, 0x0808, 0x1010, 0x2020, 0x4040, 0x8080}; // 2^b-th bit set to 1 + r = vshrq_n_s8 (a, b); + mask1 = _mm_set1_epi16(mask2b[b]); // 2^b-th bit set to 1 for 16bit, need it for rounding + maskb = _mm_and_si128(a, mask1); //get b or 0 for rounding + maskb = _mm_srli_epi16 (maskb, b - 1); // to add 1 + return _mm_add_epi8(r, maskb); //actual rounding +} + +int16x8_t vrshrq_n_s16(int16x8_t a, __constrange(1,16) int b); // VRSHR.S16 q0,q0,#16 +_NEON2SSE_INLINE int16x8_t vrshrq_n_s16(int16x8_t a, __constrange(1,16) int b) // VRSHR.S16 q0,q0,#16 +{ + __m128i maskb, r; + maskb = _mm_slli_epi16(a, (16 - b)); //to get rounding (b-1)th bit + maskb = _mm_srli_epi16(maskb, 15); //1 or 0 + r = _mm_srai_epi16 (a, b); + return _mm_add_epi16 (r, maskb); //actual rounding +} + +int32x4_t vrshrq_n_s32(int32x4_t a, __constrange(1,32) int b); // VRSHR.S32 q0,q0,#32 +_NEON2SSE_INLINE int32x4_t vrshrq_n_s32(int32x4_t a, __constrange(1,32) int b) // VRSHR.S32 q0,q0,#32 +{ + __m128i maskb, r; + maskb = _mm_slli_epi32 (a, (32 - b)); //to get rounding (b-1)th bit + maskb = _mm_srli_epi32 (maskb,31); //1 or 0 + r = _mm_srai_epi32(a, b); + return _mm_add_epi32 (r, maskb); //actual rounding +} + +int64x2_t vrshrq_n_s64(int64x2_t a, __constrange(1,64) int b); // VRSHR.S64 q0,q0,#64 +_NEON2SSE_INLINE int64x2_t vrshrq_n_s64(int64x2_t a, __constrange(1,64) int b) +{ + //solution may be not optimal compared with a serial one + __m128i maskb; + int64x2_t r; + maskb = _mm_slli_epi64 (a, (64 - b)); //to get rounding (b-1)th bit + maskb = _mm_srli_epi64 (maskb,63); //1 or 0 + r = vshrq_n_s64(a, b); + return _mm_add_epi64 (r, maskb); //actual rounding +} + +uint8x16_t vrshrq_n_u8(uint8x16_t a, __constrange(1,8) int b); // VRSHR.U8 q0,q0,#8 +_NEON2SSE_INLINE uint8x16_t vrshrq_n_u8(uint8x16_t a, __constrange(1,8) int b) // VRSHR.U8 q0,q0,#8 +{ + //no 8 bit shift available, go to 16 bit trick + __m128i r, mask1, maskb; + _NEON2SSE_ALIGN_16 uint16_t mask2b[9] = {0x0000, 0x0101, 0x0202, 0x0404, 0x0808, 0x1010, 0x2020, 0x4040, 0x8080}; // 2^b-th bit set to 1 + r = vshrq_n_u8 (a, b); + mask1 = _mm_set1_epi16(mask2b[b]); // 2^b-th bit set to 1 for 16bit, need it for rounding + maskb = _mm_and_si128(a, mask1); //get b or 0 for rounding + maskb = _mm_srli_epi16 (maskb, b - 1); // to add 1 + return _mm_add_epi8(r, maskb); //actual rounding +} + +uint16x8_t vrshrq_n_u16(uint16x8_t a, __constrange(1,16) int b); // VRSHR.s16 q0,q0,#16 +_NEON2SSE_INLINE uint16x8_t vrshrq_n_u16(uint16x8_t a, __constrange(1,16) int b) // VRSHR.S16 q0,q0,#16 +{ + __m128i maskb, r; + maskb = _mm_slli_epi16(a, (16 - b)); //to get rounding (b-1)th bit + maskb = _mm_srli_epi16(maskb, 15); //1 or 0 + r = _mm_srli_epi16 (a, b); + return _mm_add_epi16 (r, maskb); //actual 
rounding +} + +uint32x4_t vrshrq_n_u32(uint32x4_t a, __constrange(1,32) int b); // VRSHR.U32 q0,q0,#32 +_NEON2SSE_INLINE uint32x4_t vrshrq_n_u32(uint32x4_t a, __constrange(1,32) int b) // VRSHR.S32 q0,q0,#32 +{ + __m128i maskb, r; + maskb = _mm_slli_epi32 (a, (32 - b)); //to get rounding (b-1)th bit + maskb = _mm_srli_epi32 (maskb,31); //1 or 0 + r = _mm_srli_epi32(a, b); + return _mm_add_epi32 (r, maskb); //actual rounding +} + +uint64x2_t vrshrq_n_u64(uint64x2_t a, __constrange(1,64) int b); // VRSHR.U64 q0,q0,#64 +_NEON2SSE_INLINE uint64x2_t vrshrq_n_u64(uint64x2_t a, __constrange(1,64) int b) +{ + //solution may be not optimal compared with a serial one + __m128i maskb, r; + maskb = _mm_slli_epi64 (a, (64 - b)); //to get rounding (b-1)th bit + maskb = _mm_srli_epi64 (maskb,63); //1 or 0 + r = _mm_srli_epi64(a, b); + return _mm_add_epi64 (r, maskb); //actual rounding +} + +//************* Vector shift right by constant and accumulate ********* +//********************************************************************* +int8x8_t vsra_n_s8(int8x8_t a, int8x8_t b, __constrange(1,8) int c); // VSRA.S8 d0,d0,#8 +_NEON2SSE_INLINE int8x8_t vsra_n_s8(int8x8_t a, int8x8_t b, __constrange(1,8) int c) // VSRA.S8 d0,d0,#8 +{ + int8x8_t shift; + shift = vshr_n_s8(b, c); + return vadd_s8( a, shift); +} + +int16x4_t vsra_n_s16(int16x4_t a, int16x4_t b, __constrange(1,16) int c); // VSRA.S16 d0,d0,#16 +_NEON2SSE_INLINE int16x4_t vsra_n_s16(int16x4_t a, int16x4_t b, __constrange(1,16) int c) // VSRA.S16 d0,d0,#16 +{ + int16x4_t shift; + shift = vshr_n_s16( b, c); + return vadd_s16(a, shift); +} + +int32x2_t vsra_n_s32(int32x2_t a, int32x2_t b, __constrange(1,32) int c); // VSRA.S32 d0,d0,#32 +_NEON2SSE_INLINE int32x2_t vsra_n_s32(int32x2_t a, int32x2_t b, __constrange(1,32) int c) // VSRA.S32 d0,d0,#32 +{ + //may be not optimal compared with the serial execution + int32x2_t shift; + shift = vshr_n_s32(b, c); + return vadd_s32( a, shift); +} + +int64x1_t vsra_n_s64(int64x1_t a, int64x1_t b, __constrange(1,64) int c); // VSRA.S64 d0,d0,#64 +_NEON2SSE_INLINE int64x1_t vsra_n_s64(int64x1_t a, int64x1_t b, __constrange(1,64) int c) +{ + //may be not optimal compared with a serial solution + int64x1_t shift; + shift = vshr_n_s64(b, c); + return vadd_s64( a, shift); +} + +uint8x8_t vsra_n_u8(uint8x8_t a, uint8x8_t b, __constrange(1,8) int c); // VSRA.U8 d0,d0,#8 +_NEON2SSE_INLINE uint8x8_t vsra_n_u8(uint8x8_t a, uint8x8_t b, __constrange(1,8) int c) // VSRA.U8 d0,d0,#8 +{ + uint8x8_t shift; + shift = vshr_n_u8(b, c); + return vadd_u8(a, shift); +} + +uint16x4_t vsra_n_u16(uint16x4_t a, uint16x4_t b, __constrange(1,16) int c); // VSRA.s16 d0,d0,#16 +_NEON2SSE_INLINE uint16x4_t vsra_n_u16(uint16x4_t a, uint16x4_t b, __constrange(1,16) int c) // VSRA.s16 d0,d0,#16 +{ + uint16x4_t shift; + shift = vshr_n_u16(b, c); + return vadd_u16(a,shift); +} + +uint32x2_t vsra_n_u32(uint32x2_t a, uint32x2_t b, __constrange(1,32) int c); // VSRA.U32 d0,d0,#32 +_NEON2SSE_INLINE uint32x2_t vsra_n_u32(uint32x2_t a, uint32x2_t b, __constrange(1,32) int c) // VSRA.U32 d0,d0,#32 +{ + //may be not optimal compared with the serial execution + uint32x2_t shift; + shift = vshr_n_u32(b, c); + return vadd_u32( a, shift); +} + +uint64x1_t vsra_n_u64(uint64x1_t a, uint64x1_t b, __constrange(1,64) int c); // VSRA.U64 d0,d0,#64 +_NEON2SSE_INLINE uint64x1_t vsra_n_u64(uint64x1_t a, uint64x1_t b, __constrange(1,64) int c) // VSRA.U64 d0,d0,#64 +{ + //may be not optimal compared with the serial execution + uint64x1_t shift; + shift = vshr_n_u64(b, 
c); + return vadd_u64(a, shift); +} + +int8x16_t vsraq_n_s8(int8x16_t a, int8x16_t b, __constrange(1,8) int c); // VSRA.S8 q0,q0,#8 +_NEON2SSE_INLINE int8x16_t vsraq_n_s8(int8x16_t a, int8x16_t b, __constrange(1,8) int c) // VSRA.S8 q0,q0,#8 +{ + int8x16_t shift; + shift = vshrq_n_s8(b, c); + return vaddq_s8(a, shift); +} + +int16x8_t vsraq_n_s16(int16x8_t a, int16x8_t b, __constrange(1,16) int c); // VSRA.S16 q0,q0,#16 +_NEON2SSE_INLINE int16x8_t vsraq_n_s16(int16x8_t a, int16x8_t b, __constrange(1,16) int c) // VSRA.S16 q0,q0,#16 +{ + int16x8_t shift; + shift = vshrq_n_s16(b, c); + return vaddq_s16(a, shift); +} + +int32x4_t vsraq_n_s32(int32x4_t a, int32x4_t b, __constrange(1,32) int c); // VSRA.S32 q0,q0,#32 +_NEON2SSE_INLINE int32x4_t vsraq_n_s32(int32x4_t a, int32x4_t b, __constrange(1,32) int c) // VSRA.S32 q0,q0,#32 +{ + int32x4_t shift; + shift = vshrq_n_s32(b, c); + return vaddq_s32(a, shift); +} + +int64x2_t vsraq_n_s64(int64x2_t a, int64x2_t b, __constrange(1,64) int c); // VSRA.S64 q0,q0,#64 +_NEON2SSE_INLINE int64x2_t vsraq_n_s64(int64x2_t a, int64x2_t b, __constrange(1,64) int c) // VSRA.S64 q0,q0,#64 +{ + int64x2_t shift; + shift = vshrq_n_s64(b, c); + return vaddq_s64( a, shift); +} + +uint8x16_t vsraq_n_u8(uint8x16_t a, uint8x16_t b, __constrange(1,8) int c); // VSRA.U8 q0,q0,#8 +_NEON2SSE_INLINE uint8x16_t vsraq_n_u8(uint8x16_t a, uint8x16_t b, __constrange(1,8) int c) // VSRA.U8 q0,q0,#8 +{ + uint8x16_t shift; + shift = vshrq_n_u8(b, c); + return vaddq_u8(a, shift); +} + +uint16x8_t vsraq_n_u16(uint16x8_t a, uint16x8_t b, __constrange(1,16) int c); // VSRA.s16 q0,q0,#16 +_NEON2SSE_INLINE uint16x8_t vsraq_n_u16(uint16x8_t a, uint16x8_t b, __constrange(1,16) int c) // VSRA.s16 q0,q0,#16 +{ + uint16x8_t shift; + shift = vshrq_n_u16(b, c); + return vaddq_u16(a, shift); +} + +uint32x4_t vsraq_n_u32(uint32x4_t a, uint32x4_t b, __constrange(1,32) int c); // VSRA.U32 q0,q0,#32 +_NEON2SSE_INLINE uint32x4_t vsraq_n_u32(uint32x4_t a, uint32x4_t b, __constrange(1,32) int c) // VSRA.U32 q0,q0,#32 +{ + uint32x4_t shift; + shift = vshrq_n_u32(b, c); + return vaddq_u32(a, shift); +} + +uint64x2_t vsraq_n_u64(uint64x2_t a, uint64x2_t b, __constrange(1,64) int c); // VSRA.U64 q0,q0,#64 +_NEON2SSE_INLINE uint64x2_t vsraq_n_u64(uint64x2_t a, uint64x2_t b, __constrange(1,64) int c) // VSRA.U64 q0,q0,#64 +{ + uint64x2_t shift; + shift = vshrq_n_u64(b, c); + return vaddq_u64(a, shift); +} + +//************* Vector rounding shift right by constant and accumulate **************************** +//************************************************************************************************ +int8x8_t vrsra_n_s8(int8x8_t a, int8x8_t b, __constrange(1,8) int c); // VRSRA.S8 d0,d0,#8 +_NEON2SSE_INLINE int8x8_t vrsra_n_s8(int8x8_t a, int8x8_t b, __constrange(1,8) int c) // VRSRA.S8 d0,d0,#8 +{ + int8x8_t shift; + shift = vrshr_n_s8(b, c); + return vadd_s8( a, shift); +} + +int16x4_t vrsra_n_s16(int16x4_t a, int16x4_t b, __constrange(1,16) int c); // VRSRA.S16 d0,d0,#16 +_NEON2SSE_INLINE int16x4_t vrsra_n_s16(int16x4_t a, int16x4_t b, __constrange(1,16) int c) // VRSRA.S16 d0,d0,#16 +{ + int16x4_t shift; + shift = vrshr_n_s16( b, c); + return vadd_s16(a, shift); +} + +int32x2_t vrsra_n_s32(int32x2_t a, int32x2_t b, __constrange(1,32) int c); // VRSRA.S32 d0,d0,#32 +_NEON2SSE_INLINE int32x2_t vrsra_n_s32(int32x2_t a, int32x2_t b, __constrange(1,32) int c) // VRSRA.S32 d0,d0,#32 +{ + //may be not optimal compared with the serial execution + int32x2_t shift; + shift = vrshr_n_s32(b, c); + return 
vadd_s32( a, shift); +} + +int64x1_t vrsra_n_s64(int64x1_t a, int64x1_t b, __constrange(1,64) int c); // VRSRA.S64 d0,d0,#64 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x1_t vrsra_n_s64(int64x1_t a, int64x1_t b, __constrange(1,64) int c), _NEON2SSE_REASON_SLOW_SERIAL) //serial solution +{ + int64x1_t shift; + shift = vrshr_n_s64(b, c); + return vadd_s64( a, shift); +} + +uint8x8_t vrsra_n_u8(uint8x8_t a, uint8x8_t b, __constrange(1,8) int c); // VRSRA.U8 d0,d0,#8 +_NEON2SSE_INLINE uint8x8_t vrsra_n_u8(uint8x8_t a, uint8x8_t b, __constrange(1,8) int c) // VRSRA.U8 d0,d0,#8 +{ + uint8x8_t shift; + shift = vrshr_n_u8(b, c); + return vadd_u8(a, shift); +} + +uint16x4_t vrsra_n_u16(uint16x4_t a, uint16x4_t b, __constrange(1,16) int c); // VRSRA.s16 d0,d0,#16 +_NEON2SSE_INLINE uint16x4_t vrsra_n_u16(uint16x4_t a, uint16x4_t b, __constrange(1,16) int c) // VRSRA.s16 d0,d0,#16 +{ + uint16x4_t shift; + shift = vrshr_n_u16(b, c); + return vadd_u16(a,shift); +} + +uint32x2_t vrsra_n_u32(uint32x2_t a, uint32x2_t b, __constrange(1,32) int c); // VRSRA.U32 d0,d0,#32 +_NEON2SSE_INLINE uint32x2_t vrsra_n_u32(uint32x2_t a, uint32x2_t b, __constrange(1,32) int c) // VRSRA.U32 d0,d0,#32 +{ + //may be not optimal compared with the serial execution + uint32x2_t shift; + shift = vrshr_n_u32(b, c); + return vadd_u32( a, shift); +} + +uint64x1_t vrsra_n_u64(uint64x1_t a, uint64x1_t b, __constrange(1,64) int c); // VRSRA.U64 d0,d0,#64 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x1_t vrsra_n_u64(uint64x1_t a, uint64x1_t b, __constrange(1,64) int c), _NEON2SSE_REASON_SLOW_SERIAL) //serial solution +{ + //may be not optimal compared with the serial execution + uint64x1_t shift; + shift = vrshr_n_u64(b, c); + return vadd_u64( a, shift); +} + +int8x16_t vrsraq_n_s8(int8x16_t a, int8x16_t b, __constrange(1,8) int c); // VRSRA.S8 q0,q0,#8 +_NEON2SSE_INLINE int8x16_t vrsraq_n_s8(int8x16_t a, int8x16_t b, __constrange(1,8) int c) // VRSRA.S8 q0,q0,#8 +{ + int8x16_t shift; + shift = vrshrq_n_s8(b, c); + return vaddq_s8(a, shift); +} + +int16x8_t vrsraq_n_s16(int16x8_t a, int16x8_t b, __constrange(1,16) int c); // VRSRA.S16 q0,q0,#16 +_NEON2SSE_INLINE int16x8_t vrsraq_n_s16(int16x8_t a, int16x8_t b, __constrange(1,16) int c) // VRSRA.S16 q0,q0,#16 +{ + int16x8_t shift; + shift = vrshrq_n_s16(b, c); + return vaddq_s16(a, shift); +} + +int32x4_t vrsraq_n_s32(int32x4_t a, int32x4_t b, __constrange(1,32) int c); // VRSRA.S32 q0,q0,#32 +_NEON2SSE_INLINE int32x4_t vrsraq_n_s32(int32x4_t a, int32x4_t b, __constrange(1,32) int c) // VRSRA.S32 q0,q0,#32 +{ + int32x4_t shift; + shift = vrshrq_n_s32(b, c); + return vaddq_s32(a, shift); +} + +int64x2_t vrsraq_n_s64(int64x2_t a, int64x2_t b, __constrange(1,64) int c); // VRSRA.S64 q0,q0,#64 +_NEON2SSE_INLINE int64x2_t vrsraq_n_s64(int64x2_t a, int64x2_t b, __constrange(1,64) int c) +{ + int64x2_t shift; + shift = vrshrq_n_s64(b, c); + return vaddq_s64(a, shift); +} + +uint8x16_t vrsraq_n_u8(uint8x16_t a, uint8x16_t b, __constrange(1,8) int c); // VRSRA.U8 q0,q0,#8 +_NEON2SSE_INLINE uint8x16_t vrsraq_n_u8(uint8x16_t a, uint8x16_t b, __constrange(1,8) int c) // VRSRA.U8 q0,q0,#8 +{ + uint8x16_t shift; + shift = vrshrq_n_u8(b, c); + return vaddq_u8(a, shift); +} + +uint16x8_t vrsraq_n_u16(uint16x8_t a, uint16x8_t b, __constrange(1,16) int c); // VRSRA.s16 q0,q0,#16 +_NEON2SSE_INLINE uint16x8_t vrsraq_n_u16(uint16x8_t a, uint16x8_t b, __constrange(1,16) int c) // VRSRA.s16 q0,q0,#16 +{ + uint16x8_t shift; + shift = vrshrq_n_u16(b, c); + return vaddq_u16(a, shift); +} + 
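//Editorial aside, not part of the original header diff: the vrsra/vrsraq family above is a plain
//composition of the rounding right shift and a vector add, so a caller pays for both emulated steps.
//A small usage sketch with hypothetical values and function name; it assumes this header is included
//so the NEON types and intrinsics below resolve to the SSE emulations defined in it.
static uint16x8_t average_and_accumulate_example(uint16x8_t acc, uint16x8_t sum_of_four)
{
    //divide a sum of four samples by 4 with round-to-nearest and add it to a running accumulator;
    //equivalent to vaddq_u16(acc, vrshrq_n_u16(sum_of_four, 2)), which is exactly how vrsraq_n_u16
    //is built above.
    return vrsraq_n_u16(acc, sum_of_four, 2);
}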
+uint32x4_t vrsraq_n_u32(uint32x4_t a, uint32x4_t b, __constrange(1,32) int c); // VRSRA.U32 q0,q0,#32 +_NEON2SSE_INLINE uint32x4_t vrsraq_n_u32(uint32x4_t a, uint32x4_t b, __constrange(1,32) int c) // VRSRA.U32 q0,q0,#32 +{ + uint32x4_t shift; + shift = vrshrq_n_u32(b, c); + return vaddq_u32(a, shift); +} + +uint64x2_t vrsraq_n_u64(uint64x2_t a, uint64x2_t b, __constrange(1,64) int c); // VRSRA.U64 q0,q0,#64 +_NEON2SSE_INLINE uint64x2_t vrsraq_n_u64(uint64x2_t a, uint64x2_t b, __constrange(1,64) int c) +{ + uint64x2_t shift; + shift = vrshrq_n_u64(b, c); + return vaddq_u64(a, shift); +} + +//**********************Vector saturating shift left by constant ***************************** +//******************************************************************************************** +//we don't check const ranges assuming they are met +int8x8_t vqshl_n_s8(int8x8_t a, __constrange(0,7) int b); // VQSHL.S8 d0,d0,#0 +_NEON2SSE_INLINE int8x8_t vqshl_n_s8(int8x8_t a, __constrange(0,7) int b) // VQSHL.S8 d0,d0,#0 +{ + //no 8 bit shift available in IA32 SIMD, go to 16 bit. It also provides the auto saturation (in packs function) + int8x8_t res64; + __m128i a128, r128; + a128 = _MM_CVTEPI8_EPI16 (_pM128i(a)); //SSE 4.1 + r128 = _mm_slli_epi16 (a128, b); + r128 = _mm_packs_epi16 (r128,r128); //saturated s8, use 64 low bits only + return64(r128); +} + +int16x4_t vqshl_n_s16(int16x4_t a, __constrange(0,15) int b); // VQSHL.S16 d0,d0,#0 +_NEON2SSE_INLINE int16x4_t vqshl_n_s16(int16x4_t a, __constrange(0,15) int b) // VQSHL.S16 d0,d0,#0 +{ + // go to 32 bit to get the auto saturation (in packs function) + int16x4_t res64; + __m128i a128, r128; + a128 = _MM_CVTEPI16_EPI32 (_pM128i(a)); //SSE 4.1 + r128 = _mm_slli_epi32 (a128, b); //shift_res + r128 = _mm_packs_epi32 (r128,r128); //saturated s16, use 64 low bits only + return64(r128); +} + +int32x2_t vqshl_n_s32(int32x2_t a, __constrange(0,31) int b); // VQSHL.S32 d0,d0,#0 +_NEON2SSE_INLINE int32x2_t vqshl_n_s32(int32x2_t a, __constrange(0,31) int b) +{ + //serial execution may be faster + int32x2_t res64; + return64(vqshlq_n_s32 (_pM128i(a), b)); +} + + +int64x1_t vqshl_n_s64(int64x1_t a, __constrange(0,63) int b); // VQSHL.S64 d0,d0,#0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x1_t vqshl_n_s64(int64x1_t a, __constrange(0,63) int b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + // no effective SIMD solution here + int64x1_t res; + int64_t bmask; + int64_t a_i64 = *( int64_t*)&a; + bmask = ( int64_t)1 << (63 - b); //positive + if (a_i64 >= bmask) { + res.m64_i64[0] = ~(_SIGNBIT64); + } else { + res.m64_i64[0] = (a_i64 <= -bmask) ? 
_SIGNBIT64 : a_i64 << b; + } + return res; +} + + +uint8x8_t vqshl_n_u8(uint8x8_t a, __constrange(0,7) int b); // VQSHL.U8 d0,d0,#0 +_NEON2SSE_INLINE uint8x8_t vqshl_n_u8(uint8x8_t a, __constrange(0,7) int b) // VQSHL.U8 d0,d0,#0 +{ + //no 8 bit shift available in IA32 SIMD, go to 16 bit + uint8x8_t res64; + __m128i a128, r128; + a128 = _MM_CVTEPU8_EPI16 (_pM128i(a)); //SSE 4.1 + r128 = _mm_slli_epi16 (a128, b); //shift_res + r128 = _mm_packus_epi16 (r128,r128); //saturated u8, use 64 low bits only + return64(r128); +} + +uint16x4_t vqshl_n_u16(uint16x4_t a, __constrange(0,15) int b); // VQSHL.s16 d0,d0,#0 +_NEON2SSE_INLINE uint16x4_t vqshl_n_u16(uint16x4_t a, __constrange(0,15) int b) // VQSHL.s16 d0,d0,#0 +{ + // go to 32 bit to get the auto saturation (in packus function) + uint16x4_t res64; + __m128i a128, r128; + a128 = _MM_CVTEPU16_EPI32 (_pM128i(a)); //SSE 4.1 + r128 = _mm_slli_epi32 (a128, b); //shift_res + r128 = _MM_PACKUS1_EPI32 (r128); //saturated s16 + return64(r128); +} + +uint32x2_t vqshl_n_u32(uint32x2_t a, __constrange(0,31) int b); // VQSHL.U32 d0,d0,#0 +_NEON2SSE_INLINE uint32x2_t vqshl_n_u32(uint32x2_t a, __constrange(0,31) int b) +{ + uint32x2_t res64; + return64(vqshlq_n_u32(_pM128i(a), b)); +} + +uint64x1_t vqshl_n_u64(uint64x1_t a, __constrange(0,63) int b); // VQSHL.U64 d0,d0,#0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x1_t vqshl_n_u64(uint64x1_t a, __constrange(0,63) int b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + // no effective SIMD solution here + uint64x1_t res; + uint64_t bmask; + uint64_t a_i64 = *(uint64_t*)&a; + bmask = ( uint64_t)1 << (64 - b); + res.m64_u64[0] = (a_i64 >= bmask)&&(b>0) ? 0xffffffffffffffff : a_i64 << b; //if b=0 we are fine with any a + return res; +} + +int8x16_t vqshlq_n_s8(int8x16_t a, __constrange(0,7) int b); // VQSHL.S8 q0,q0,#0 +_NEON2SSE_INLINE int8x16_t vqshlq_n_s8(int8x16_t a, __constrange(0,7) int b) // VQSHL.S8 q0,q0,#0 +{ + // go to 16 bit to get the auto saturation (in packs function) + __m128i a128, r128_1, r128_2; + a128 = _MM_CVTEPI8_EPI16 (a); //SSE 4.1 + r128_1 = _mm_slli_epi16 (a128, b); + //swap hi and low part of a128 to process the remaining data + a128 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32); + a128 = _MM_CVTEPI8_EPI16 (a128); + r128_2 = _mm_slli_epi16 (a128, b); + return _mm_packs_epi16 (r128_1, r128_2); //saturated s8 +} + +int16x8_t vqshlq_n_s16(int16x8_t a, __constrange(0,15) int b); // VQSHL.S16 q0,q0,#0 +_NEON2SSE_INLINE int16x8_t vqshlq_n_s16(int16x8_t a, __constrange(0,15) int b) // VQSHL.S16 q0,q0,#0 +{ + // manual saturation solution looks LESS optimal than 32 bits conversion one + // go to 32 bit to get the auto saturation (in packs function) + __m128i a128, r128_1, r128_2; + a128 = _MM_CVTEPI16_EPI32 (a); //SSE 4.1 + r128_1 = _mm_slli_epi32 (a128, b); //shift_res + //swap hi and low part of a128 to process the remaining data + a128 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32); + a128 = _MM_CVTEPI16_EPI32 (a128); + r128_2 = _mm_slli_epi32 (a128, b); + return _mm_packs_epi32 (r128_1, r128_2); //saturated s16 +} + +int32x4_t vqshlq_n_s32(int32x4_t a, __constrange(0,31) int b); // VQSHL.S32 q0,q0,#0 +_NEON2SSE_INLINE int32x4_t vqshlq_n_s32(int32x4_t a, __constrange(0,31) int b) // VQSHL.S32 q0,q0,#0 +{ + // no 64 bit saturation option available, special tricks necessary + __m128i c1, maskA, saturation_mask, c7ffffff_mask, shift_res, shift_res_mask; + c1 = _mm_cmpeq_epi32(a,a); //0xff..ff + maskA = _mm_srli_epi32(c1, b + 1); //mask for positive numbers (32-b+1) zeros and b-1 ones + saturation_mask = 
_mm_cmpgt_epi32 (a, maskA); //0xff...ff if we need saturation, 0 otherwise + c7ffffff_mask = _mm_srli_epi32(saturation_mask, 1); //saturated to 0x7f..ff when needed and zeros if not + shift_res = _mm_slli_epi32 (a, b); + shift_res_mask = _mm_andnot_si128(saturation_mask, shift_res); + //result with positive numbers saturated + shift_res = _mm_or_si128 (c7ffffff_mask, shift_res_mask); + //treat negative numbers + maskA = _mm_slli_epi32(c1, 31 - b); //mask for negative numbers b-1 ones and (32-b+1) zeros + saturation_mask = _mm_cmpgt_epi32 (maskA,a); //0xff...ff if we need saturation, 0 otherwise + c7ffffff_mask = _mm_slli_epi32(saturation_mask, 31); //saturated to 0x80..00 when needed and zeros if not + shift_res_mask = _mm_andnot_si128(saturation_mask, shift_res); + return _mm_or_si128 (c7ffffff_mask, shift_res_mask); +} + +int64x2_t vqshlq_n_s64(int64x2_t a, __constrange(0,63) int b); // VQSHL.S64 q0,q0,#0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqshlq_n_s64(int64x2_t a, __constrange(0,63) int b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + // no effective SIMD solution here + _NEON2SSE_ALIGN_16 int64_t atmp[2], res[2]; + int64_t bmask; + int i; + bmask = ( int64_t)1 << (63 - b); //positive + _mm_store_si128((__m128i*)atmp, a); + for (i = 0; i<2; i++) { + if (atmp[i] >= bmask) { + res[i] = ~(_SIGNBIT64); + } else { + res[i] = (atmp[i] <= -bmask) ? _SIGNBIT64 : atmp[i] << b; + } + } + return _mm_load_si128((__m128i*)res); +} + +uint8x16_t vqshlq_n_u8(uint8x16_t a, __constrange(0,7) int b); // VQSHL.U8 q0,q0,#0 +_NEON2SSE_INLINE uint8x16_t vqshlq_n_u8(uint8x16_t a, __constrange(0,7) int b) // VQSHL.U8 q0,q0,#0 +{ + // go to 16 bit to get the auto saturation (in packs function) + __m128i a128, r128_1, r128_2; + a128 = _MM_CVTEPU8_EPI16 (a); //SSE 4.1 + r128_1 = _mm_slli_epi16 (a128, b); + //swap hi and low part of a128 to process the remaining data + a128 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32); + a128 = _MM_CVTEPU8_EPI16 (a128); + r128_2 = _mm_slli_epi16 (a128, b); + return _mm_packus_epi16 (r128_1, r128_2); //saturated u8 +} + +uint16x8_t vqshlq_n_u16(uint16x8_t a, __constrange(0,15) int b); // VQSHL.s16 q0,q0,#0 +_NEON2SSE_INLINE uint16x8_t vqshlq_n_u16(uint16x8_t a, __constrange(0,15) int b) // VQSHL.s16 q0,q0,#0 +{ + // manual saturation solution looks more optimal than 32 bits conversion one + __m128i cb, c8000, a_signed, saturation_mask, shift_res; + cb = _mm_set1_epi16((1 << (16 - b)) - 1 - 0x8000 ); + c8000 = _mm_set1_epi16 (0x8000); +//no unsigned shorts comparison in SSE, only signed available, so need the trick + a_signed = _mm_sub_epi16(a, c8000); //go to signed + saturation_mask = _mm_cmpgt_epi16 (a_signed, cb); + shift_res = _mm_slli_epi16 (a, b); + return _mm_or_si128 (shift_res, saturation_mask); +} + +uint32x4_t vqshlq_n_u32(uint32x4_t a, __constrange(0,31) int b); // VQSHL.U32 q0,q0,#0 +_NEON2SSE_INLINE uint32x4_t vqshlq_n_u32(uint32x4_t a, __constrange(0,31) int b) // VQSHL.U32 q0,q0,#0 +{ + // manual saturation solution, no 64 bit saturation option, the serial version may be faster + __m128i cb, c80000000, a_signed, saturation_mask, shift_res; + cb = _mm_set1_epi32((1 << (32 - b)) - 1 - 0x80000000 ); + c80000000 = _mm_set1_epi32 (0x80000000); +//no unsigned ints comparison in SSE, only signed available, so need the trick + a_signed = _mm_sub_epi32(a, c80000000); //go to signed + saturation_mask = _mm_cmpgt_epi32 (a_signed, cb); + shift_res = _mm_slli_epi32 (a, b); + return _mm_or_si128 (shift_res, saturation_mask); +} + +uint64x2_t vqshlq_n_u64(uint64x2_t a, 
__constrange(0,63) int b); // VQSHL.U64 q0,q0,#0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x2_t vqshlq_n_u64(uint64x2_t a, __constrange(0,63) int b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + // no effective SIMD solution here + _NEON2SSE_ALIGN_16 uint64_t atmp[2], res[2]; + uint64_t bmask; + int i; + bmask = ( uint64_t)1 << (64 - b); + _mm_store_si128((__m128i*)atmp, a); + for (i = 0; i<2; i++) { + res[i] = (atmp[i] >= bmask)&&(b>0) ? 0xffffffffffffffff : atmp[i] << b; //if b=0 we are fine with any a + } + return _mm_load_si128((__m128i*)res); +} + +//**************Vector signed->unsigned saturating shift left by constant ************* +//************************************************************************************* +uint8x8_t vqshlu_n_s8(int8x8_t a, __constrange(0,7) int b); // VQSHLU.S8 d0,d0,#0 +_NEON2SSE_INLINE uint8x8_t vqshlu_n_s8(int8x8_t a, __constrange(0,7) int b) // VQSHLU.S8 d0,d0,#0 +{ + //no 8 bit shift available in IA32 SIMD, go to 16 bit. It also provides the auto saturation (in packs function) + uint8x8_t res64; + __m128i a128, r128; + a128 = _MM_CVTEPI8_EPI16 (_pM128i(a)); //SSE 4.1 + r128 = _mm_slli_epi16 (a128, b); + r128 = _mm_packus_epi16 (r128,r128); //saturated u8, use 64 low bits only + return64(r128); +} + +uint16x4_t vqshlu_n_s16(int16x4_t a, __constrange(0,15) int b); // VQSHLU.S16 d0,d0,#0 +_NEON2SSE_INLINE uint16x4_t vqshlu_n_s16(int16x4_t a, __constrange(0,15) int b) // VQSHLU.S16 d0,d0,#0 +{ + uint16x4_t res64; + __m128i a128, r128; + a128 = _MM_CVTEPI16_EPI32 (_pM128i(a)); //SSE 4.1 + r128 = _mm_slli_epi32 (a128, b); //shift_res + r128 = _MM_PACKUS1_EPI32 (r128); //saturated s16, use 64 low bits only + return64(r128); +} + +uint32x2_t vqshlu_n_s32(int32x2_t a, __constrange(0,31) int b); // VQSHLU.S32 d0,d0,#0 +_NEON2SSE_INLINE int32x2_t vqshlu_n_s32(int32x2_t a, __constrange(0,31) int b) +{ + int32x2_t res64; + return64( vqshluq_n_s32(_pM128i(a), b)); +} + +uint64x1_t vqshlu_n_s64(int64x1_t a, __constrange(0,63) int b); // VQSHLU.S64 d0,d0,#0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x1_t vqshlu_n_s64(int64x1_t a, __constrange(0,63) int b), _NEON2SSE_REASON_SLOW_SERIAL) // no effective SIMD solution here, serial execution looks faster +{ + uint64x1_t res; + uint64_t limit; + if (a.m64_i64[0]<=0) { + res.m64_u64[0] = 0; + } else { + limit = (uint64_t) 1 << (64 - b); + res.m64_u64[0] = ( ((uint64_t)a.m64_i64[0]) >= limit) ? 
res.m64_u64[0] = ~((uint64_t)0) : a.m64_i64[0] << b; + } + return res; +} + +uint8x16_t vqshluq_n_s8(int8x16_t a, __constrange(0,7) int b); // VQSHLU.S8 q0,q0,#0 +_NEON2SSE_INLINE uint8x16_t vqshluq_n_s8(int8x16_t a, __constrange(0,7) int b) // VQSHLU.S8 q0,q0,#0 +{ + __m128i a128, r128_1, r128_2; + a128 = _MM_CVTEPI8_EPI16 (a); //SSE 4.1 + r128_1 = _mm_slli_epi16 (a128, b); + //swap hi and low part of a128 to process the remaining data + a128 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32); + a128 = _MM_CVTEPI8_EPI16 (a128); + r128_2 = _mm_slli_epi16 (a128, b); + return _mm_packus_epi16 (r128_1, r128_2); //saturated u8 +} + +uint16x8_t vqshluq_n_s16(int16x8_t a, __constrange(0,15) int b); // VQSHLU.S16 q0,q0,#0 +_NEON2SSE_INLINE uint16x8_t vqshluq_n_s16(int16x8_t a, __constrange(0,15) int b) // VQSHLU.S16 q0,q0,#0 +{ + // manual saturation solution looks LESS optimal than 32 bits conversion one + __m128i a128, r128_1, r128_2; + a128 = _MM_CVTEPI16_EPI32 (a); //SSE 4.1 + r128_1 = _mm_slli_epi32 (a128, b); //shift_res + //swap hi and low part of a128 to process the remaining data + a128 = _mm_shuffle_epi32 (a, _SWAP_HI_LOW32); + a128 = _MM_CVTEPI16_EPI32 (a128); + r128_2 = _mm_slli_epi32 (a128, b); + return _MM_PACKUS_EPI32 (r128_1, r128_2); //saturated s16 +} + +uint32x4_t vqshluq_n_s32(int32x4_t a, __constrange(0,31) int b); // VQSHLU.S32 q0,q0,#0 +_NEON2SSE_INLINE uint32x4_t vqshluq_n_s32(int32x4_t a, __constrange(0,31) int b) // VQSHLU.S32 q0,q0,#0 +{ + //solution may be not optimal compared with the serial one + __m128i zero, maskA, maskGT0, a0, a_masked, a_shift; + zero = _mm_setzero_si128(); + maskA = _mm_cmpeq_epi32(a, a); + maskA = _mm_slli_epi32(maskA,(32 - b)); // b ones and (32-b)zeros + //saturate negative numbers to zero + maskGT0 = _mm_cmpgt_epi32 (a, zero); // //0xffffffff if positive number and zero otherwise (negative numbers) + a0 = _mm_and_si128 (a, maskGT0); //negative are zeros now + //saturate positive to 0xffffffff + a_masked = _mm_and_si128 (a0, maskA); + a_masked = _mm_cmpgt_epi32 (a_masked, zero); //0xffffffff if saturation necessary 0 otherwise + a_shift = _mm_slli_epi32 (a0, b); + return _mm_or_si128 (a_shift, a_masked); //actual saturation +} + +uint64x2_t vqshluq_n_s64(int64x2_t a, __constrange(0,63) int b); // VQSHLU.S64 q0,q0,#0 +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint64x2_t vqshluq_n_s64(int64x2_t a, __constrange(0,63) int b), _NEON2SSE_REASON_SLOW_SERIAL) +{ + // no effective SIMD solution here, serial execution looks faster + _NEON2SSE_ALIGN_16 int64_t atmp[2]; + _NEON2SSE_ALIGN_16 uint64_t res[2]; + uint64_t limit; + int i; + _mm_store_si128((__m128i*)atmp, a); + for (i = 0; i<2; i++) { + if (atmp[i]<=0) { + res[i] = 0; + } else { + limit = (uint64_t) 1 << (64 - b); + res[i] = ( ((uint64_t)atmp[i]) >= limit) ? res[i] = ~((uint64_t)0) : atmp[i] << b; + } + } + return _mm_load_si128((__m128i*)res); +} + +//************** Vector narrowing shift right by constant ************** +//********************************************************************** +int8x8_t vshrn_n_s16(int16x8_t a, __constrange(1,8) int b); // VSHRN.I16 d0,q0,#8 +_NEON2SSE_INLINE int8x8_t vshrn_n_s16(int16x8_t a, __constrange(1,8) int b) // VSHRN.I16 d0,q0,#8 +{ + int8x8_t res64; + __m128i r16; + _NEON2SSE_ALIGN_16 int8_t mask8_16_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 }; + r16 = vshrq_n_s16(a,b); + r16 = _mm_shuffle_epi8 (r16, *(__m128i*) mask8_16_even_odd); //narrow, use low 64 bits only. 
Impossible to use _mm_packs because of negative saturation problems + return64(r16); +} + +int16x4_t vshrn_n_s32(int32x4_t a, __constrange(1,16) int b); // VSHRN.I32 d0,q0,#16 +_NEON2SSE_INLINE int16x4_t vshrn_n_s32(int32x4_t a, __constrange(1,16) int b) // VSHRN.I32 d0,q0,#16 +{ + int16x4_t res64; + __m128i r32; + _NEON2SSE_ALIGN_16 int8_t mask16_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 }; + r32 = vshrq_n_s32(a,b); + r32 = _mm_shuffle_epi8 (r32, *(__m128i*) mask16_odd); //narrow, use low 64 bits only. Impossible to use _mm_packs because of negative saturation problems + return64(r32); +} + +int32x2_t vshrn_n_s64(int64x2_t a, __constrange(1,32) int b); // VSHRN.I64 d0,q0,#32 +_NEON2SSE_INLINE int32x2_t vshrn_n_s64(int64x2_t a, __constrange(1,32) int b) +{ + int32x2_t res64; + __m128i r64; + r64 = vshrq_n_s64(a,b); + r64 = _mm_shuffle_epi32(r64, 0 | (2 << 2) | (1 << 4) | (3 << 6)); //shuffle the data to get 2 32-bits + return64(r64); +} + +uint8x8_t vshrn_n_u16(uint16x8_t a, __constrange(1,8) int b); // VSHRN.I16 d0,q0,#8 +_NEON2SSE_INLINE uint8x8_t vshrn_n_u16(uint16x8_t a, __constrange(1,8) int b) // VSHRN.I16 d0,q0,#8 +{ + uint8x8_t res64; + __m128i mask, r16; + mask = _mm_set1_epi16(0xff); + r16 = vshrq_n_s16(a,b); //after right shift b>=1 unsigned var fits into signed range, so we could use _mm_packus_epi16 (signed 16 to unsigned 8) + r16 = _mm_and_si128(r16, mask); //to avoid saturation + r16 = _mm_packus_epi16 (r16,r16); //narrow, use low 64 bits only + return64(r16); +} + +uint16x4_t vshrn_n_u32(uint32x4_t a, __constrange(1,16) int b); // VSHRN.I32 d0,q0,#16 +_NEON2SSE_INLINE uint16x4_t vshrn_n_u32(uint32x4_t a, __constrange(1,16) int b) // VSHRN.I32 d0,q0,#16 +{ + uint16x4_t res64; + __m128i mask, r32; + mask = _mm_set1_epi32(0xffff); + r32 = vshrq_n_u32(a,b); //after right shift b>=1 unsigned var fits into signed range, so we could use _MM_PACKUS_EPI32 (signed 32 to unsigned 16) + r32 = _mm_and_si128(r32, mask); //to avoid saturation + r32 = _MM_PACKUS1_EPI32 (r32); //saturate and narrow, use low 64 bits only + return64(r32); +} + +uint32x2_t vshrn_n_u64(uint64x2_t a, __constrange(1,32) int b); // VSHRN.I64 d0,q0,#32 +_NEON2SSE_INLINE uint32x2_t vshrn_n_u64(uint64x2_t a, __constrange(1,32) int b) +{ + uint32x2_t res64; + __m128i r64; + r64 = vshrq_n_u64(a,b); + r64 = _mm_shuffle_epi32(r64, 0 | (2 << 2) | (1 << 4) | (3 << 6)); //shuffle the data to get 2 32-bits + return64(r64); +} + +//************** Vector signed->unsigned narrowing saturating shift right by constant ******** +//********************************************************************************************* +uint8x8_t vqshrun_n_s16(int16x8_t a, __constrange(1,8) int b); // VQSHRUN.S16 d0,q0,#8 +_NEON2SSE_INLINE uint8x8_t vqshrun_n_s16(int16x8_t a, __constrange(1,8) int b) // VQSHRUN.S16 d0,q0,#8 +{ + uint8x8_t res64; + __m128i r16; + r16 = vshrq_n_s16(a,b); + r16 = _mm_packus_epi16 (r16,r16); //saturate and narrow (signed to unsigned), use low 64 bits only + return64(r16); +} + +uint16x4_t vqshrun_n_s32(int32x4_t a, __constrange(1,16) int b); // VQSHRUN.S32 d0,q0,#16 +_NEON2SSE_INLINE uint16x4_t vqshrun_n_s32(int32x4_t a, __constrange(1,16) int b) // VQSHRUN.S32 d0,q0,#16 +{ + uint16x4_t res64; + __m128i r32; + r32 = vshrq_n_s32(a,b); + r32 = _MM_PACKUS1_EPI32 (r32); //saturate and narrow(signed to unsigned), use low 64 bits only + return64(r32); +} + +uint32x2_t vqshrun_n_s64(int64x2_t a, __constrange(1,32) int b); // VQSHRUN.S64 d0,q0,#32 +_NEON2SSE_INLINE 
_NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vqshrun_n_s64(int64x2_t a, __constrange(1,32) int b), _NEON2SSE_REASON_SLOW_SERIAL) //serial solution is faster
+{
+    _NEON2SSE_ALIGN_16 int64_t atmp[2];
+    uint32x2_t res;
+    int64_t res64;
+    _mm_store_si128((__m128i*)atmp, a);
+    if (atmp[0] < 0) {
+        res.m64_u32[0] = 0;
+    } else {
+        res64 = (atmp[0] >> b);
+        res.m64_u32[0] = (res64 > (int64_t)0xffffffff) ? 0xffffffff : (uint32_t) res64;
+    }
+    if (atmp[1] < 0) {
+        res.m64_u32[1] = 0;
+    } else {
+        res64 = (atmp[1] >> b);
+        res.m64_u32[1] = (res64 > (int64_t)0xffffffff) ? 0xffffffff : (uint32_t)res64;
+    }
+    return res;
+}
+
+//**** Vector signed->unsigned rounding narrowing saturating shift right by constant *****
+uint8x8_t vqrshrun_n_s16(int16x8_t a, __constrange(1,8) int b); // VQRSHRUN.S16 d0,q0,#8
+_NEON2SSE_INLINE uint8x8_t vqrshrun_n_s16(int16x8_t a, __constrange(1,8) int b) // VQRSHRUN.S16 d0,q0,#8
+{
+    //solution may be not optimal compared with the serial one
+    __m128i r16;
+    uint8x8_t res64;
+    r16 = vrshrq_n_s16(a,b);
+    r16 = _mm_packus_epi16 (r16,r16); //saturate and narrow (signed to unsigned), use low 64 bits only
+    return64(r16);
+}
+
+uint16x4_t vqrshrun_n_s32(int32x4_t a, __constrange(1,16) int b); // VQRSHRUN.S32 d0,q0,#16
+_NEON2SSE_INLINE uint16x4_t vqrshrun_n_s32(int32x4_t a, __constrange(1,16) int b) // VQRSHRUN.S32 d0,q0,#16
+{
+    //solution may be not optimal compared with the serial one
+    __m128i r32;
+    uint16x4_t res64;
+    r32 = vrshrq_n_s32(a,b);
+    r32 = _MM_PACKUS1_EPI32 (r32); //saturate and narrow (signed to unsigned), use low 64 bits only
+    return64(r32);
+}
+
+uint32x2_t vqrshrun_n_s64(int64x2_t a, __constrange(1,32) int b); // VQRSHRUN.S64 d0,q0,#32
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vqrshrun_n_s64(int64x2_t a, __constrange(1,32) int b), _NEON2SSE_REASON_SLOW_SERIAL) //serial solution is faster
+{
+    _NEON2SSE_ALIGN_16 int64_t atmp[2];
+    uint32x2_t res;
+    int64_t res64;
+    _mm_store_si128((__m128i*)atmp, a);
+    if (atmp[0] < 0) {
+        res.m64_u32[0] = 0;
+    } else {
+        res64 = (atmp[0] >> b) + ( (atmp[0] & ((int64_t)1 << (b - 1))) >> (b - 1) );
+        res.m64_u32[0] = (res64 > (int64_t)0xffffffff ) ? 0xffffffff : res64;
+    }
+    if (atmp[1] < 0) {
+        res.m64_u32[1] = 0;
+    } else {
+        res64 = (atmp[1] >> b) + ( (atmp[1] & ((int64_t)1 << (b - 1))) >> (b - 1) );
+        res.m64_u32[1] = (res64 > (int64_t)0xffffffff ) ?
0xffffffff : res64;
+    }
+    return res;
+}
+
+//***** Vector narrowing saturating shift right by constant ******
+//*****************************************************************
+int8x8_t vqshrn_n_s16(int16x8_t a, __constrange(1,8) int b); // VQSHRN.S16 d0,q0,#8
+_NEON2SSE_INLINE int8x8_t vqshrn_n_s16(int16x8_t a, __constrange(1,8) int b) // VQSHRN.S16 d0,q0,#8
+{
+    int8x8_t res64;
+    __m128i r16;
+    r16 = vshrq_n_s16(a,b);
+    r16 = _mm_packs_epi16 (r16,r16); //saturate and narrow, use low 64 bits only
+    return64(r16);
+}
+
+int16x4_t vqshrn_n_s32(int32x4_t a, __constrange(1,16) int b); // VQSHRN.S32 d0,q0,#16
+_NEON2SSE_INLINE int16x4_t vqshrn_n_s32(int32x4_t a, __constrange(1,16) int b) // VQSHRN.S32 d0,q0,#16
+{
+    int16x4_t res64;
+    __m128i r32;
+    r32 = vshrq_n_s32(a,b);
+    r32 = _mm_packs_epi32 (r32,r32); //saturate and narrow, use low 64 bits only
+    return64(r32);
+}
+
+int32x2_t vqshrn_n_s64(int64x2_t a, __constrange(1,32) int b); // VQSHRN.S64 d0,q0,#32
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x2_t vqshrn_n_s64(int64x2_t a, __constrange(1,32) int b), _NEON2SSE_REASON_SLOW_UNEFFECTIVE)
+{
+    //no optimal SIMD solution found
+    _NEON2SSE_ALIGN_16 int64_t res64[2], atmp[2];
+    int32x2_t res;
+    _mm_store_si128((__m128i*)atmp, a);
+    res64[0] = (atmp[0] >> b);
+    res64[1] = (atmp[1] >> b);
+    if(res64[0]>SINT_MAX) res64[0] = SINT_MAX;
+    if(res64[0]<SINT_MIN) res64[0] = SINT_MIN;
+    if(res64[1]>SINT_MAX) res64[1] = SINT_MAX;
+    if(res64[1]<SINT_MIN) res64[1] = SINT_MIN;
+    res.m64_i32[0] = (int32_t)res64[0];
+    res.m64_i32[1] = (int32_t)res64[1];
+    return res;
+}
+
+uint8x8_t vqshrn_n_u16(uint16x8_t a, __constrange(1,8) int b); // VQSHRN.U16 d0,q0,#8
+_NEON2SSE_INLINE uint8x8_t vqshrn_n_u16(uint16x8_t a, __constrange(1,8) int b) // VQSHRN.U16 d0,q0,#8
+{
+    uint8x8_t res64;
+    __m128i r16;
+    r16 = vshrq_n_u16(a,b); //after right shift b>=1 unsigned var fits into signed range, so we could use _mm_packus_epi16 (signed 16 to unsigned 8)
+    r16 = _mm_packus_epi16 (r16,r16); //saturate and narrow, use low 64 bits only
+    return64(r16);
+}
+
+uint16x4_t vqshrn_n_u32(uint32x4_t a, __constrange(1,16) int b); // VQSHRN.U32 d0,q0,#16
+_NEON2SSE_INLINE uint16x4_t vqshrn_n_u32(uint32x4_t a, __constrange(1,16) int b) // VQSHRN.U32 d0,q0,#16
+{
+    uint16x4_t res64;
+    __m128i r32;
+    r32 = vshrq_n_u32(a,b); //after right shift b>=1 unsigned var fits into signed range, so we could use _MM_PACKUS_EPI32 (signed 32 to unsigned 8)
+    r32 = _MM_PACKUS1_EPI32 (r32); //saturate and narrow, use low 64 bits only
+    return64(r32);
+}
+
+uint32x2_t vqshrn_n_u64(uint64x2_t a, __constrange(1,32) int b); // VQSHRN.U64 d0,q0,#32
+_NEON2SSE_INLINE uint32x2_t vqshrn_n_u64(uint64x2_t a, __constrange(1,32) int b)
+{
+    //serial solution may be faster
+    uint32x2_t res64;
+    __m128i r64, res_hi, zero;
+    zero = _mm_setzero_si128();
+    r64 = vshrq_n_u64(a,b);
+    res_hi = _mm_srli_epi64(r64, 32);
+    res_hi = _mm_cmpgt_epi32(res_hi, zero);
+    r64 = _mm_or_si128(r64, res_hi);
+    r64 = _mm_shuffle_epi32(r64, 0 | (2 << 2) | (1 << 4) | (3 << 6)); //shuffle the data to get 2 32-bits
+    return64(r64);
+}
+
+
+//********* Vector rounding narrowing shift right by constant *************************
+//****************************************************************************************
+int8x8_t vrshrn_n_s16(int16x8_t a, __constrange(1,8) int b); // VRSHRN.I16 d0,q0,#8
+_NEON2SSE_INLINE int8x8_t vrshrn_n_s16(int16x8_t a, __constrange(1,8) int b) // VRSHRN.I16 d0,q0,#8
+{
+    int8x8_t res64;
+    __m128i r16;
+    _NEON2SSE_ALIGN_16 int8_t mask8_16_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 };
+    r16 = vrshrq_n_s16(a,b);
+    r16 = _mm_shuffle_epi8 (r16, *(__m128i*) mask8_16_even_odd); //narrow, use low 64 bits only.
Impossible to use _mm_packs because of negative saturation problems + return64(r16); +} + +int16x4_t vrshrn_n_s32(int32x4_t a, __constrange(1,16) int b); // VRSHRN.I32 d0,q0,#16 +_NEON2SSE_INLINE int16x4_t vrshrn_n_s32(int32x4_t a, __constrange(1,16) int b) // VRSHRN.I32 d0,q0,#16 +{ + int16x4_t res64; + __m128i r32; + _NEON2SSE_ALIGN_16 int8_t mask16_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 }; + r32 = vrshrq_n_s32(a,b); + r32 = _mm_shuffle_epi8 (r32, *(__m128i*) mask16_odd); //narrow, use low 64 bits only. Impossible to use _mm_packs because of negative saturation problems + return64(r32); +} + +int32x2_t vrshrn_n_s64(int64x2_t a, __constrange(1,32) int b); // VRSHRN.I64 d0,q0,#32 +_NEON2SSE_INLINE int32x2_t vrshrn_n_s64(int64x2_t a, __constrange(1,32) int b) +{ + int32x2_t res64; + __m128i r64; + r64 = vrshrq_n_s64(a,b); + r64 = _mm_shuffle_epi32(r64, 0 | (2 << 2) | (1 << 4) | (3 << 6)); //shuffle the data to get 2 32-bits + return64(r64); +} + +uint8x8_t vrshrn_n_u16(uint16x8_t a, __constrange(1,8) int b); // VRSHRN.I16 d0,q0,#8 +_NEON2SSE_INLINE uint8x8_t vrshrn_n_u16(uint16x8_t a, __constrange(1,8) int b) // VRSHRN.I16 d0,q0,#8 +{ + uint8x8_t res64; + __m128i mask, r16; + mask = _mm_set1_epi16(0xff); + r16 = vrshrq_n_s16(a,b); //after right shift b>=1 unsigned var fits into signed range, so we could use _mm_packus_epi16 (signed 16 to unsigned 8) + r16 = _mm_and_si128(r16, mask); //to avoid saturation + r16 = _mm_packus_epi16 (r16,r16); //saturate and narrow, use low 64 bits only + return64(r16); +} + +uint16x4_t vrshrn_n_u32(uint32x4_t a, __constrange(1,16) int b); // VRSHRN.I32 d0,q0,#16 +_NEON2SSE_INLINE uint16x4_t vrshrn_n_u32(uint32x4_t a, __constrange(1,16) int b) // VRSHRN.I32 d0,q0,#16 +{ + uint16x4_t res64; + __m128i mask, r32; + mask = _mm_set1_epi32(0xffff); + r32 = vrshrq_n_u32(a,b); //after right shift b>=1 unsigned var fits into signed range, so we could use _MM_PACKUS_EPI32 (signed 32 to unsigned 8) + r32 = _mm_and_si128(r32, mask); //to avoid saturation + r32 = _MM_PACKUS1_EPI32 (r32); //saturate and narrow, use low 64 bits only + return64(r32); +} + +uint32x2_t vrshrn_n_u64(uint64x2_t a, __constrange(1,32) int b); // VRSHRN.I64 d0,q0,#32 +_NEON2SSE_INLINE uint32x2_t vrshrn_n_u64(uint64x2_t a, __constrange(1,32) int b) //serial solution may be faster +{ + uint32x2_t res64; + __m128i r64; + r64 = vrshrq_n_u64(a,b); + r64 = _mm_shuffle_epi32(r64, 0 | (2 << 2) | (1 << 4) | (3 << 6)); //shuffle the data to get 2 32-bits + return64(r64); +} + +//************* Vector rounding narrowing saturating shift right by constant ************ +//**************************************************************************************** +int8x8_t vqrshrn_n_s16(int16x8_t a, __constrange(1,8) int b); // VQRSHRN.S16 d0,q0,#8 +_NEON2SSE_INLINE int8x8_t vqrshrn_n_s16(int16x8_t a, __constrange(1,8) int b) // VQRSHRN.S16 d0,q0,#8 +{ + int8x8_t res64; + __m128i r16; + r16 = vrshrq_n_s16(a,b); + r16 = _mm_packs_epi16 (r16,r16); //saturate and narrow, use low 64 bits only + return64(r16); +} + +int16x4_t vqrshrn_n_s32(int32x4_t a, __constrange(1,16) int b); // VQRSHRN.S32 d0,q0,#16 +_NEON2SSE_INLINE int16x4_t vqrshrn_n_s32(int32x4_t a, __constrange(1,16) int b) // VQRSHRN.S32 d0,q0,#16 +{ + int16x4_t res64; + __m128i r32; + r32 = vrshrq_n_s32(a,b); + r32 = _mm_packs_epi32 (r32,r32); //saturate and narrow, use low 64 bits only + return64(r32); +} + +int32x2_t vqrshrn_n_s64(int64x2_t a, __constrange(1,32) int b); // VQRSHRN.S64 d0,q0,#32 +_NEON2SSE_INLINE 
_NEON2SSE_PERFORMANCE_WARNING(int32x2_t vqrshrn_n_s64(int64x2_t a, __constrange(1,32) int b), _NEON2SSE_REASON_SLOW_UNEFFECTIVE)
+{
+    //no optimal SIMD solution found
+    _NEON2SSE_ALIGN_16 int64_t res64[2], atmp[2], maskb[2];
+    int32x2_t res;
+    _mm_store_si128((__m128i*)atmp, a);
+    maskb[0] = atmp[0] & (( int64_t)1 << (b - 1));
+    res64[0] = (atmp[0] >> b) + (maskb[0] >> (b - 1)); //rounded result
+    maskb[1] = atmp[1] & (( int64_t)1 << (b - 1));
+    res64[1] = (atmp[1] >> b) + (maskb[1] >> (b - 1)); //rounded result
+    if(res64[0]>SINT_MAX) res64[0] = SINT_MAX;
+    if(res64[0]<SINT_MIN) res64[0] = SINT_MIN;
+    if(res64[1]>SINT_MAX) res64[1] = SINT_MAX;
+    if(res64[1]<SINT_MIN) res64[1] = SINT_MIN;
+    res.m64_i32[0] = (int32_t)res64[0];
+    res.m64_i32[1] = (int32_t)res64[1];
+    return res;
+}
+
+uint8x8_t vqrshrn_n_u16(uint16x8_t a, __constrange(1,8) int b); // VQRSHRN.U16 d0,q0,#8
+_NEON2SSE_INLINE uint8x8_t vqrshrn_n_u16(uint16x8_t a, __constrange(1,8) int b) // VQRSHRN.U16 d0,q0,#8
+{
+    uint8x8_t res64;
+    __m128i r16;
+    r16 = vrshrq_n_u16(a,b); //after right shift b>=1 unsigned var fits into signed range, so we could use _mm_packus_epi16 (signed 16 to unsigned 8)
+    r16 = _mm_packus_epi16 (r16,r16); //saturate and narrow, use low 64 bits only
+    return64(r16);
+}
+
+uint16x4_t vqrshrn_n_u32(uint32x4_t a, __constrange(1,16) int b); // VQRSHRN.U32 d0,q0,#16
+_NEON2SSE_INLINE uint16x4_t vqrshrn_n_u32(uint32x4_t a, __constrange(1,16) int b) // VQRSHRN.U32 d0,q0,#16
+{
+    uint16x4_t res64;
+    __m128i r32;
+    r32 = vrshrq_n_u32(a,b); //after right shift b>=1 unsigned var fits into signed range, so we could use _MM_PACKUS_EPI32 (signed 32 to unsigned 8)
+    r32 = _MM_PACKUS1_EPI32 (r32); //saturate and narrow, use low 64 bits only
+    return64(r32);
+}
+
+uint32x2_t vqrshrn_n_u64(uint64x2_t a, __constrange(1,32) int b); // VQRSHRN.U64 d0,q0,#32
+_NEON2SSE_INLINE uint32x2_t vqrshrn_n_u64(uint64x2_t a, __constrange(1,32) int b)
+{
+    //serial solution may be faster
+    uint32x2_t res64;
+    __m128i r64, res_hi, zero;
+    zero = _mm_setzero_si128();
+    r64 = vrshrq_n_u64(a,b);
+    res_hi = _mm_srli_epi64(r64, 32);
+    res_hi = _mm_cmpgt_epi32(res_hi, zero);
+    r64 = _mm_or_si128(r64, res_hi);
+    r64 = _mm_shuffle_epi32(r64, 0 | (2 << 2) | (1 << 4) | (3 << 6)); //shuffle the data to get 2 32-bits
+    return64(r64);
+}
+
+//************** Vector widening shift left by constant ****************
+//************************************************************************
+int16x8_t vshll_n_s8(int8x8_t a, __constrange(0,8) int b); // VSHLL.S8 q0,d0,#0
+_NEON2SSE_INLINE int16x8_t vshll_n_s8(int8x8_t a, __constrange(0,8) int b) // VSHLL.S8 q0,d0,#0
+{
+    __m128i r;
+    r = _MM_CVTEPI8_EPI16 (_pM128i(a)); //SSE 4.1
+    return _mm_slli_epi16 (r, b);
+}
+
+int32x4_t vshll_n_s16(int16x4_t a, __constrange(0,16) int b); // VSHLL.S16 q0,d0,#0
+_NEON2SSE_INLINE int32x4_t vshll_n_s16(int16x4_t a, __constrange(0,16) int b) // VSHLL.S16 q0,d0,#0
+{
+    __m128i r;
+    r = _MM_CVTEPI16_EPI32(_pM128i(a)); //SSE4.1,
+    return _mm_slli_epi32 (r, b);
+}
+
+int64x2_t vshll_n_s32(int32x2_t a, __constrange(0,32) int b); // VSHLL.S32 q0,d0,#0
+_NEON2SSE_INLINE int64x2_t vshll_n_s32(int32x2_t a, __constrange(0,32) int b) // VSHLL.S32 q0,d0,#0
+{
+    __m128i r;
+    r = _MM_CVTEPI32_EPI64(_pM128i(a)); //SSE4.1,
+    return _mm_slli_epi64 (r, b);
+}
+
+uint16x8_t vshll_n_u8(uint8x8_t a, __constrange(0,8) int b); // VSHLL.U8 q0,d0,#0
+_NEON2SSE_INLINE uint16x8_t vshll_n_u8(uint8x8_t a, __constrange(0,8) int b) // VSHLL.U8 q0,d0,#0
+{
+    //no uint8 to uint16 conversion available, manual conversion used
+    __m128i zero, r;
+    zero = _mm_setzero_si128 ();
+    r = _mm_unpacklo_epi8(_pM128i(a), zero);
+    return _mm_slli_epi16 (r, b);
+}
+
+uint32x4_t vshll_n_u16(uint16x4_t a, __constrange(0,16) int b); // VSHLL.s16 q0,d0,#0
+_NEON2SSE_INLINE uint32x4_t vshll_n_u16(uint16x4_t a, __constrange(0,16) int b) // VSHLL.s16 q0,d0,#0
+{
+    //no uint16 to uint32 conversion available, manual conversion used
+    __m128i zero, r;
+    zero =
_mm_setzero_si128 (); + r = _mm_unpacklo_epi16(_pM128i(a), zero); + return _mm_slli_epi32 (r, b); +} + +uint64x2_t vshll_n_u32(uint32x2_t a, __constrange(0,32) int b); // VSHLL.U32 q0,d0,#0 +_NEON2SSE_INLINE uint64x2_t vshll_n_u32(uint32x2_t a, __constrange(0,32) int b) // VSHLL.U32 q0,d0,#0 +{ + //no uint32 to uint64 conversion available, manual conversion used + __m128i zero, r; + zero = _mm_setzero_si128 (); + r = _mm_unpacklo_epi32(_pM128i(a), zero); + return _mm_slli_epi64 (r, b); +} + +//************************************************************************************ +//**************************** Shifts with insert ************************************ +//************************************************************************************ +//takes each element in a vector, shifts them by an immediate value, +//and inserts the results in the destination vector. Bits shifted out of the each element are lost. + +//**************** Vector shift right and insert ************************************ +//Actually the "c" left bits from "a" are the only bits remained from "a" after the shift. +//All other bits are taken from b shifted. +int8x8_t vsri_n_s8(int8x8_t a, int8x8_t b, __constrange(1,8) int c); // VSRI.8 d0,d0,#8 +_NEON2SSE_INLINE int8x8_t vsri_n_s8(int8x8_t a, int8x8_t b, __constrange(1,8) int c) +{ + int8x8_t res64; + return64(vsriq_n_s8(_pM128i(a),_pM128i(b), c)); +} + + +int16x4_t vsri_n_s16(int16x4_t a, int16x4_t b, __constrange(1,16) int c); // VSRI.16 d0,d0,#16 +_NEON2SSE_INLINE int16x4_t vsri_n_s16(int16x4_t a, int16x4_t b, __constrange(1,16) int c) +{ + int16x4_t res64; + return64(vsriq_n_s16(_pM128i(a),_pM128i(b), c)); +} + + +int32x2_t vsri_n_s32(int32x2_t a, int32x2_t b, __constrange(1,32) int c); // VSRI.32 d0,d0,#32 +_NEON2SSE_INLINE int32x2_t vsri_n_s32(int32x2_t a, int32x2_t b, __constrange(1,32) int c) +{ + int32x2_t res64; + return64(vsriq_n_s32(_pM128i(a),_pM128i(b), c)); +} + + +int64x1_t vsri_n_s64(int64x1_t a, int64x1_t b, __constrange(1,64) int c); // VSRI.64 d0,d0,#64 +_NEON2SSE_INLINE int64x1_t vsri_n_s64(int64x1_t a, int64x1_t b, __constrange(1,64) int c) +{ + int64x1_t res; + if (c ==64) + res = a; + else{ + res.m64_i64[0] = (b.m64_u64[0] >> c) | ((a.m64_i64[0] >> (64 - c)) << (64 - c)); //treat b as unsigned for shift to get leading zeros + } + return res; +} + +uint8x8_t vsri_n_u8(uint8x8_t a, uint8x8_t b, __constrange(1,8) int c); // VSRI.8 d0,d0,#8 +#define vsri_n_u8 vsri_n_s8 + +uint16x4_t vsri_n_u16(uint16x4_t a, uint16x4_t b, __constrange(1,16) int c); // VSRI.16 d0,d0,#16 +#define vsri_n_u16 vsri_n_s16 + +uint32x2_t vsri_n_u32(uint32x2_t a, uint32x2_t b, __constrange(1,32) int c); // VSRI.32 d0,d0,#32 +#define vsri_n_u32 vsri_n_s32 + + +uint64x1_t vsri_n_u64(uint64x1_t a, uint64x1_t b, __constrange(1,64) int c); // VSRI.64 d0,d0,#64 +#define vsri_n_u64 vsri_n_s64 + +poly8x8_t vsri_n_p8(poly8x8_t a, poly8x8_t b, __constrange(1,8) int c); // VSRI.8 d0,d0,#8 +#define vsri_n_p8 vsri_n_u8 + +poly16x4_t vsri_n_p16(poly16x4_t a, poly16x4_t b, __constrange(1,16) int c); // VSRI.16 d0,d0,#16 +#define vsri_n_p16 vsri_n_u16 + +int8x16_t vsriq_n_s8(int8x16_t a, int8x16_t b, __constrange(1,8) int c); // VSRI.8 q0,q0,#8 +_NEON2SSE_INLINE int8x16_t vsriq_n_s8(int8x16_t a, int8x16_t b, __constrange(1,8) int c) // VSRI.8 q0,q0,#8 +{ + __m128i maskA, a_masked; + uint8x16_t b_shift; + _NEON2SSE_ALIGN_16 uint8_t maskLeft[9] = {0x0, 0x80, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc, 0xfe, 0xff}; //"a" bits mask, 0 bit not used + maskA = _mm_set1_epi8(maskLeft[c]); // c ones and 
(8-c)zeros + a_masked = _mm_and_si128 (a, maskA); + b_shift = vshrq_n_u8( b, c); // c zeros on the left in b due to logical shift + return _mm_or_si128 (a_masked, b_shift); //combine (insert b into a) +} + +int16x8_t vsriq_n_s16(int16x8_t a, int16x8_t b, __constrange(1,16) int c); // VSRI.16 q0,q0,#16 +_NEON2SSE_INLINE int16x8_t vsriq_n_s16(int16x8_t a, int16x8_t b, __constrange(1,16) int c) // VSRI.16 q0,q0,#16 +{ + //to cut "c" left bits from a we do shift right and then shift back left providing c right zeros in a + uint16x8_t b_shift; + uint16x8_t a_c; + b_shift = vshrq_n_u16( b, c); // c zeros on the left in b due to logical shift + a_c = vshrq_n_u16( a, (16 - c)); + a_c = _mm_slli_epi16(a_c, (16 - c)); //logical shift provides right "c" bits zeros in a + return _mm_or_si128 (a_c, b_shift); //combine (insert b into a) +} + +int32x4_t vsriq_n_s32(int32x4_t a, int32x4_t b, __constrange(1,32) int c); // VSRI.32 q0,q0,#32 +_NEON2SSE_INLINE int32x4_t vsriq_n_s32(int32x4_t a, int32x4_t b, __constrange(1,32) int c) // VSRI.32 q0,q0,#32 +{ + //to cut "c" left bits from a we do shift right and then shift back left providing c right zeros in a + uint32x4_t b_shift; + uint32x4_t a_c; + b_shift = vshrq_n_u32( b, c); // c zeros on the left in b due to logical shift + a_c = vshrq_n_u32( a, (32 - c)); + a_c = _mm_slli_epi32(a_c, (32 - c)); //logical shift provides right "c" bits zeros in a + return _mm_or_si128 (a_c, b_shift); //combine (insert b into a) +} + +int64x2_t vsriq_n_s64(int64x2_t a, int64x2_t b, __constrange(1,64) int c); // VSRI.64 q0,q0,#64 +_NEON2SSE_INLINE int64x2_t vsriq_n_s64(int64x2_t a, int64x2_t b, __constrange(1,64) int c) +{ + //serial solution may be faster + uint64x2_t b_shift; + uint64x2_t a_c; + b_shift = _mm_srli_epi64(b, c); // c zeros on the left in b due to logical shift + a_c = _mm_srli_epi64(a, (64 - c)); + a_c = _mm_slli_epi64(a_c, (64 - c)); //logical shift provides right "c" bits zeros in a + return _mm_or_si128 (a_c, b_shift); //combine (insert b into a) +} + +uint8x16_t vsriq_n_u8(uint8x16_t a, uint8x16_t b, __constrange(1,8) int c); // VSRI.8 q0,q0,#8 +#define vsriq_n_u8 vsriq_n_s8 + +uint16x8_t vsriq_n_u16(uint16x8_t a, uint16x8_t b, __constrange(1,16) int c); // VSRI.16 q0,q0,#16 +#define vsriq_n_u16 vsriq_n_s16 + +uint32x4_t vsriq_n_u32(uint32x4_t a, uint32x4_t b, __constrange(1,32) int c); // VSRI.32 q0,q0,#32 +#define vsriq_n_u32 vsriq_n_s32 + +uint64x2_t vsriq_n_u64(uint64x2_t a, uint64x2_t b, __constrange(1,64) int c); // VSRI.64 q0,q0,#64 +#define vsriq_n_u64 vsriq_n_s64 + +poly8x16_t vsriq_n_p8(poly8x16_t a, poly8x16_t b, __constrange(1,8) int c); // VSRI.8 q0,q0,#8 +#define vsriq_n_p8 vsriq_n_u8 + +poly16x8_t vsriq_n_p16(poly16x8_t a, poly16x8_t b, __constrange(1,16) int c); // VSRI.16 q0,q0,#16 +#define vsriq_n_p16 vsriq_n_u16 + +//***** Vector shift left and insert ********************************************* +//********************************************************************************* +//Actually the "c" right bits from "a" are the only bits remained from "a" after the shift. +//All other bits are taken from b shifted. Ending zeros are inserted in b in the shift proces. We need to combine "a" and "b shifted". 
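// Editor's illustration (not part of the patched header): the bit-combining rule that the
// shift-and-insert comments above describe can be modelled per lane in plain scalar C.
// The helper names sli8 and sri8 below are hypothetical and exist only for this sketch,
// assuming 8-bit lanes: for a shift amount c, VSLI keeps the low c bits of "a" and ORs in
// "b << c", while VSRI keeps the high c bits of "a" and ORs in "b >> c".
#include <stdint.h>
#include <stdio.h>

static uint8_t sli8(uint8_t a, uint8_t b, int c) /* scalar model of one VSLI.8 lane, 0 <= c <= 7 */
{
    uint8_t keep_low = (uint8_t)((1u << c) - 1);      /* the c low bits of "a" survive        */
    return (uint8_t)(((unsigned)b << c) | (a & keep_low));
}

static uint8_t sri8(uint8_t a, uint8_t b, int c) /* scalar model of one VSRI.8 lane, 1 <= c <= 8 */
{
    uint8_t keep_high = (uint8_t)(0xFFu << (8 - c));  /* the c high bits of "a" survive       */
    return (uint8_t)((b >> c) | (a & keep_high));
}

int main(void)
{
    printf("vsli lane: 0x%02x\n", sli8(0xAB, 0xCD, 3)); /* prints 0x6b */
    printf("vsri lane: 0x%02x\n", sri8(0xAB, 0xCD, 3)); /* prints 0xb9 */
    return 0;
}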
+int8x8_t vsli_n_s8(int8x8_t a, int8x8_t b, __constrange(0,7) int c); // VSLI.8 d0,d0,#0 +_NEON2SSE_INLINE int8x8_t vsli_n_s8(int8x8_t a, int8x8_t b, __constrange(0,7) int c) +{ + int8x8_t res64; + return64(vsliq_n_s8(_pM128i(a),_pM128i(b), c)); +} + + +int16x4_t vsli_n_s16(int16x4_t a, int16x4_t b, __constrange(0,15) int c); // VSLI.16 d0,d0,#0 +_NEON2SSE_INLINE int16x4_t vsli_n_s16(int16x4_t a, int16x4_t b, __constrange(0,15) int c) +{ + int16x4_t res64; + return64(vsliq_n_s16(_pM128i(a),_pM128i(b), c)); +} + + +int32x2_t vsli_n_s32(int32x2_t a, int32x2_t b, __constrange(0,31) int c); // VSLI.32 d0,d0,#0 +_NEON2SSE_INLINE int32x2_t vsli_n_s32(int32x2_t a, int32x2_t b, __constrange(0,31) int c) +{ + int32x2_t res64; + return64(vsliq_n_s32(_pM128i(a),_pM128i(b), c)); +} + +int64x1_t vsli_n_s64(int64x1_t a, int64x1_t b, __constrange(0,63) int c); // VSLI.64 d0,d0,#0 +_NEON2SSE_INLINE int64x1_t vsli_n_s64(int64x1_t a, int64x1_t b, __constrange(0,63) int c) +{ + int64x1_t res; + res.m64_i64[0] = (b.m64_i64[0] << c) | ((a.m64_u64[0] << (64 - c)) >> (64 - c)); //need to treat a as unsigned to get leading zeros + return res; +} + + +uint8x8_t vsli_n_u8(uint8x8_t a, uint8x8_t b, __constrange(0,7) int c); // VSLI.8 d0,d0,#0 +#define vsli_n_u8 vsli_n_s8 + +uint16x4_t vsli_n_u16(uint16x4_t a, uint16x4_t b, __constrange(0,15) int c); // VSLI.16 d0,d0,#0 +#define vsli_n_u16 vsli_n_s16 + +uint32x2_t vsli_n_u32(uint32x2_t a, uint32x2_t b, __constrange(0,31) int c); // VSLI.32 d0,d0,#0 +#define vsli_n_u32 vsli_n_s32 + +uint64x1_t vsli_n_u64(uint64x1_t a, uint64x1_t b, __constrange(0,63) int c); // VSLI.64 d0,d0,#0 +#define vsli_n_u64 vsli_n_s64 + +poly8x8_t vsli_n_p8(poly8x8_t a, poly8x8_t b, __constrange(0,7) int c); // VSLI.8 d0,d0,#0 +#define vsli_n_p8 vsli_n_u8 + +poly16x4_t vsli_n_p16(poly16x4_t a, poly16x4_t b, __constrange(0,15) int c); // VSLI.16 d0,d0,#0 +#define vsli_n_p16 vsli_n_u16 + +int8x16_t vsliq_n_s8(int8x16_t a, int8x16_t b, __constrange(0,7) int c); // VSLI.8 q0,q0,#0 +_NEON2SSE_INLINE int8x16_t vsliq_n_s8(int8x16_t a, int8x16_t b, __constrange(0,7) int c) // VSLI.8 q0,q0,#0 +{ + __m128i maskA, a_masked; + int8x16_t b_shift; + _NEON2SSE_ALIGN_16 uint8_t maskRight[8] = {0x0, 0x1, 0x3, 0x7, 0x0f, 0x1f, 0x3f, 0x7f}; //"a" bits mask + maskA = _mm_set1_epi8(maskRight[c]); // (8-c)zeros and c ones + b_shift = vshlq_n_s8( b, c); + a_masked = _mm_and_si128 (a, maskA); + return _mm_or_si128 (b_shift, a_masked); //combine (insert b into a) +} + +int16x8_t vsliq_n_s16(int16x8_t a, int16x8_t b, __constrange(0,15) int c); // VSLI.16 q0,q0,#0 +_NEON2SSE_INLINE int16x8_t vsliq_n_s16(int16x8_t a, int16x8_t b, __constrange(0,15) int c) // VSLI.16 q0,q0,#0 +{ + //to cut "c" right bits from a we do shift left and then logical shift back right providing (16-c)zeros in a + int16x8_t b_shift; + int16x8_t a_c; + b_shift = vshlq_n_s16( b, c); + a_c = vshlq_n_s16( a, (16 - c)); + a_c = _mm_srli_epi16(a_c, (16 - c)); + return _mm_or_si128 (b_shift, a_c); //combine (insert b into a) +} + +int32x4_t vsliq_n_s32(int32x4_t a, int32x4_t b, __constrange(0,31) int c); // VSLI.32 q0,q0,#0 +_NEON2SSE_INLINE int32x4_t vsliq_n_s32(int32x4_t a, int32x4_t b, __constrange(0,31) int c) // VSLI.32 q0,q0,#0 +{ + //solution may be not optimal compared with the serial one + //to cut "c" right bits from a we do shift left and then logical shift back right providing (32-c)zeros in a + int32x4_t b_shift; + int32x4_t a_c; + b_shift = vshlq_n_s32( b, c); + a_c = vshlq_n_s32( a, (32 - c)); + a_c = _mm_srli_epi32(a_c, (32 - c)); + 
return _mm_or_si128 (b_shift, a_c); //combine (insert b into a) +} + +int64x2_t vsliq_n_s64(int64x2_t a, int64x2_t b, __constrange(0,63) int c); // VSLI.64 q0,q0,#0 +_NEON2SSE_INLINE int64x2_t vsliq_n_s64(int64x2_t a, int64x2_t b, __constrange(0,63) int c) // VSLI.64 q0,q0,#0 +{ + //solution may be not optimal compared with the serial one + //to cut "c" right bits from a we do shift left and then logical shift back right providing (64-c)zeros in a + int64x2_t b_shift; + int64x2_t a_c; + b_shift = vshlq_n_s64( b, c); + a_c = vshlq_n_s64( a, (64 - c)); + a_c = _mm_srli_epi64(a_c, (64 - c)); + return _mm_or_si128 (b_shift, a_c); //combine (insert b into a) +} + +uint8x16_t vsliq_n_u8(uint8x16_t a, uint8x16_t b, __constrange(0,7) int c); // VSLI.8 q0,q0,#0 +#define vsliq_n_u8 vsliq_n_s8 + +uint16x8_t vsliq_n_u16(uint16x8_t a, uint16x8_t b, __constrange(0,15) int c); // VSLI.16 q0,q0,#0 +#define vsliq_n_u16 vsliq_n_s16 + +uint32x4_t vsliq_n_u32(uint32x4_t a, uint32x4_t b, __constrange(0,31) int c); // VSLI.32 q0,q0,#0 +#define vsliq_n_u32 vsliq_n_s32 + +uint64x2_t vsliq_n_u64(uint64x2_t a, uint64x2_t b, __constrange(0,63) int c); // VSLI.64 q0,q0,#0 +#define vsliq_n_u64 vsliq_n_s64 + +poly8x16_t vsliq_n_p8(poly8x16_t a, poly8x16_t b, __constrange(0,7) int c); // VSLI.8 q0,q0,#0 +#define vsliq_n_p8 vsliq_n_u8 + +poly16x8_t vsliq_n_p16(poly16x8_t a, poly16x8_t b, __constrange(0,15) int c); // VSLI.16 q0,q0,#0 +#define vsliq_n_p16 vsliq_n_u16 + +// *********************************************************************************************** +// ****************** Loads and stores of a single vector *************************************** +// *********************************************************************************************** +//Performs loads and stores of a single vector of some type. +//******************************* Loads ******************************************************** +// *********************************************************************************************** +//We assume ptr is NOT aligned in general case and use __m128i _mm_loadu_si128 ((__m128i*) ptr);. +//also for SSE3 supporting systems the __m128i _mm_lddqu_si128 (__m128i const* p) usage for unaligned access may be advantageous. +// it loads a 32-byte block aligned on a 16-byte boundary and extracts the 16 bytes corresponding to the unaligned access +//If the ptr is aligned then could use __m128i _mm_load_si128 ((__m128i*) ptr) instead; +#define LOAD_SI128(ptr) \ + ( ((unsigned long)(ptr) & 15) == 0 ) ? 
_mm_load_si128((__m128i*)(ptr)) : _mm_loadu_si128((__m128i*)(ptr)); + +uint8x16_t vld1q_u8(__transfersize(16) uint8_t const * ptr); // VLD1.8 {d0, d1}, [r0] +#define vld1q_u8 LOAD_SI128 + +uint16x8_t vld1q_u16(__transfersize(8) uint16_t const * ptr); // VLD1.16 {d0, d1}, [r0] +#define vld1q_u16 LOAD_SI128 + +uint32x4_t vld1q_u32(__transfersize(4) uint32_t const * ptr); // VLD1.32 {d0, d1}, [r0] +#define vld1q_u32 LOAD_SI128 + +uint64x2_t vld1q_u64(__transfersize(2) uint64_t const * ptr); // VLD1.64 {d0, d1}, [r0] +#define vld1q_u64 LOAD_SI128 + +int8x16_t vld1q_s8(__transfersize(16) int8_t const * ptr); // VLD1.8 {d0, d1}, [r0] +#define vld1q_s8 LOAD_SI128 + +int16x8_t vld1q_s16(__transfersize(8) int16_t const * ptr); // VLD1.16 {d0, d1}, [r0] +#define vld1q_s16 LOAD_SI128 + +int32x4_t vld1q_s32(__transfersize(4) int32_t const * ptr); // VLD1.32 {d0, d1}, [r0] +#define vld1q_s32 LOAD_SI128 + +int64x2_t vld1q_s64(__transfersize(2) int64_t const * ptr); // VLD1.64 {d0, d1}, [r0] +#define vld1q_s64 LOAD_SI128 + +float16x8_t vld1q_f16(__transfersize(8) __fp16 const * ptr); // VLD1.16 {d0, d1}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently, so need to go to 32 bit and then work with two 128bit registers +/* _NEON2SSE_INLINE float16x8_t vld1q_f16(__transfersize(8) __fp16 const * ptr)// VLD1.16 {d0, d1}, [r0] +{__m128 f1 = _mm_set_ps (ptr[3], ptr[2], ptr[1], ptr[0]); +__m128 f2; +f2 = _mm_set_ps (ptr[7], ptr[6], ptr[5], ptr[4]); +}*/ + +float32x4_t vld1q_f32(__transfersize(4) float32_t const * ptr); // VLD1.32 {d0, d1}, [r0] +_NEON2SSE_INLINE float32x4_t vld1q_f32(__transfersize(4) float32_t const * ptr) +{ + if( (((unsigned long)(ptr)) & 15 ) == 0 ) //16 bits aligned + return _mm_load_ps(ptr); + else + return _mm_loadu_ps(ptr); +} + +poly8x16_t vld1q_p8(__transfersize(16) poly8_t const * ptr); // VLD1.8 {d0, d1}, [r0] +#define vld1q_p8 LOAD_SI128 + +poly16x8_t vld1q_p16(__transfersize(8) poly16_t const * ptr); // VLD1.16 {d0, d1}, [r0] +#define vld1q_p16 LOAD_SI128 + +uint8x8_t vld1_u8(__transfersize(8) uint8_t const * ptr); // VLD1.8 {d0}, [r0] +#define vld1_u8(ptr) *((__m64_128*)(ptr)) //was _mm_loadl_epi64((__m128i*)(ptr)) + +uint16x4_t vld1_u16(__transfersize(4) uint16_t const * ptr); // VLD1.16 {d0}, [r0] +#define vld1_u16 vld1_u8 + +uint32x2_t vld1_u32(__transfersize(2) uint32_t const * ptr); // VLD1.32 {d0}, [r0] +#define vld1_u32 vld1_u8 + + +uint64x1_t vld1_u64(__transfersize(1) uint64_t const * ptr); // VLD1.64 {d0}, [r0] +#define vld1_u64 vld1_u8 + +int8x8_t vld1_s8(__transfersize(8) int8_t const * ptr); // VLD1.8 {d0}, [r0] +#define vld1_s8 vld1_u8 + +int16x4_t vld1_s16(__transfersize(4) int16_t const * ptr); // VLD1.16 {d0}, [r0] +#define vld1_s16 vld1_u16 + +int32x2_t vld1_s32(__transfersize(2) int32_t const * ptr); // VLD1.32 {d0}, [r0] +#define vld1_s32 vld1_u32 + +int64x1_t vld1_s64(__transfersize(1) int64_t const * ptr); // VLD1.64 {d0}, [r0] +#define vld1_s64 vld1_u64 + +float16x4_t vld1_f16(__transfersize(4) __fp16 const * ptr); // VLD1.16 {d0}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently, so need to go to 32 bit like _mm_set_ps (ptr[3], ptr[2], ptr[1], ptr[0]); + +float32x2_t vld1_f32(__transfersize(2) float32_t const * ptr); // VLD1.32 {d0}, [r0] +_NEON2SSE_INLINE float32x2_t vld1_f32(__transfersize(2) float32_t const * ptr) +{ + float32x2_t res; + res.m64_f32[0] = *(ptr); + res.m64_f32[1] = *(ptr + 1); + return res; +} + +poly8x8_t vld1_p8(__transfersize(8) poly8_t const * ptr); // VLD1.8 {d0}, [r0] +#define vld1_p8 vld1_u8 + +poly16x4_t 
vld1_p16(__transfersize(4) poly16_t const * ptr); // VLD1.16 {d0}, [r0] +#define vld1_p16 vld1_u16 + +//*********************************************************************************************************** +//******* Lane load functions - insert the data at vector's given position (lane) ************************* +//*********************************************************************************************************** +uint8x16_t vld1q_lane_u8(__transfersize(1) uint8_t const * ptr, uint8x16_t vec, __constrange(0,15) int lane); // VLD1.8 {d0[0]}, [r0] +#define vld1q_lane_u8(ptr, vec, lane) _MM_INSERT_EPI8(vec, *(ptr), lane) + +uint16x8_t vld1q_lane_u16(__transfersize(1) uint16_t const * ptr, uint16x8_t vec, __constrange(0,7) int lane); // VLD1.16 {d0[0]}, [r0] +#define vld1q_lane_u16(ptr, vec, lane) _MM_INSERT_EPI16(vec, *(ptr), lane) + +uint32x4_t vld1q_lane_u32(__transfersize(1) uint32_t const * ptr, uint32x4_t vec, __constrange(0,3) int lane); // VLD1.32 {d0[0]}, [r0] +#define vld1q_lane_u32(ptr, vec, lane) _MM_INSERT_EPI32(vec, *(ptr), lane) + +uint64x2_t vld1q_lane_u64(__transfersize(1) uint64_t const * ptr, uint64x2_t vec, __constrange(0,1) int lane); // VLD1.64 {d0}, [r0] +#define vld1q_lane_u64(ptr, vec, lane) _MM_INSERT_EPI64(vec, *(ptr), lane); // _p; + + +int8x16_t vld1q_lane_s8(__transfersize(1) int8_t const * ptr, int8x16_t vec, __constrange(0,15) int lane); // VLD1.8 {d0[0]}, [r0] +#define vld1q_lane_s8(ptr, vec, lane) _MM_INSERT_EPI8(vec, *(ptr), lane) + +int16x8_t vld1q_lane_s16(__transfersize(1) int16_t const * ptr, int16x8_t vec, __constrange(0,7) int lane); // VLD1.16 {d0[0]}, [r0] +#define vld1q_lane_s16(ptr, vec, lane) _MM_INSERT_EPI16(vec, *(ptr), lane) + +int32x4_t vld1q_lane_s32(__transfersize(1) int32_t const * ptr, int32x4_t vec, __constrange(0,3) int lane); // VLD1.32 {d0[0]}, [r0] +#define vld1q_lane_s32(ptr, vec, lane) _MM_INSERT_EPI32(vec, *(ptr), lane) + +float16x8_t vld1q_lane_f16(__transfersize(1) __fp16 const * ptr, float16x8_t vec, __constrange(0,7) int lane); // VLD1.16 {d0[0]}, [r0] +//current IA SIMD doesn't support float16 + +float32x4_t vld1q_lane_f32(__transfersize(1) float32_t const * ptr, float32x4_t vec, __constrange(0,3) int lane); // VLD1.32 {d0[0]}, [r0] +_NEON2SSE_INLINE float32x4_t vld1q_lane_f32(__transfersize(1) float32_t const * ptr, float32x4_t vec, __constrange(0,3) int lane) +{ + //we need to deal with ptr 16bit NOT aligned case + __m128 p; + p = _mm_set1_ps(*(ptr)); + return _MM_INSERT_PS(vec, p, _INSERTPS_NDX(0, lane)); +} + +int64x2_t vld1q_lane_s64(__transfersize(1) int64_t const * ptr, int64x2_t vec, __constrange(0,1) int lane); // VLD1.64 {d0}, [r0] +#define vld1q_lane_s64(ptr, vec, lane) _MM_INSERT_EPI64(vec, *(ptr), lane) + +poly8x16_t vld1q_lane_p8(__transfersize(1) poly8_t const * ptr, poly8x16_t vec, __constrange(0,15) int lane); // VLD1.8 {d0[0]}, [r0] +#define vld1q_lane_p8(ptr, vec, lane) _MM_INSERT_EPI8(vec, *(ptr), lane) + +poly16x8_t vld1q_lane_p16(__transfersize(1) poly16_t const * ptr, poly16x8_t vec, __constrange(0,7) int lane); // VLD1.16 {d0[0]}, [r0] +#define vld1q_lane_p16(ptr, vec, lane) _MM_INSERT_EPI16(vec, *(ptr), lane) + +uint8x8_t vld1_lane_u8(__transfersize(1) uint8_t const * ptr, uint8x8_t vec, __constrange(0,7) int lane); // VLD1.8 {d0[0]}, [r0] +_NEON2SSE_INLINE uint8x8_t vld1_lane_u8(__transfersize(1) uint8_t const * ptr, uint8x8_t vec, __constrange(0,7) int lane) +{ + uint8x8_t res; + res = vec; + res.m64_u8[lane] = *(ptr); + return res; +} + +uint16x4_t vld1_lane_u16(__transfersize(1) 
uint16_t const * ptr, uint16x4_t vec, __constrange(0,3) int lane); // VLD1.16 {d0[0]}, [r0] +_NEON2SSE_INLINE uint16x4_t vld1_lane_u16(__transfersize(1) uint16_t const * ptr, uint16x4_t vec, __constrange(0,3) int lane) +{ + uint16x4_t res; + res = vec; + res.m64_u16[lane] = *(ptr); + return res; +} + +uint32x2_t vld1_lane_u32(__transfersize(1) uint32_t const * ptr, uint32x2_t vec, __constrange(0,1) int lane); // VLD1.32 {d0[0]}, [r0] +_NEON2SSE_INLINE uint32x2_t vld1_lane_u32(__transfersize(1) uint32_t const * ptr, uint32x2_t vec, __constrange(0,1) int lane) +{ + uint32x2_t res; + res = vec; + res.m64_u32[lane] = *(ptr); + return res; +} + +uint64x1_t vld1_lane_u64(__transfersize(1) uint64_t const * ptr, uint64x1_t vec, __constrange(0,0) int lane); // VLD1.64 {d0}, [r0] +_NEON2SSE_INLINE uint64x1_t vld1_lane_u64(__transfersize(1) uint64_t const * ptr, uint64x1_t vec, __constrange(0,0) int lane) +{ + uint64x1_t res; + res.m64_u64[0] = *(ptr); + return res; +} + + +int8x8_t vld1_lane_s8(__transfersize(1) int8_t const * ptr, int8x8_t vec, __constrange(0,7) int lane); // VLD1.8 {d0[0]}, [r0] +#define vld1_lane_s8(ptr, vec, lane) vld1_lane_u8((uint8_t*)ptr, vec, lane) + +int16x4_t vld1_lane_s16(__transfersize(1) int16_t const * ptr, int16x4_t vec, __constrange(0,3) int lane); // VLD1.16 {d0[0]}, [r0] +#define vld1_lane_s16(ptr, vec, lane) vld1_lane_u16((uint16_t*)ptr, vec, lane) + +int32x2_t vld1_lane_s32(__transfersize(1) int32_t const * ptr, int32x2_t vec, __constrange(0,1) int lane); // VLD1.32 {d0[0]}, [r0] +#define vld1_lane_s32(ptr, vec, lane) vld1_lane_u32((uint32_t*)ptr, vec, lane) + +float16x4_t vld1_lane_f16(__transfersize(1) __fp16 const * ptr, float16x4_t vec, __constrange(0,3) int lane); // VLD1.16 {d0[0]}, [r0] +//current IA SIMD doesn't support float16 + +float32x2_t vld1_lane_f32(__transfersize(1) float32_t const * ptr, float32x2_t vec, __constrange(0,1) int lane); // VLD1.32 {d0[0]}, [r0] +_NEON2SSE_INLINE float32x2_t vld1_lane_f32(__transfersize(1) float32_t const * ptr, float32x2_t vec, __constrange(0,1) int lane) +{ + float32x2_t res; + res = vec; + res.m64_f32[lane] = *(ptr); + return res; +} + +int64x1_t vld1_lane_s64(__transfersize(1) int64_t const * ptr, int64x1_t vec, __constrange(0,0) int lane); // VLD1.64 {d0}, [r0] +#define vld1_lane_s64(ptr, vec, lane) vld1_lane_u64((uint64_t*)ptr, vec, lane) + +poly8x8_t vld1_lane_p8(__transfersize(1) poly8_t const * ptr, poly8x8_t vec, __constrange(0,7) int lane); // VLD1.8 {d0[0]}, [r0] +#define vld1_lane_p8 vld1_lane_u8 + +poly16x4_t vld1_lane_p16(__transfersize(1) poly16_t const * ptr, poly16x4_t vec, __constrange(0,3) int lane); // VLD1.16 {d0[0]}, [r0] +#define vld1_lane_p16 vld1_lane_s16 + +// ****************** Load single value ( set all lanes of vector with same value from memory)********************** +// ****************************************************************************************************************** +uint8x16_t vld1q_dup_u8(__transfersize(1) uint8_t const * ptr); // VLD1.8 {d0[]}, [r0] +#define vld1q_dup_u8(ptr) _mm_set1_epi8(*(ptr)) + +uint16x8_t vld1q_dup_u16(__transfersize(1) uint16_t const * ptr); // VLD1.16 {d0[]}, [r0] +#define vld1q_dup_u16(ptr) _mm_set1_epi16(*(ptr)) + +uint32x4_t vld1q_dup_u32(__transfersize(1) uint32_t const * ptr); // VLD1.32 {d0[]}, [r0] +#define vld1q_dup_u32(ptr) _mm_set1_epi32(*(ptr)) + +uint64x2_t vld1q_dup_u64(__transfersize(1) uint64_t const * ptr); // VLD1.64 {d0}, [r0] +_NEON2SSE_INLINE uint64x2_t vld1q_dup_u64(__transfersize(1) uint64_t const * ptr) +{ + 
_NEON2SSE_ALIGN_16 uint64_t val[2] = {*(ptr), *(ptr)}; + return LOAD_SI128(val); +} + +int8x16_t vld1q_dup_s8(__transfersize(1) int8_t const * ptr); // VLD1.8 {d0[]}, [r0] +#define vld1q_dup_s8(ptr) _mm_set1_epi8(*(ptr)) + +int16x8_t vld1q_dup_s16(__transfersize(1) int16_t const * ptr); // VLD1.16 {d0[]}, [r0] +#define vld1q_dup_s16(ptr) _mm_set1_epi16 (*(ptr)) + +int32x4_t vld1q_dup_s32(__transfersize(1) int32_t const * ptr); // VLD1.32 {d0[]}, [r0] +#define vld1q_dup_s32(ptr) _mm_set1_epi32 (*(ptr)) + +int64x2_t vld1q_dup_s64(__transfersize(1) int64_t const * ptr); // VLD1.64 {d0}, [r0] +#define vld1q_dup_s64(ptr) vld1q_dup_u64((uint64_t*)ptr) + +float16x8_t vld1q_dup_f16(__transfersize(1) __fp16 const * ptr); // VLD1.16 {d0[]}, [r0] +//current IA SIMD doesn't support float16, need to go to 32 bits + +float32x4_t vld1q_dup_f32(__transfersize(1) float32_t const * ptr); // VLD1.32 {d0[]}, [r0] +#define vld1q_dup_f32(ptr) _mm_set1_ps (*(ptr)) + +poly8x16_t vld1q_dup_p8(__transfersize(1) poly8_t const * ptr); // VLD1.8 {d0[]}, [r0] +#define vld1q_dup_p8(ptr) _mm_set1_epi8(*(ptr)) + +poly16x8_t vld1q_dup_p16(__transfersize(1) poly16_t const * ptr); // VLD1.16 {d0[]}, [r0] +#define vld1q_dup_p16(ptr) _mm_set1_epi16 (*(ptr)) + +uint8x8_t vld1_dup_u8(__transfersize(1) uint8_t const * ptr); // VLD1.8 {d0[]}, [r0] +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint8x8_t vld1_dup_u8(__transfersize(1) uint8_t const * ptr), _NEON2SSE_REASON_SLOW_SERIAL) +{ + uint8x8_t res; + int i; + for(i = 0; i<8; i++) { + res.m64_u8[i] = *(ptr); + } + return res; +} + +uint16x4_t vld1_dup_u16(__transfersize(1) uint16_t const * ptr); // VLD1.16 {d0[]}, [r0] +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint16x4_t vld1_dup_u16(__transfersize(1) uint16_t const * ptr), _NEON2SSE_REASON_SLOW_SERIAL) +{ + uint16x4_t res; + int i; + for(i = 0; i<4; i++) { + res.m64_u16[i] = *(ptr); + } + return res; +} + +uint32x2_t vld1_dup_u32(__transfersize(1) uint32_t const * ptr); // VLD1.32 {d0[]}, [r0] +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vld1_dup_u32(__transfersize(1) uint32_t const * ptr), _NEON2SSE_REASON_SLOW_SERIAL) +{ + uint32x2_t res; + res.m64_u32[0] = *(ptr); + res.m64_u32[1] = *(ptr); + return res; +} + +uint64x1_t vld1_dup_u64(__transfersize(1) uint64_t const * ptr); // VLD1.64 {d0}, [r0] +_NEON2SSE_INLINE uint64x1_t vld1_dup_u64(__transfersize(1) uint64_t const * ptr) +{ + uint64x1_t res; + res.m64_u64[0] = *(ptr); + return res; +} + +int8x8_t vld1_dup_s8(__transfersize(1) int8_t const * ptr); // VLD1.8 {d0[]}, [r0] +#define vld1_dup_s8(ptr) vld1_dup_u8((uint8_t*)ptr) + + +int16x4_t vld1_dup_s16(__transfersize(1) int16_t const * ptr); // VLD1.16 {d0[]}, [r0] +#define vld1_dup_s16(ptr) vld1_dup_u16((uint16_t*)ptr) + + +int32x2_t vld1_dup_s32(__transfersize(1) int32_t const * ptr); // VLD1.32 {d0[]}, [r0] +#define vld1_dup_s32(ptr) vld1_dup_u32((uint32_t*)ptr) + + +int64x1_t vld1_dup_s64(__transfersize(1) int64_t const * ptr); // VLD1.64 {d0}, [r0] +#define vld1_dup_s64(ptr) vld1_dup_u64((uint64_t*)ptr) + +float16x4_t vld1_dup_f16(__transfersize(1) __fp16 const * ptr); // VLD1.16 {d0[]}, [r0] +//current IA SIMD doesn't support float16 + +float32x2_t vld1_dup_f32(__transfersize(1) float32_t const * ptr); // VLD1.32 {d0[]}, [r0] +_NEON2SSE_INLINE float32x2_t vld1_dup_f32(__transfersize(1) float32_t const * ptr) +{ + float32x2_t res; + res.m64_f32[0] = *(ptr); + res.m64_f32[1] = res.m64_f32[0]; + return res; // use last 64bits only +} + +poly8x8_t vld1_dup_p8(__transfersize(1) poly8_t const * ptr); 
// VLD1.8 {d0[]}, [r0] +#define vld1_dup_p8 vld1_dup_u8 + + +poly16x4_t vld1_dup_p16(__transfersize(1) poly16_t const * ptr); // VLD1.16 {d0[]}, [r0] +#define vld1_dup_p16 vld1_dup_u16 + + +//************************************************************************************* +//********************************* Store ********************************************** +//************************************************************************************* +// If ptr is 16bit aligned and you need to store data without cache pollution then use void _mm_stream_si128 ((__m128i*)ptr, val); +//here we assume the case of NOT 16bit aligned ptr possible. If it is aligned we could to use _mm_store_si128 like shown in the following macro +#define STORE_SI128(ptr, val) \ + (((unsigned long)(ptr) & 15) == 0 ) ? _mm_store_si128 ((__m128i*)(ptr), val) : _mm_storeu_si128 ((__m128i*)(ptr), val); + +void vst1q_u8(__transfersize(16) uint8_t * ptr, uint8x16_t val); // VST1.8 {d0, d1}, [r0] +#define vst1q_u8 STORE_SI128 + +void vst1q_u16(__transfersize(8) uint16_t * ptr, uint16x8_t val); // VST1.16 {d0, d1}, [r0] +#define vst1q_u16 STORE_SI128 + +void vst1q_u32(__transfersize(4) uint32_t * ptr, uint32x4_t val); // VST1.32 {d0, d1}, [r0] +#define vst1q_u32 STORE_SI128 + +void vst1q_u64(__transfersize(2) uint64_t * ptr, uint64x2_t val); // VST1.64 {d0, d1}, [r0] +#define vst1q_u64 STORE_SI128 + +void vst1q_s8(__transfersize(16) int8_t * ptr, int8x16_t val); // VST1.8 {d0, d1}, [r0] +#define vst1q_s8 STORE_SI128 + +void vst1q_s16(__transfersize(8) int16_t * ptr, int16x8_t val); // VST1.16 {d0, d1}, [r0] +#define vst1q_s16 STORE_SI128 + +void vst1q_s32(__transfersize(4) int32_t * ptr, int32x4_t val); // VST1.32 {d0, d1}, [r0] +#define vst1q_s32 STORE_SI128 + +void vst1q_s64(__transfersize(2) int64_t * ptr, int64x2_t val); // VST1.64 {d0, d1}, [r0] +#define vst1q_s64 STORE_SI128 + +void vst1q_f16(__transfersize(8) __fp16 * ptr, float16x8_t val); // VST1.16 {d0, d1}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently + +void vst1q_f32(__transfersize(4) float32_t * ptr, float32x4_t val); // VST1.32 {d0, d1}, [r0] +_NEON2SSE_INLINE void vst1q_f32(__transfersize(4) float32_t * ptr, float32x4_t val) +{ + if( ((unsigned long)(ptr) & 15) == 0 ) //16 bits aligned + _mm_store_ps (ptr, val); + else + _mm_storeu_ps (ptr, val); +} + +void vst1q_p8(__transfersize(16) poly8_t * ptr, poly8x16_t val); // VST1.8 {d0, d1}, [r0] +#define vst1q_p8 vst1q_u8 + +void vst1q_p16(__transfersize(8) poly16_t * ptr, poly16x8_t val); // VST1.16 {d0, d1}, [r0] +#define vst1q_p16 vst1q_u16 + +void vst1_u8(__transfersize(8) uint8_t * ptr, uint8x8_t val); // VST1.8 {d0}, [r0] +_NEON2SSE_INLINE void vst1_u8(__transfersize(8) uint8_t * ptr, uint8x8_t val) +{ + int i; + for (i = 0; i<8; i++) { + *(ptr + i) = ((uint8_t*)&val)[i]; + } + //_mm_storel_epi64((__m128i*)ptr, val); + return; +} + +void vst1_u16(__transfersize(4) uint16_t * ptr, uint16x4_t val); // VST1.16 {d0}, [r0] +_NEON2SSE_INLINE void vst1_u16(__transfersize(4) uint16_t * ptr, uint16x4_t val) +{ + int i; + for (i = 0; i<4; i++) { + *(ptr + i) = ((uint16_t*)&val)[i]; + } + //_mm_storel_epi64((__m128i*)ptr, val); + return; +} + +void vst1_u32(__transfersize(2) uint32_t * ptr, uint32x2_t val); // VST1.32 {d0}, [r0] +_NEON2SSE_INLINE void vst1_u32(__transfersize(2) uint32_t * ptr, uint32x2_t val) +{ + int i; + for (i = 0; i<2; i++) { + *(ptr + i) = ((uint32_t*)&val)[i]; + } + //_mm_storel_epi64((__m128i*)ptr, val); + return; +} + +void vst1_u64(__transfersize(1) uint64_t * ptr, 
uint64x1_t val); // VST1.64 {d0}, [r0] +_NEON2SSE_INLINE void vst1_u64(__transfersize(1) uint64_t * ptr, uint64x1_t val) +{ + *(ptr) = *((uint64_t*)&val); + //_mm_storel_epi64((__m128i*)ptr, val); + return; +} + +void vst1_s8(__transfersize(8) int8_t * ptr, int8x8_t val); // VST1.8 {d0}, [r0] +#define vst1_s8(ptr,val) vst1_u8((uint8_t*)ptr,val) + +void vst1_s16(__transfersize(4) int16_t * ptr, int16x4_t val); // VST1.16 {d0}, [r0] +#define vst1_s16(ptr,val) vst1_u16((uint16_t*)ptr,val) + +void vst1_s32(__transfersize(2) int32_t * ptr, int32x2_t val); // VST1.32 {d0}, [r0] +#define vst1_s32(ptr,val) vst1_u32((uint32_t*)ptr,val) + +void vst1_s64(__transfersize(1) int64_t * ptr, int64x1_t val); // VST1.64 {d0}, [r0] +#define vst1_s64(ptr,val) vst1_u64((uint64_t*)ptr,val) + +void vst1_f16(__transfersize(4) __fp16 * ptr, float16x4_t val); // VST1.16 {d0}, [r0] +//current IA SIMD doesn't support float16 + +void vst1_f32(__transfersize(2) float32_t * ptr, float32x2_t val); // VST1.32 {d0}, [r0] +_NEON2SSE_INLINE void vst1_f32(__transfersize(2) float32_t * ptr, float32x2_t val) +{ + *(ptr) = val.m64_f32[0]; + *(ptr + 1) = val.m64_f32[1]; + return; +} + +void vst1_p8(__transfersize(8) poly8_t * ptr, poly8x8_t val); // VST1.8 {d0}, [r0] +#define vst1_p8 vst1_u8 + +void vst1_p16(__transfersize(4) poly16_t * ptr, poly16x4_t val); // VST1.16 {d0}, [r0] +#define vst1_p16 vst1_u16 + +//***********Store a lane of a vector into memory (extract given lane) ********************* +//****************************************************************************************** +void vst1q_lane_u8(__transfersize(1) uint8_t * ptr, uint8x16_t val, __constrange(0,15) int lane); // VST1.8 {d0[0]}, [r0] +#define vst1q_lane_u8(ptr, val, lane) *(ptr) = _MM_EXTRACT_EPI8 (val, lane) + +void vst1q_lane_u16(__transfersize(1) uint16_t * ptr, uint16x8_t val, __constrange(0,7) int lane); // VST1.16 {d0[0]}, [r0] +#define vst1q_lane_u16(ptr, val, lane) *(ptr) = _MM_EXTRACT_EPI16 (val, lane) + +void vst1q_lane_u32(__transfersize(1) uint32_t * ptr, uint32x4_t val, __constrange(0,3) int lane); // VST1.32 {d0[0]}, [r0] +#define vst1q_lane_u32(ptr, val, lane) *(ptr) = _MM_EXTRACT_EPI32 (val, lane) + +void vst1q_lane_u64(__transfersize(1) uint64_t * ptr, uint64x2_t val, __constrange(0,1) int lane); // VST1.64 {d0}, [r0] +#define vst1q_lane_u64(ptr, val, lane) *(ptr) = _MM_EXTRACT_EPI64 (val, lane) + +void vst1q_lane_s8(__transfersize(1) int8_t * ptr, int8x16_t val, __constrange(0,15) int lane); // VST1.8 {d0[0]}, [r0] +#define vst1q_lane_s8(ptr, val, lane) *(ptr) = _MM_EXTRACT_EPI8 (val, lane) + +void vst1q_lane_s16(__transfersize(1) int16_t * ptr, int16x8_t val, __constrange(0,7) int lane); // VST1.16 {d0[0]}, [r0] +#define vst1q_lane_s16(ptr, val, lane) *(ptr) = _MM_EXTRACT_EPI16 (val, lane) + +void vst1q_lane_s32(__transfersize(1) int32_t * ptr, int32x4_t val, __constrange(0,3) int lane); // VST1.32 {d0[0]}, [r0] +#define vst1q_lane_s32(ptr, val, lane) *(ptr) = _MM_EXTRACT_EPI32 (val, lane) + +void vst1q_lane_s64(__transfersize(1) int64_t * ptr, int64x2_t val, __constrange(0,1) int lane); // VST1.64 {d0}, [r0] +#define vst1q_lane_s64(ptr, val, lane) *(ptr) = _MM_EXTRACT_EPI64 (val, lane) + +void vst1q_lane_f16(__transfersize(1) __fp16 * ptr, float16x8_t val, __constrange(0,7) int lane); // VST1.16 {d0[0]}, [r0] +//current IA SIMD doesn't support float16 + +void vst1q_lane_f32(__transfersize(1) float32_t * ptr, float32x4_t val, __constrange(0,3) int lane); // VST1.32 {d0[0]}, [r0] +_NEON2SSE_INLINE void 
vst1q_lane_f32(__transfersize(1) float32_t * ptr, float32x4_t val, __constrange(0,3) int lane) +{ + int32_t ilane; + ilane = _MM_EXTRACT_PS(val,lane); + *(ptr) = *((float*)&ilane); +} + +void vst1q_lane_p8(__transfersize(1) poly8_t * ptr, poly8x16_t val, __constrange(0,15) int lane); // VST1.8 {d0[0]}, [r0] +#define vst1q_lane_p8 vst1q_lane_u8 + +void vst1q_lane_p16(__transfersize(1) poly16_t * ptr, poly16x8_t val, __constrange(0,7) int lane); // VST1.16 {d0[0]}, [r0] +#define vst1q_lane_p16 vst1q_lane_s16 + +void vst1_lane_u8(__transfersize(1) uint8_t * ptr, uint8x8_t val, __constrange(0,7) int lane); // VST1.8 {d0[0]}, [r0] +_NEON2SSE_INLINE void vst1_lane_u8(__transfersize(1) uint8_t * ptr, uint8x8_t val, __constrange(0,7) int lane) +{ + *(ptr) = val.m64_u8[lane]; +} + +void vst1_lane_u16(__transfersize(1) uint16_t * ptr, uint16x4_t val, __constrange(0,3) int lane); // VST1.16 {d0[0]}, [r0] +_NEON2SSE_INLINE void vst1_lane_u16(__transfersize(1) uint16_t * ptr, uint16x4_t val, __constrange(0,3) int lane) +{ + *(ptr) = val.m64_u16[lane]; +} + +void vst1_lane_u32(__transfersize(1) uint32_t * ptr, uint32x2_t val, __constrange(0,1) int lane); // VST1.32 {d0[0]}, [r0] +_NEON2SSE_INLINE void vst1_lane_u32(__transfersize(1) uint32_t * ptr, uint32x2_t val, __constrange(0,1) int lane) +{ + *(ptr) = val.m64_u32[lane]; +} + +void vst1_lane_u64(__transfersize(1) uint64_t * ptr, uint64x1_t val, __constrange(0,0) int lane); // VST1.64 {d0}, [r0] +_NEON2SSE_INLINE void vst1_lane_u64(__transfersize(1) uint64_t * ptr, uint64x1_t val, __constrange(0,0) int lane) +{ + *(ptr) = val.m64_u64[0]; +} + +void vst1_lane_s8(__transfersize(1) int8_t * ptr, int8x8_t val, __constrange(0,7) int lane); // VST1.8 {d0[0]}, [r0] +#define vst1_lane_s8(ptr, val, lane) vst1_lane_u8((uint8_t*)ptr, val, lane) + +void vst1_lane_s16(__transfersize(1) int16_t * ptr, int16x4_t val, __constrange(0,3) int lane); // VST1.16 {d0[0]}, [r0] +#define vst1_lane_s16(ptr, val, lane) vst1_lane_u16((uint16_t*)ptr, val, lane) + +void vst1_lane_s32(__transfersize(1) int32_t * ptr, int32x2_t val, __constrange(0,1) int lane); // VST1.32 {d0[0]}, [r0] +#define vst1_lane_s32(ptr, val, lane) vst1_lane_u32((uint32_t*)ptr, val, lane) + + +void vst1_lane_s64(__transfersize(1) int64_t * ptr, int64x1_t val, __constrange(0,0) int lane); // VST1.64 {d0}, [r0] +#define vst1_lane_s64(ptr, val, lane) vst1_lane_u64((uint64_t*)ptr, val, lane) + + +void vst1_lane_f16(__transfersize(1) __fp16 * ptr, float16x4_t val, __constrange(0,3) int lane); // VST1.16 {d0[0]}, [r0] +//current IA SIMD doesn't support float16 + +void vst1_lane_f32(__transfersize(1) float32_t * ptr, float32x2_t val, __constrange(0,1) int lane); // VST1.32 {d0[0]}, [r0] +_NEON2SSE_INLINE void vst1_lane_f32(__transfersize(1) float32_t * ptr, float32x2_t val, __constrange(0,1) int lane) +{ + *(ptr) = val.m64_f32[lane]; +} + +void vst1_lane_p8(__transfersize(1) poly8_t * ptr, poly8x8_t val, __constrange(0,7) int lane); // VST1.8 {d0[0]}, [r0] +#define vst1_lane_p8 vst1_lane_u8 + +void vst1_lane_p16(__transfersize(1) poly16_t * ptr, poly16x4_t val, __constrange(0,3) int lane); // VST1.16 {d0[0]}, [r0] +#define vst1_lane_p16 vst1_lane_s16 + +//*********************************************************************************************** +//**************** Loads and stores of an N-element structure ********************************** +//*********************************************************************************************** +//These intrinsics load or store an n-element structure. 
The array structures are defined in the beginning +//We assume ptr is NOT aligned in general case, for more details see "Loads and stores of a single vector functions" +//****************** 2 elements load ********************************************* +uint8x16x2_t vld2q_u8(__transfersize(32) uint8_t const * ptr); // VLD2.8 {d0, d2}, [r0] +_NEON2SSE_INLINE uint8x16x2_t vld2q_u8(__transfersize(32) uint8_t const * ptr) // VLD2.8 {d0, d2}, [r0] +{ + uint8x16x2_t v; + v.val[0] = vld1q_u8(ptr); + v.val[1] = vld1q_u8((ptr + 16)); + v = vuzpq_s8(v.val[0], v.val[1]); + return v; +} + +uint16x8x2_t vld2q_u16(__transfersize(16) uint16_t const * ptr); // VLD2.16 {d0, d2}, [r0] +_NEON2SSE_INLINE uint16x8x2_t vld2q_u16(__transfersize(16) uint16_t const * ptr) // VLD2.16 {d0, d2}, [r0] +{ + uint16x8x2_t v; + v.val[0] = vld1q_u16( ptr); + v.val[1] = vld1q_u16( (ptr + 8)); + v = vuzpq_s16(v.val[0], v.val[1]); + return v; +} + +uint32x4x2_t vld2q_u32(__transfersize(8) uint32_t const * ptr); // VLD2.32 {d0, d2}, [r0] +_NEON2SSE_INLINE uint32x4x2_t vld2q_u32(__transfersize(8) uint32_t const * ptr) // VLD2.32 {d0, d2}, [r0] +{ + uint32x4x2_t v; + v.val[0] = vld1q_u32 ( ptr); + v.val[1] = vld1q_u32 ( (ptr + 4)); + v = vuzpq_s32(v.val[0], v.val[1]); + return v; +} + +int8x16x2_t vld2q_s8(__transfersize(32) int8_t const * ptr); +#define vld2q_s8(ptr) vld2q_u8((uint8_t*) ptr) + +int16x8x2_t vld2q_s16(__transfersize(16) int16_t const * ptr); // VLD2.16 {d0, d2}, [r0] +#define vld2q_s16(ptr) vld2q_u16((uint16_t*) ptr) + +int32x4x2_t vld2q_s32(__transfersize(8) int32_t const * ptr); // VLD2.32 {d0, d2}, [r0] +#define vld2q_s32(ptr) vld2q_u32((uint32_t*) ptr) + + +float16x8x2_t vld2q_f16(__transfersize(16) __fp16 const * ptr); // VLD2.16 {d0, d2}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently, so need to go to 32 bit and then work with two 128bit registers. 
See vld1q_f16 for example + +float32x4x2_t vld2q_f32(__transfersize(8) float32_t const * ptr); // VLD2.32 {d0, d2}, [r0] +_NEON2SSE_INLINE float32x4x2_t vld2q_f32(__transfersize(8) float32_t const * ptr) // VLD2.32 {d0, d2}, [r0] +{ + float32x4x2_t v; + v.val[0] = vld1q_f32 (ptr); + v.val[1] = vld1q_f32 ((ptr + 4)); + v = vuzpq_f32(v.val[0], v.val[1]); + return v; +} + +poly8x16x2_t vld2q_p8(__transfersize(32) poly8_t const * ptr); // VLD2.8 {d0, d2}, [r0] +#define vld2q_p8 vld2q_u8 + +poly16x8x2_t vld2q_p16(__transfersize(16) poly16_t const * ptr); // VLD2.16 {d0, d2}, [r0] +#define vld2q_p16 vld2q_u16 + +uint8x8x2_t vld2_u8(__transfersize(16) uint8_t const * ptr); // VLD2.8 {d0, d1}, [r0] +_NEON2SSE_INLINE uint8x8x2_t vld2_u8(__transfersize(16) uint8_t const * ptr) +{ + uint8x8x2_t v; + _NEON2SSE_ALIGN_16 int8_t mask8_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15}; + __m128i ld128; + ld128 = vld1q_u8(ptr); //merge two 64-bits in 128 bit + ld128 = _mm_shuffle_epi8(ld128, *(__m128i*)mask8_even_odd); + vst1q_u8((v.val), ld128); // v.val[1] = _mm_shuffle_epi32(v.val[0], _SWAP_HI_LOW32); + return v; +} + +uint16x4x2_t vld2_u16(__transfersize(8) uint16_t const * ptr); // VLD2.16 {d0, d1}, [r0] +_NEON2SSE_INLINE uint16x4x2_t vld2_u16(__transfersize(8) uint16_t const * ptr) +{ + _NEON2SSE_ALIGN_16 uint16x4x2_t v; + _NEON2SSE_ALIGN_16 int8_t mask16_even_odd[16] = { 0,1, 4,5, 8,9, 12,13, 2,3, 6,7, 10,11, 14,15}; + __m128i ld128; + ld128 = vld1q_u16(ptr); //merge two 64-bits in 128 bit + ld128 = _mm_shuffle_epi8(ld128, *(__m128i*)mask16_even_odd); + vst1q_u16((v.val), ld128); + return v; +} + +uint32x2x2_t vld2_u32(__transfersize(4) uint32_t const * ptr); // VLD2.32 {d0, d1}, [r0] +_NEON2SSE_INLINE uint32x2x2_t vld2_u32(__transfersize(4) uint32_t const * ptr) +{ + _NEON2SSE_ALIGN_16 uint32x2x2_t v; + __m128i ld128; + ld128 = vld1q_u32(ptr); //merge two 64-bits in 128 bit + ld128 = _mm_shuffle_epi32(ld128, 0 | (2 << 2) | (1 << 4) | (3 << 6)); + vst1q_u32((v.val), ld128); + return v; +} + +uint64x1x2_t vld2_u64(__transfersize(2) uint64_t const * ptr); // VLD1.64 {d0, d1}, [r0] +_NEON2SSE_INLINE uint64x1x2_t vld2_u64(__transfersize(2) uint64_t const * ptr) +{ + uint64x1x2_t v; + v.val[0].m64_u64[0] = *(ptr); + v.val[1].m64_u64[0] = *(ptr + 1); + return v; +} + +int8x8x2_t vld2_s8(__transfersize(16) int8_t const * ptr); // VLD2.8 {d0, d1}, [r0] +#define vld2_s8(ptr) vld2_u8((uint8_t*)ptr) + +int16x4x2_t vld2_s16(__transfersize(8) int16_t const * ptr); // VLD2.16 {d0, d1}, [r0] +#define vld2_s16(ptr) vld2_u16((uint16_t*)ptr) + +int32x2x2_t vld2_s32(__transfersize(4) int32_t const * ptr); // VLD2.32 {d0, d1}, [r0] +#define vld2_s32(ptr) vld2_u32((uint32_t*)ptr) + +int64x1x2_t vld2_s64(__transfersize(2) int64_t const * ptr); // VLD1.64 {d0, d1}, [r0] +#define vld2_s64(ptr) vld2_u64((uint64_t*)ptr) + +float16x4x2_t vld2_f16(__transfersize(8) __fp16 const * ptr); // VLD2.16 {d0, d1}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently, so need to go to 32 bit and then work with two 128bit registers. 
See vld1_f16 for example + +float32x2x2_t vld2_f32(__transfersize(4) float32_t const * ptr); // VLD2.32 {d0, d1}, [r0] +_NEON2SSE_INLINE float32x2x2_t vld2_f32(__transfersize(4) float32_t const * ptr) +{ + float32x2x2_t v; + v.val[0].m64_f32[0] = *(ptr); + v.val[0].m64_f32[1] = *(ptr + 2); + v.val[1].m64_f32[0] = *(ptr + 1); + v.val[1].m64_f32[1] = *(ptr + 3); + return v; +} + +poly8x8x2_t vld2_p8(__transfersize(16) poly8_t const * ptr); // VLD2.8 {d0, d1}, [r0] +#define vld2_p8 vld2_u8 + +poly16x4x2_t vld2_p16(__transfersize(8) poly16_t const * ptr); // VLD2.16 {d0, d1}, [r0] +#define vld2_p16 vld2_u16 + +//******************** Triplets *************************************** +//********************************************************************* +uint8x16x3_t vld3q_u8(__transfersize(48) uint8_t const * ptr); // VLD3.8 {d0, d2, d4}, [r0] +_NEON2SSE_INLINE uint8x16x3_t vld3q_u8(__transfersize(48) uint8_t const * ptr) // VLD3.8 {d0, d2, d4}, [r0] +{ + //a0,a1,a2,a3,...a7,a8,...a15, b0,b1,b2,...b7,b8,...b15, c0,c1,c2,...c7,c8,...c15 -> + //a:0,3,6,9,12,15,b:2,5,8,11,14, c:1,4,7,10,13 + //a:1,4,7,10,13, b:0,3,6,9,12,15,c:2,5,8,11,14, + //a:2,5,8,11,14, b:1,4,7,10,13, c:0,3,6,9,12,15 + uint8x16x3_t v; + __m128i tmp0, tmp1,tmp2, tmp3; + _NEON2SSE_ALIGN_16 int8_t mask8_0[16] = {0,3,6,9,12,15,1,4,7,10,13,2,5,8,11,14}; + _NEON2SSE_ALIGN_16 int8_t mask8_1[16] = {2,5,8,11,14,0,3,6,9,12,15,1,4,7,10,13}; + _NEON2SSE_ALIGN_16 int8_t mask8_2[16] = {1,4,7,10,13,2,5,8,11,14,0,3,6,9,12,15}; + + v.val[0] = vld1q_u8 (ptr); //a0,a1,a2,a3,...a7, ...a15 + v.val[1] = vld1q_u8 ((ptr + 16)); //b0,b1,b2,b3...b7, ...b15 + v.val[2] = vld1q_u8 ((ptr + 32)); //c0,c1,c2,c3,...c7,...c15 + + tmp0 = _mm_shuffle_epi8(v.val[0], *(__m128i*)mask8_0); //a:0,3,6,9,12,15,1,4,7,10,13,2,5,8,11 + tmp1 = _mm_shuffle_epi8(v.val[1], *(__m128i*)mask8_1); //b:2,5,8,11,14,0,3,6,9,12,15,1,4,7,10,13 + tmp2 = _mm_shuffle_epi8(v.val[2], *(__m128i*)mask8_2); //c:1,4,7,10,13,2,5,8,11,14,3,6,9,12,15 + + tmp3 = _mm_slli_si128(tmp0,10); //0,0,0,0,0,0,0,0,0,0,a0,a3,a6,a9,a12,a15 + tmp3 = _mm_alignr_epi8(tmp1,tmp3, 10); //a:0,3,6,9,12,15,b:2,5,8,11,14,x,x,x,x,x + tmp3 = _mm_slli_si128(tmp3, 5); //0,0,0,0,0,a:0,3,6,9,12,15,b:2,5,8,11,14, + tmp3 = _mm_srli_si128(tmp3, 5); //a:0,3,6,9,12,15,b:2,5,8,11,14,:0,0,0,0,0 + v.val[0] = _mm_slli_si128(tmp2, 11); //0,0,0,0,0,0,0,0,0,0,0,0, 1,4,7,10,13, + v.val[0] = _mm_or_si128(v.val[0],tmp3); //a:0,3,6,9,12,15,b:2,5,8,11,14,c:1,4,7,10,13, + + tmp3 = _mm_slli_si128(tmp0, 5); //0,0,0,0,0,a:0,3,6,9,12,15,1,4,7,10,13, + tmp3 = _mm_srli_si128(tmp3, 11); //a:1,4,7,10,13, 0,0,0,0,0,0,0,0,0,0,0 + v.val[1] = _mm_srli_si128(tmp1,5); //b:0,3,6,9,12,15,C:1,4,7,10,13, 0,0,0,0,0 + v.val[1] = _mm_slli_si128(v.val[1], 5); //0,0,0,0,0,b:0,3,6,9,12,15,C:1,4,7,10,13, + v.val[1] = _mm_or_si128(v.val[1],tmp3); //a:1,4,7,10,13,b:0,3,6,9,12,15,C:1,4,7,10,13, + v.val[1] = _mm_slli_si128(v.val[1],5); //0,0,0,0,0,a:1,4,7,10,13,b:0,3,6,9,12,15, + v.val[1] = _mm_srli_si128(v.val[1], 5); //a:1,4,7,10,13,b:0,3,6,9,12,15,0,0,0,0,0 + tmp3 = _mm_srli_si128(tmp2,5); //c:2,5,8,11,14,0,3,6,9,12,15,0,0,0,0,0 + tmp3 = _mm_slli_si128(tmp3,11); //0,0,0,0,0,0,0,0,0,0,0,c:2,5,8,11,14, + v.val[1] = _mm_or_si128(v.val[1],tmp3); //a:1,4,7,10,13,b:0,3,6,9,12,15,c:2,5,8,11,14, + + tmp3 = _mm_srli_si128(tmp2,10); //c:0,3,6,9,12,15, 0,0,0,0,0,0,0,0,0,0, + tmp3 = _mm_slli_si128(tmp3,10); //0,0,0,0,0,0,0,0,0,0, c:0,3,6,9,12,15, + v.val[2] = _mm_srli_si128(tmp1,11); //b:1,4,7,10,13,0,0,0,0,0,0,0,0,0,0,0 + v.val[2] = _mm_slli_si128(v.val[2],5); 
//0,0,0,0,0,b:1,4,7,10,13, 0,0,0,0,0,0 + v.val[2] = _mm_or_si128(v.val[2],tmp3); //0,0,0,0,0,b:1,4,7,10,13,c:0,3,6,9,12,15, + tmp0 = _mm_srli_si128(tmp0, 11); //a:2,5,8,11,14, 0,0,0,0,0,0,0,0,0,0,0, + v.val[2] = _mm_or_si128(v.val[2],tmp0); //a:2,5,8,11,14,b:1,4,7,10,13,c:0,3,6,9,12,15, + return v; +} + +uint16x8x3_t vld3q_u16(__transfersize(24) uint16_t const * ptr); // VLD3.16 {d0, d2, d4}, [r0] +_NEON2SSE_INLINE uint16x8x3_t vld3q_u16(__transfersize(24) uint16_t const * ptr) // VLD3.16 {d0, d2, d4}, [r0] +{ + //a0, a1,a2,a3,...a7, b0,b1,b2,b3,...b7, c0,c1,c2,c3...c7 -> a0,a3,a6,b1,b4,b7,c2,c5, a1,a4,a7,b2,b5,c0,c3,c6, a2,a5,b0,b3,b6,c1,c4,c7 + uint16x8x3_t v; + __m128i tmp0, tmp1,tmp2, tmp3; + _NEON2SSE_ALIGN_16 int8_t mask16_0[16] = {0,1, 6,7, 12,13, 2,3, 8,9, 14,15, 4,5, 10,11}; + _NEON2SSE_ALIGN_16 int8_t mask16_1[16] = {2,3, 8,9, 14,15, 4,5, 10,11, 0,1, 6,7, 12,13}; + _NEON2SSE_ALIGN_16 int8_t mask16_2[16] = {4,5, 10,11, 0,1, 6,7, 12,13, 2,3, 8,9, 14,15}; + + v.val[0] = vld1q_u16 (ptr); //a0,a1,a2,a3,...a7, + v.val[1] = vld1q_u16 ((ptr + 8)); //b0,b1,b2,b3...b7 + v.val[2] = vld1q_u16 ((ptr + 16)); //c0,c1,c2,c3,...c7 + + tmp0 = _mm_shuffle_epi8(v.val[0], *(__m128i*)mask16_0); //a0,a3,a6,a1,a4,a7,a2,a5, + tmp1 = _mm_shuffle_epi8(v.val[1], *(__m128i*)mask16_1); //b1,b4,b7,b2,b5,b0,b3,b6 + tmp2 = _mm_shuffle_epi8(v.val[2], *(__m128i*)mask16_2); //c2,c5, c0,c3,c6, c1,c4,c7 + + tmp3 = _mm_slli_si128(tmp0,10); //0,0,0,0,0,a0,a3,a6, + tmp3 = _mm_alignr_epi8(tmp1,tmp3, 10); //a0,a3,a6,b1,b4,b7,x,x + tmp3 = _mm_slli_si128(tmp3, 4); //0,0, a0,a3,a6,b1,b4,b7 + tmp3 = _mm_srli_si128(tmp3, 4); //a0,a3,a6,b1,b4,b7,0,0 + v.val[0] = _mm_slli_si128(tmp2, 12); //0,0,0,0,0,0, c2,c5, + v.val[0] = _mm_or_si128(v.val[0],tmp3); //a0,a3,a6,b1,b4,b7,c2,c5 + + tmp3 = _mm_slli_si128(tmp0, 4); //0,0,a0,a3,a6,a1,a4,a7 + tmp3 = _mm_srli_si128(tmp3,10); //a1,a4,a7, 0,0,0,0,0 + v.val[1] = _mm_srli_si128(tmp1,6); //b2,b5,b0,b3,b6,0,0 + v.val[1] = _mm_slli_si128(v.val[1], 6); //0,0,0,b2,b5,b0,b3,b6, + v.val[1] = _mm_or_si128(v.val[1],tmp3); //a1,a4,a7,b2,b5,b0,b3,b6, + v.val[1] = _mm_slli_si128(v.val[1],6); //0,0,0,a1,a4,a7,b2,b5, + v.val[1] = _mm_srli_si128(v.val[1], 6); //a1,a4,a7,b2,b5,0,0,0, + tmp3 = _mm_srli_si128(tmp2,4); //c0,c3,c6, c1,c4,c7,0,0 + tmp3 = _mm_slli_si128(tmp3,10); //0,0,0,0,0,c0,c3,c6, + v.val[1] = _mm_or_si128(v.val[1],tmp3); //a1,a4,a7,b2,b5,c0,c3,c6, + + tmp3 = _mm_srli_si128(tmp2,10); //c1,c4,c7, 0,0,0,0,0 + tmp3 = _mm_slli_si128(tmp3,10); //0,0,0,0,0, c1,c4,c7, + v.val[2] = _mm_srli_si128(tmp1,10); //b0,b3,b6,0,0, 0,0,0 + v.val[2] = _mm_slli_si128(v.val[2],4); //0,0, b0,b3,b6,0,0,0 + v.val[2] = _mm_or_si128(v.val[2],tmp3); //0,0, b0,b3,b6,c1,c4,c7, + tmp0 = _mm_srli_si128(tmp0, 12); //a2,a5,0,0,0,0,0,0 + v.val[2] = _mm_or_si128(v.val[2],tmp0); //a2,a5,b0,b3,b6,c1,c4,c7, + return v; +} + +uint32x4x3_t vld3q_u32(__transfersize(12) uint32_t const * ptr); // VLD3.32 {d0, d2, d4}, [r0] +_NEON2SSE_INLINE uint32x4x3_t vld3q_u32(__transfersize(12) uint32_t const * ptr) // VLD3.32 {d0, d2, d4}, [r0] +{ + //a0,a1,a2,a3, b0,b1,b2,b3, c0,c1,c2,c3 -> a0,a3,b2,c1, a1,b0,b3,c2, a2,b1,c0,c3, + uint32x4x3_t v; + __m128i tmp0, tmp1,tmp2, tmp3; + v.val[0] = vld1q_u32 (ptr); //a0,a1,a2,a3, + v.val[1] = vld1q_u32 ((ptr + 4)); //b0,b1,b2,b3 + v.val[2] = vld1q_u32 ((ptr + 8)); //c0,c1,c2,c3, + + tmp0 = _mm_shuffle_epi32(v.val[0], 0 | (3 << 2) | (1 << 4) | (2 << 6)); //a0,a3,a1,a2 + tmp1 = _mm_shuffle_epi32(v.val[1], _SWAP_HI_LOW32); //b2,b3,b0,b1 + tmp2 = _mm_shuffle_epi32(v.val[2], 1 | (2 << 2) | (0 << 4) | (3 
<< 6)); //c1,c2, c0,c3 + + tmp3 = _mm_unpacklo_epi32(tmp1, tmp2); //b2,c1, b3,c2 + v.val[0] = _mm_unpacklo_epi64(tmp0,tmp3); //a0,a3,b2,c1 + tmp0 = _mm_unpackhi_epi32(tmp0, tmp1); //a1,b0, a2,b1 + v.val[1] = _mm_shuffle_epi32(tmp0, _SWAP_HI_LOW32 ); //a2,b1, a1,b0, + v.val[1] = _mm_unpackhi_epi64(v.val[1], tmp3); //a1,b0, b3,c2 + v.val[2] = _mm_unpackhi_epi64(tmp0, tmp2); //a2,b1, c0,c3 + return v; +} + +int8x16x3_t vld3q_s8(__transfersize(48) int8_t const * ptr); // VLD3.8 {d0, d2, d4}, [r0] +#define vld3q_s8(ptr) vld3q_u8((uint8_t*) (ptr)) + +int16x8x3_t vld3q_s16(__transfersize(24) int16_t const * ptr); // VLD3.16 {d0, d2, d4}, [r0] +#define vld3q_s16(ptr) vld3q_u16((uint16_t*) (ptr)) + +int32x4x3_t vld3q_s32(__transfersize(12) int32_t const * ptr); // VLD3.32 {d0, d2, d4}, [r0] +#define vld3q_s32(ptr) vld3q_u32((uint32_t*) (ptr)) + +float16x8x3_t vld3q_f16(__transfersize(24) __fp16 const * ptr); // VLD3.16 {d0, d2, d4}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently, so need to go to 32 bit and then work with two 128bit registers. See vld1q_f16 for example + +float32x4x3_t vld3q_f32(__transfersize(12) float32_t const * ptr); // VLD3.32 {d0, d2, d4}, [r0] +_NEON2SSE_INLINE float32x4x3_t vld3q_f32(__transfersize(12) float32_t const * ptr) // VLD3.32 {d0, d2, d4}, [r0] +{ + //a0,a1,a2,a3, b0,b1,b2,b3, c0,c1,c2,c3 -> a0,a3,b2,c1, a1,b0,b3,c2, a2,b1,c0,c3, + float32x4x3_t v; + __m128 tmp0, tmp1,tmp2, tmp3; + v.val[0] = vld1q_f32 (ptr); //a0,a1,a2,a3, + v.val[1] = vld1q_f32 ((ptr + 4)); //b0,b1,b2,b3 + v.val[2] = vld1q_f32 ((ptr + 8)); //c0,c1,c2,c3, + + tmp0 = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(v.val[0]), 0 | (3 << 2) | (1 << 4) | (2 << 6))); //a0,a3,a1,a2 + tmp1 = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(v.val[1]), _SWAP_HI_LOW32)); //b2,b3,b0,b1 + tmp2 = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(v.val[2]), 1 | (2 << 2) | (0 << 4) | (3 << 6))); //c1,c2, c0,c3 + tmp3 = _mm_unpacklo_ps(tmp1, tmp2); //b2,c1, b3,c2 + + v.val[0] = _mm_movelh_ps(tmp0,tmp3); //a0,a3,b2,c1 + tmp0 = _mm_unpackhi_ps(tmp0, tmp1); //a1,b0, a2,b1 + v.val[1] = _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(tmp0), _SWAP_HI_LOW32 )); //a2,b1, a1,b0, + v.val[1] = _mm_movehl_ps(tmp3,v.val[1]); //a1,b0, b3,c2 + v.val[2] = _mm_movehl_ps(tmp2,tmp0); //a2,b1, c0,c3 + return v; +} + +poly8x16x3_t vld3q_p8(__transfersize(48) poly8_t const * ptr); // VLD3.8 {d0, d2, d4}, [r0] +#define vld3q_p8 vld3q_u8 + +poly16x8x3_t vld3q_p16(__transfersize(24) poly16_t const * ptr); // VLD3.16 {d0, d2, d4}, [r0] +#define vld3q_p16 vld3q_u16 + +uint8x8x3_t vld3_u8(__transfersize(24) uint8_t const * ptr); // VLD3.8 {d0, d1, d2}, [r0] +_NEON2SSE_INLINE uint8x8x3_t vld3_u8(__transfersize(24) uint8_t const * ptr) // VLD3.8 {d0, d1, d2}, [r0] +{ + //a0, a1,a2,a3,...a7, b0,b1,b2,b3,...b7, c0,c1,c2,c3...c7 -> a0,a3,a6,b1,b4,b7,c2,c5, a1,a4,a7,b2,b5,c0,c3,c6, a2,a5,b0,b3,b6,c1,c4,c7 + uint8x8x3_t v; + __m128i val0, val1, val2, tmp0, tmp1; + _NEON2SSE_ALIGN_16 int8_t mask8_0[16] = {0,3,6,9,12,15, 1,4,7,10,13, 2,5,8,11,14}; + _NEON2SSE_ALIGN_16 int8_t mask8_1[16] = {2,5, 0,3,6, 1,4,7, 0,0,0,0,0,0,0,0}; + val0 = vld1q_u8 (ptr); //a0,a1,a2,a3,...a7, b0,b1,b2,b3...b7 + val2 = _mm_loadl_epi64((__m128i*)(ptr + 16)); //c0,c1,c2,c3,...c7 + + tmp0 = _mm_shuffle_epi8(val0, *(__m128i*)mask8_0); //a0,a3,a6,b1,b4,b7, a1,a4,a7,b2,b5, a2,a5,b0,b3,b6, + tmp1 = _mm_shuffle_epi8(val2, *(__m128i*)mask8_1); //c2,c5, c0,c3,c6, c1,c4,c7,x,x,x,x,x,x,x,x + val0 = _mm_slli_si128(tmp0,10); + val0 = _mm_srli_si128(val0,10); 
//a0,a3,a6,b1,b4,b7, 0,0,0,0,0,0,0,0,0,0 + val2 = _mm_slli_si128(tmp1,6); //0,0,0,0,0,0,c2,c5,x,x,x,x,x,x,x,x + val0 = _mm_or_si128(val0,val2); //a0,a3,a6,b1,b4,b7,c2,c5 x,x,x,x,x,x,x,x + _M64(v.val[0], val0); + val1 = _mm_slli_si128(tmp0,5); //0,0,0,0,0,0,0,0,0,0,0, a1,a4,a7,b2,b5, + val1 = _mm_srli_si128(val1,11); //a1,a4,a7,b2,b5,0,0,0,0,0,0,0,0,0,0,0, + val2 = _mm_srli_si128(tmp1,2); //c0,c3,c6,c1,c4,c7,x,x,x,x,x,x,x,x,0,0 + val2 = _mm_slli_si128(val2,5); //0,0,0,0,0,c0,c3,c6,0,0,0,0,0,0,0,0 + val1 = _mm_or_si128(val1,val2); //a1,a4,a7,b2,b5,c0,c3,c6,x,x,x,x,x,x,x,x + _M64(v.val[1], val1); + + tmp0 = _mm_srli_si128(tmp0,11); //a2,a5,b0,b3,b6,0,0,0,0,0,0,0,0,0,0,0, + val2 = _mm_srli_si128(tmp1,5); //c1,c4,c7,0,0,0,0,0,0,0,0,0,0,0,0,0 + val2 = _mm_slli_si128(val2,5); //0,0,0,0,0,c1,c4,c7, + val2 = _mm_or_si128(tmp0, val2); //a2,a5,b0,b3,b6,c1,c4,c7,x,x,x,x,x,x,x,x + _M64(v.val[2], val2); + return v; +} + +uint16x4x3_t vld3_u16(__transfersize(12) uint16_t const * ptr); // VLD3.16 {d0, d1, d2}, [r0] +_NEON2SSE_INLINE uint16x4x3_t vld3_u16(__transfersize(12) uint16_t const * ptr) // VLD3.16 {d0, d1, d2}, [r0] +{ + //a0,a1,a2,a3, b0,b1,b2,b3, c0,c1,c2,c3 -> a0,a3,b2,c1, a1,b0,b3,c2, a2,b1,c0,c3, + uint16x4x3_t v; + __m128i val0, val1, val2, tmp0, tmp1; + _NEON2SSE_ALIGN_16 int8_t mask16[16] = {0,1, 6,7, 12,13, 2,3, 8,9, 14,15, 4,5, 10,11}; + val0 = vld1q_u16 (ptr); //a0,a1,a2,a3, b0,b1,b2,b3 + val2 = _mm_loadl_epi64((__m128i*)(ptr + 8)); //c0,c1,c2,c3, x,x,x,x + + tmp0 = _mm_shuffle_epi8(val0, *(__m128i*)mask16); //a0, a3, b2,a1, b0, b3, a2, b1 + tmp1 = _mm_shufflelo_epi16(val2, 201); //11 00 10 01 : c1, c2, c0, c3, + val0 = _mm_slli_si128(tmp0,10); + val0 = _mm_srli_si128(val0,10); //a0, a3, b2, 0,0, 0,0, + val2 = _mm_slli_si128(tmp1,14); //0,0,0,0,0,0,0,c1 + val2 = _mm_srli_si128(val2,8); //0,0,0,c1,0,0,0,0 + val0 = _mm_or_si128(val0,val2); //a0, a3, b2, c1, x,x,x,x + _M64(v.val[0], val0); + + val1 = _mm_slli_si128(tmp0,4); //0,0,0,0,0,a1, b0, b3 + val1 = _mm_srli_si128(val1,10); //a1, b0, b3, 0,0, 0,0, + val2 = _mm_srli_si128(tmp1,2); //c2, 0,0,0,0,0,0,0, + val2 = _mm_slli_si128(val2,6); //0,0,0,c2,0,0,0,0 + val1 = _mm_or_si128(val1,val2); //a1, b0, b3, c2, x,x,x,x + _M64(v.val[1], val1); + + tmp0 = _mm_srli_si128(tmp0,12); //a2, b1,0,0,0,0,0,0 + tmp1 = _mm_srli_si128(tmp1,4); + tmp1 = _mm_slli_si128(tmp1,4); //0,0,c0, c3, + val2 = _mm_or_si128(tmp0, tmp1); //a2, b1, c0, c3, + _M64(v.val[2], val2); + return v; +} + +uint32x2x3_t vld3_u32(__transfersize(6) uint32_t const * ptr); // VLD3.32 {d0, d1, d2}, [r0] +_NEON2SSE_INLINE uint32x2x3_t vld3_u32(__transfersize(6) uint32_t const * ptr) // VLD3.32 {d0, d1, d2}, [r0] +{ + //a0,a1, b0,b1, c0,c1, -> a0,b1, a1,c0, b0,c1 + uint32x2x3_t v; + __m128i val0, val1, val2; + val0 = vld1q_u32 (ptr); //a0,a1, b0,b1, + val2 = _mm_loadl_epi64((__m128i*) (ptr + 4)); //c0,c1, x,x + + val0 = _mm_shuffle_epi32(val0, 0 | (3 << 2) | (1 << 4) | (2 << 6)); //a0,b1, a1, b0 + _M64(v.val[0], val0); + val2 = _mm_slli_si128(val2, 8); //x, x,c0,c1, + val1 = _mm_unpackhi_epi32(val0,val2); //a1,c0, b0, c1 + _M64(v.val[1], val1); + val2 = _mm_srli_si128(val1, 8); //b0, c1, x, x, + _M64(v.val[2], val2); + return v; +} +uint64x1x3_t vld3_u64(__transfersize(3) uint64_t const * ptr); // VLD1.64 {d0, d1, d2}, [r0] +_NEON2SSE_INLINE uint64x1x3_t vld3_u64(__transfersize(3) uint64_t const * ptr) // VLD1.64 {d0, d1, d2}, [r0] +{ + uint64x1x3_t v; + v.val[0].m64_u64[0] = *(ptr); + v.val[1].m64_u64[0] = *(ptr + 1); + v.val[2].m64_u64[0] = *(ptr + 2); + return v; +} + +int8x8x3_t 
vld3_s8(__transfersize(24) int8_t const * ptr); // VLD3.8 {d0, d1, d2}, [r0] +#define vld3_s8(ptr) vld3_u8((uint8_t*)ptr) + +int16x4x3_t vld3_s16(__transfersize(12) int16_t const * ptr); // VLD3.16 {d0, d1, d2}, [r0] +#define vld3_s16(ptr) vld3_u16((uint16_t*)ptr) + +int32x2x3_t vld3_s32(__transfersize(6) int32_t const * ptr); // VLD3.32 {d0, d1, d2}, [r0] +#define vld3_s32(ptr) vld3_u32((uint32_t*)ptr) + +int64x1x3_t vld3_s64(__transfersize(3) int64_t const * ptr); // VLD1.64 {d0, d1, d2}, [r0] +#define vld3_s64(ptr) vld3_u64((uint64_t*)ptr) + +float16x4x3_t vld3_f16(__transfersize(12) __fp16 const * ptr); // VLD3.16 {d0, d1, d2}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently, so need to go to 32 bit and then work with two 128bit registers. See vld1q_f16 for example + +float32x2x3_t vld3_f32(__transfersize(6) float32_t const * ptr); // VLD3.32 {d0, d1, d2}, [r0] +_NEON2SSE_INLINE float32x2x3_t vld3_f32(__transfersize(6) float32_t const * ptr) +{ + //a0,a1, b0,b1, c0,c1, -> a0,b1, a1,c0, b0,c1 + float32x2x3_t v; + v.val[0].m64_f32[0] = *(ptr); + v.val[0].m64_f32[1] = *(ptr + 3); + + v.val[1].m64_f32[0] = *(ptr + 1); + v.val[1].m64_f32[1] = *(ptr + 4); + + v.val[2].m64_f32[0] = *(ptr + 2); + v.val[2].m64_f32[1] = *(ptr + 5); + return v; +} + +poly8x8x3_t vld3_p8(__transfersize(24) poly8_t const * ptr); // VLD3.8 {d0, d1, d2}, [r0] +#define vld3_p8 vld3_u8 + +poly16x4x3_t vld3_p16(__transfersize(12) poly16_t const * ptr); // VLD3.16 {d0, d1, d2}, [r0] +#define vld3_p16 vld3_u16 + +//*************** Quadruples load ******************************** +//***************************************************************** +uint8x16x4_t vld4q_u8(__transfersize(64) uint8_t const * ptr); // VLD4.8 {d0, d2, d4, d6}, [r0] +_NEON2SSE_INLINE uint8x16x4_t vld4q_u8(__transfersize(64) uint8_t const * ptr) // VLD4.8 {d0, d2, d4, d6}, [r0] +{ + uint8x16x4_t v; + __m128i tmp3, tmp2, tmp1, tmp0; + + v.val[0] = vld1q_u8 ( ptr); //a0,a1,a2,...a7, ...a15 + v.val[1] = vld1q_u8 ( (ptr + 16)); //b0, b1,b2,...b7.... b15 + v.val[2] = vld1q_u8 ( (ptr + 32)); //c0, c1,c2,...c7....c15 + v.val[3] = vld1q_u8 ( (ptr + 48)); //d0,d1,d2,...d7....d15 + + tmp0 = _mm_unpacklo_epi8(v.val[0],v.val[1]); //a0,b0, a1,b1, a2,b2, a3,b3,....a7,b7 + tmp1 = _mm_unpacklo_epi8(v.val[2],v.val[3]); //c0,d0, c1,d1, c2,d2, c3,d3,... 
c7,d7 + tmp2 = _mm_unpackhi_epi8(v.val[0],v.val[1]); //a8,b8, a9,b9, a10,b10, a11,b11,...a15,b15 + tmp3 = _mm_unpackhi_epi8(v.val[2],v.val[3]); //c8,d8, c9,d9, c10,d10, c11,d11,...c15,d15 + + v.val[0] = _mm_unpacklo_epi8(tmp0, tmp2); //a0,a8, b0,b8, a1,a9, b1,b9, ....a3,a11, b3,b11 + v.val[1] = _mm_unpackhi_epi8(tmp0, tmp2); //a4,a12, b4,b12, a5,a13, b5,b13,....a7,a15,b7,b15 + v.val[2] = _mm_unpacklo_epi8(tmp1, tmp3); //c0,c8, d0,d8, c1,c9, d1,d9.....d3,d11 + v.val[3] = _mm_unpackhi_epi8(tmp1, tmp3); //c4,c12,d4,d12, c5,c13, d5,d13,....d7,d15 + + tmp0 = _mm_unpacklo_epi32(v.val[0], v.val[2] ); ///a0,a8, b0,b8, c0,c8, d0,d8, a1,a9, b1,b9, c1,c9, d1,d9 + tmp1 = _mm_unpackhi_epi32(v.val[0], v.val[2] ); //a2,a10, b2,b10, c2,c10, d2,d10, a3,a11, b3,b11, c3,c11, d3,d11 + tmp2 = _mm_unpacklo_epi32(v.val[1], v.val[3] ); //a4,a12, b4,b12, c4,c12, d4,d12, a5,a13, b5,b13, c5,c13, d5,d13, + tmp3 = _mm_unpackhi_epi32(v.val[1], v.val[3] ); //a6,a14, b6,b14, c6,c14, d6,d14, a7,a15,b7,b15,c7,c15,d7,d15 + + v.val[0] = _mm_unpacklo_epi8(tmp0, tmp2); //a0,a4,a8,a12,b0,b4,b8,b12,c0,c4,c8,c12,d0,d4,d8,d12 + v.val[1] = _mm_unpackhi_epi8(tmp0, tmp2); //a1,a5, a9, a13, b1,b5, b9,b13, c1,c5, c9, c13, d1,d5, d9,d13 + v.val[2] = _mm_unpacklo_epi8(tmp1, tmp3); //a2,a6, a10,a14, b2,b6, b10,b14,c2,c6, c10,c14, d2,d6, d10,d14 + v.val[3] = _mm_unpackhi_epi8(tmp1, tmp3); //a3,a7, a11,a15, b3,b7, b11,b15,c3,c7, c11, c15,d3,d7, d11,d15 + return v; +} + +uint16x8x4_t vld4q_u16(__transfersize(32) uint16_t const * ptr); // VLD4.16 {d0, d2, d4, d6}, [r0] +_NEON2SSE_INLINE uint16x8x4_t vld4q_u16(__transfersize(32) uint16_t const * ptr) // VLD4.16 {d0, d2, d4, d6}, [r0] +{ + uint16x8x4_t v; + __m128i tmp3, tmp2, tmp1, tmp0; + tmp0 = vld1q_u16 (ptr); //a0,a1,a2,...a7 + tmp1 = vld1q_u16 ((ptr + 8)); //b0, b1,b2,...b7 + tmp2 = vld1q_u16 ((ptr + 16)); //c0, c1,c2,...c7 + tmp3 = vld1q_u16 ((ptr + 24)); //d0,d1,d2,...d7 + v.val[0] = _mm_unpacklo_epi16(tmp0,tmp1); //a0,b0, a1,b1, a2,b2, a3,b3, + v.val[1] = _mm_unpacklo_epi16(tmp2,tmp3); //c0,d0, c1,d1, c2,d2, c3,d3, + v.val[2] = _mm_unpackhi_epi16(tmp0,tmp1); //a4,b4, a5,b5, a6,b6, a7,b7 + v.val[3] = _mm_unpackhi_epi16(tmp2,tmp3); //c4,d4, c5,d5, c6,d6, c7,d7 + tmp0 = _mm_unpacklo_epi16(v.val[0], v.val[2]); //a0,a4, b0,b4, a1,a5, b1,b5 + tmp1 = _mm_unpackhi_epi16(v.val[0], v.val[2]); //a2,a6, b2,b6, a3,a7, b3,b7 + tmp2 = _mm_unpacklo_epi16(v.val[1], v.val[3]); //c0,c4, d0,d4, c1,c5, d1,d5 + tmp3 = _mm_unpackhi_epi16(v.val[1], v.val[3]); //c2,c6, d2,d6, c3,c7, d3,d7 + v.val[0] = _mm_unpacklo_epi64(tmp0, tmp2); //a0,a4, b0,b4, c0,c4, d0,d4, + v.val[1] = _mm_unpackhi_epi64(tmp0, tmp2); //a1,a5, b1,b5, c1,c5, d1,d5 + v.val[2] = _mm_unpacklo_epi64(tmp1, tmp3); //a2,a6, b2,b6, c2,c6, d2,d6, + v.val[3] = _mm_unpackhi_epi64(tmp1, tmp3); //a3,a7, b3,b7, c3,c7, d3,d7 + return v; +} + +uint32x4x4_t vld4q_u32(__transfersize(16) uint32_t const * ptr); // VLD4.32 {d0, d2, d4, d6}, [r0] +_NEON2SSE_INLINE uint32x4x4_t vld4q_u32(__transfersize(16) uint32_t const * ptr) // VLD4.32 {d0, d2, d4, d6}, [r0] +{ + uint32x4x4_t v; + __m128i tmp3, tmp2, tmp1, tmp0; + v.val[0] = vld1q_u32 (ptr); + v.val[1] = vld1q_u32 ((ptr + 4)); + v.val[2] = vld1q_u32 ((ptr + 8)); + v.val[3] = vld1q_u32 ((ptr + 12)); + tmp0 = _mm_unpacklo_epi32(v.val[0],v.val[1]); + tmp1 = _mm_unpacklo_epi32(v.val[2],v.val[3]); + tmp2 = _mm_unpackhi_epi32(v.val[0],v.val[1]); + tmp3 = _mm_unpackhi_epi32(v.val[2],v.val[3]); + v.val[0] = _mm_unpacklo_epi64(tmp0, tmp1); + v.val[1] = _mm_unpackhi_epi64(tmp0, tmp1); + v.val[2] = 
_mm_unpacklo_epi64(tmp2, tmp3); + v.val[3] = _mm_unpackhi_epi64(tmp2, tmp3); + return v; +} + +int8x16x4_t vld4q_s8(__transfersize(64) int8_t const * ptr); // VLD4.8 {d0, d2, d4, d6}, [r0] +#define vld4q_s8(ptr) vld4q_u8((uint8_t*)ptr) + +int16x8x4_t vld4q_s16(__transfersize(32) int16_t const * ptr); // VLD4.16 {d0, d2, d4, d6}, [r0] +#define vld4q_s16(ptr) vld4q_u16((uint16_t*)ptr) + +int32x4x4_t vld4q_s32(__transfersize(16) int32_t const * ptr); // VLD4.32 {d0, d2, d4, d6}, [r0] +#define vld4q_s32(ptr) vld4q_u32((uint32_t*)ptr) + +float16x8x4_t vld4q_f16(__transfersize(32) __fp16 const * ptr); // VLD4.16 {d0, d2, d4, d6}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently, so need to go to 32 bit and then work with two 128bit registers. See vld1q_f16 for example + +float32x4x4_t vld4q_f32(__transfersize(16) float32_t const * ptr); // VLD4.32 {d0, d2, d4, d6}, [r0] +_NEON2SSE_INLINE float32x4x4_t vld4q_f32(__transfersize(16) float32_t const * ptr) // VLD4.32 {d0, d2, d4, d6}, [r0] +{ + float32x4x4_t v; + __m128 tmp3, tmp2, tmp1, tmp0; + + v.val[0] = vld1q_f32 ((float*) ptr); + v.val[1] = vld1q_f32 ((float*) (ptr + 4)); + v.val[2] = vld1q_f32 ((float*) (ptr + 8)); + v.val[3] = vld1q_f32 ((float*) (ptr + 12)); + tmp0 = _mm_unpacklo_ps(v.val[0], v.val[1]); + tmp2 = _mm_unpacklo_ps(v.val[2], v.val[3]); + tmp1 = _mm_unpackhi_ps(v.val[0], v.val[1]); + tmp3 = _mm_unpackhi_ps(v.val[2], v.val[3]); + v.val[0] = _mm_movelh_ps(tmp0, tmp2); + v.val[1] = _mm_movehl_ps(tmp2, tmp0); + v.val[2] = _mm_movelh_ps(tmp1, tmp3); + v.val[3] = _mm_movehl_ps(tmp3, tmp1); + return v; +} + +poly8x16x4_t vld4q_p8(__transfersize(64) poly8_t const * ptr); // VLD4.8 {d0, d2, d4, d6}, [r0] +#define vld4q_p8 vld4q_u8 + +poly16x8x4_t vld4q_p16(__transfersize(32) poly16_t const * ptr); // VLD4.16 {d0, d2, d4, d6}, [r0] +#define vld4q_p16 vld4q_s16 + +uint8x8x4_t vld4_u8(__transfersize(32) uint8_t const * ptr); // VLD4.8 {d0, d1, d2, d3}, [r0] +_NEON2SSE_INLINE uint8x8x4_t vld4_u8(__transfersize(32) uint8_t const * ptr) // VLD4.8 {d0, d1, d2, d3}, [r0] +{ + uint8x8x4_t v; + __m128i sh0, sh1; + __m128i val0, val2; + _NEON2SSE_ALIGN_16 int8_t mask4_8[16] = {0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15}; + + val0 = vld1q_u8(( ptr)); //load first 64-bits in val[0] and val[1] + val2 = vld1q_u8(( ptr + 16)); //load third and forth 64-bits in val[2], val[3] + + sh0 = _mm_shuffle_epi8(val0, *(__m128i*)mask4_8); + sh1 = _mm_shuffle_epi8(val2, *(__m128i*)mask4_8); + val0 = _mm_unpacklo_epi32(sh0,sh1); //0,4,8,12,16,20,24,28, 1,5,9,13,17,21,25,29 + vst1q_u8(&v.val[0], val0 ); + val2 = _mm_unpackhi_epi32(sh0,sh1); //2,6,10,14,18,22,26,30, 3,7,11,15,19,23,27,31 + vst1q_u8(&v.val[2], val2 ); + return v; +} + +uint16x4x4_t vld4_u16(__transfersize(16) uint16_t const * ptr); // VLD4.16 {d0, d1, d2, d3}, [r0] +_NEON2SSE_INLINE uint16x4x4_t vld4_u16(__transfersize(16) uint16_t const * ptr) // VLD4.16 {d0, d1, d2, d3}, [r0] +{ + uint16x4x4_t v; + __m128i sh0, sh1; + __m128i val0, val2; + _NEON2SSE_ALIGN_16 int8_t mask4_16[16] = {0,1, 8,9, 2,3, 10,11, 4,5, 12,13, 6,7, 14,15}; //0, 4, 1, 5, 2, 6, 3, 7 + val0 = vld1q_u16 ( (ptr)); //load first 64-bits in val[0] and val[1] + val2 = vld1q_u16 ( (ptr + 8)); //load third and forth 64-bits in val[2], val[3] + sh0 = _mm_shuffle_epi8(val0, *(__m128i*)mask4_16); + sh1 = _mm_shuffle_epi8(val2, *(__m128i*)mask4_16); + val0 = _mm_unpacklo_epi32(sh0,sh1); //0,4,8,12, 1,5,9,13 + vst1q_u16(&v.val[0], val0 ); + val2 = _mm_unpackhi_epi32(sh0,sh1); //2,6,10,14, 3,7,11,15 + vst1q_u16(&v.val[2], 
val2 ); + return v; +} + +uint32x2x4_t vld4_u32(__transfersize(8) uint32_t const * ptr); // VLD4.32 {d0, d1, d2, d3}, [r0] +_NEON2SSE_INLINE uint32x2x4_t vld4_u32(__transfersize(8) uint32_t const * ptr) +{ + //a0,a1, b0,b1, c0,c1, d0,d1 -> a0,c0, a1,c1, b0,d0, b1,d1 + uint32x2x4_t v; + __m128i val0, val01, val2; + val0 = vld1q_u32 (ptr); //a0,a1, b0,b1, + val2 = vld1q_u32 ((ptr + 4)); //c0,c1, d0,d1 + val01 = _mm_unpacklo_epi32(val0,val2); //a0, c0, a1,c1, + val2 = _mm_unpackhi_epi32(val0,val2); //b0,d0, b1, d1 + vst1q_u32(&v.val[0], val01); + vst1q_u32(&v.val[2], val2 ); + return v; +} + +uint64x1x4_t vld4_u64(__transfersize(4) uint64_t const * ptr); // VLD1.64 {d0, d1, d2, d3}, [r0] +_NEON2SSE_INLINE uint64x1x4_t vld4_u64(__transfersize(4) uint64_t const * ptr) // VLD1.64 {d0, d1, d2, d3}, [r0] +{ + uint64x1x4_t v; + v.val[0].m64_u64[0] = *(ptr); //load first 64-bits in val[0] and val[1] + v.val[1].m64_u64[0] = *(ptr + 1); //load first 64-bits in val[0] and val[1] + v.val[2].m64_u64[0] = *(ptr + 2); //load third and forth 64-bits in val[2], val[3] + v.val[3].m64_u64[0] = *(ptr + 3); //load third and forth 64-bits in val[2], val[3] + return v; +} + +int8x8x4_t vld4_s8(__transfersize(32) int8_t const * ptr); // VLD4.8 {d0, d1, d2, d3}, [r0] +#define vld4_s8(ptr) vld4_u8((uint8_t*)ptr) + +int16x4x4_t vld4_s16(__transfersize(16) int16_t const * ptr); // VLD4.16 {d0, d1, d2, d3}, [r0] +#define vld4_s16(ptr) vld4_u16((uint16_t*)ptr) + +int32x2x4_t vld4_s32(__transfersize(8) int32_t const * ptr); // VLD4.32 {d0, d1, d2, d3}, [r0] +#define vld4_s32(ptr) vld4_u32((uint32_t*)ptr) + +int64x1x4_t vld4_s64(__transfersize(4) int64_t const * ptr); // VLD1.64 {d0, d1, d2, d3}, [r0] +#define vld4_s64(ptr) vld4_u64((uint64_t*)ptr) + +float16x4x4_t vld4_f16(__transfersize(16) __fp16 const * ptr); // VLD4.16 {d0, d1, d2, d3}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently, so need to go to 32 bit and then work with two 128bit registers. 
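An illustrative sketch, not part of the patch, of the quadruple load above: de-interleaving 8 RGBA pixels with vld4_u8. vst1_u8 is assumed to be provided elsewhere in this header, and the function and buffer names are invented for the example.

#include <stdint.h>
#include "NEON_2_SSE.h"   /* assumed header name */

/* Extract the alpha channel of 8 interleaved RGBA pixels (32 bytes). */
static void extract_alpha(const uint8_t rgba[32], uint8_t alpha[8])
{
    uint8x8x4_t px = vld4_u8(rgba);  /* val[0]=R, val[1]=G, val[2]=B, val[3]=A */
    vst1_u8(alpha, px.val[3]);
}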
See vld1q_f16 for example + +float32x2x4_t vld4_f32(__transfersize(8) float32_t const * ptr); // VLD4.32 {d0, d1, d2, d3}, [r0] +_NEON2SSE_INLINE float32x2x4_t vld4_f32(__transfersize(8) float32_t const * ptr) // VLD4.32 {d0, d1, d2, d3}, [r0] +{ + //a0,a1, b0,b1, c0,c1, d0,d1 -> a0,c0, a1,c1, b0,d0, b1,d1 + float32x2x4_t res; + res.val[0].m64_f32[0] = *(ptr); + res.val[0].m64_f32[1] = *(ptr + 4); + res.val[1].m64_f32[0] = *(ptr + 1); + res.val[1].m64_f32[1] = *(ptr + 5); + res.val[2].m64_f32[0] = *(ptr + 2); + res.val[2].m64_f32[1] = *(ptr + 6); + res.val[3].m64_f32[0] = *(ptr + 3); + res.val[3].m64_f32[1] = *(ptr + 7); + return res; +} + +poly8x8x4_t vld4_p8(__transfersize(32) poly8_t const * ptr); // VLD4.8 {d0, d1, d2, d3}, [r0] +#define vld4_p8 vld4_u8 + +poly16x4x4_t vld4_p16(__transfersize(16) poly16_t const * ptr); // VLD4.16 {d0, d1, d2, d3}, [r0] +#define vld4_p16 vld4_u16 + +//************* Duplicate (or propagate) ptr[0] to all val[0] lanes and ptr[1] to all val[1] lanes ******************* +//******************************************************************************************************************* +uint8x8x2_t vld2_dup_u8(__transfersize(2) uint8_t const * ptr); // VLD2.8 {d0[], d1[]}, [r0] +_NEON2SSE_INLINE uint8x8x2_t vld2_dup_u8(__transfersize(2) uint8_t const * ptr) // VLD2.8 {d0[], d1[]}, [r0] +{ + uint8x8x2_t v; + __m128i val0, val1; + val0 = LOAD_SI128(ptr); //0,1,x,x, x,x,x,x,x,x,x,x, x,x,x,x + val1 = _mm_unpacklo_epi8(val0,val0); //0,0,1,1,x,x,x,x, x,x,x,x,x,x,x,x, + val1 = _mm_unpacklo_epi16(val1,val1); //0,0,0,0, 1,1,1,1,x,x,x,x, x,x,x,x + val0 = _mm_unpacklo_epi32(val1,val1); //0,0,0,0, 0,0,0,0,1,1,1,1,1,1,1,1, + vst1q_u8(v.val, val0); + return v; +} + +uint16x4x2_t vld2_dup_u16(__transfersize(2) uint16_t const * ptr); // VLD2.16 {d0[], d1[]}, [r0] +_NEON2SSE_INLINE uint16x4x2_t vld2_dup_u16(__transfersize(2) uint16_t const * ptr) // VLD2.16 {d0[], d1[]}, [r0] +{ + uint16x4x2_t v; + __m128i val0, val1; + val1 = LOAD_SI128(ptr); //0,1,x,x, x,x,x,x + val0 = _mm_shufflelo_epi16(val1, 0); //00 00 00 00 (all 0) + _M64(v.val[0], val0); + val1 = _mm_shufflelo_epi16(val1, 85); //01 01 01 01 (all 1) + _M64(v.val[1], val1); + return v; +} + +uint32x2x2_t vld2_dup_u32(__transfersize(2) uint32_t const * ptr); // VLD2.32 {d0[], d1[]}, [r0] +_NEON2SSE_INLINE uint32x2x2_t vld2_dup_u32(__transfersize(2) uint32_t const * ptr) // VLD2.32 {d0[], d1[]}, [r0] +{ + uint32x2x2_t v; + __m128i val0; + val0 = LOAD_SI128(ptr); //0,1,x,x + val0 = _mm_shuffle_epi32(val0, 0 | (0 << 2) | (1 << 4) | (1 << 6)); //0,0,1,1 + vst1q_u32(v.val, val0); + return v; +} + +uint64x1x2_t vld2_dup_u64(__transfersize(2) uint64_t const * ptr); // VLD1.64 {d0, d1}, [r0] +#define vld2_dup_u64 vld2_u64 + +int8x8x2_t vld2_dup_s8(__transfersize(2) int8_t const * ptr); // VLD2.8 {d0[], d1[]}, [r0] +#define vld2_dup_s8(ptr) vld2_dup_u8((uint8_t*)ptr) + +int16x4x2_t vld2_dup_s16(__transfersize(2) int16_t const * ptr); // VLD2.16 {d0[], d1[]}, [r0] +#define vld2_dup_s16(ptr) vld2_dup_u16((uint16_t*)ptr) + +int32x2x2_t vld2_dup_s32(__transfersize(2) int32_t const * ptr); // VLD2.32 {d0[], d1[]}, [r0] +#define vld2_dup_s32(ptr) vld2_dup_u32((uint32_t*)ptr) + +int64x1x2_t vld2_dup_s64(__transfersize(2) int64_t const * ptr); // VLD1.64 {d0, d1}, [r0] +#define vld2_dup_s64(ptr) vld2_dup_u64((uint64_t*)ptr) + +float16x4x2_t vld2_dup_f16(__transfersize(2) __fp16 const * ptr); // VLD2.16 {d0[], d1[]}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently, so need to go to 32 bit and then work with two 128bit 
registers. See vld1q_f16 for example + +float32x2x2_t vld2_dup_f32(__transfersize(2) float32_t const * ptr); // VLD2.32 {d0[], d1[]}, [r0] +_NEON2SSE_INLINE float32x2x2_t vld2_dup_f32(__transfersize(2) float32_t const * ptr) // VLD2.32 {d0[], d1[]}, [r0] +{ + float32x2x2_t v; + v.val[0].m64_f32[0] = *(ptr); //0,0 + v.val[0].m64_f32[1] = *(ptr); //0,0 + v.val[1].m64_f32[0] = *(ptr + 1); //1,1 + v.val[1].m64_f32[1] = *(ptr + 1); //1,1 + return v; +} + +poly8x8x2_t vld2_dup_p8(__transfersize(2) poly8_t const * ptr); // VLD2.8 {d0[], d1[]}, [r0] +#define vld2_dup_p8 vld2_dup_u8 + +poly16x4x2_t vld2_dup_p16(__transfersize(2) poly16_t const * ptr); // VLD2.16 {d0[], d1[]}, [r0] +#define vld2_dup_p16 vld2_dup_s16 + +//************* Duplicate (or propagate)triplets: ******************* +//******************************************************************** +//ptr[0] to all val[0] lanes, ptr[1] to all val[1] lanes and ptr[2] to all val[2] lanes +uint8x8x3_t vld3_dup_u8(__transfersize(3) uint8_t const * ptr); // VLD3.8 {d0[], d1[], d2[]}, [r0] +_NEON2SSE_INLINE uint8x8x3_t vld3_dup_u8(__transfersize(3) uint8_t const * ptr) // VLD3.8 {d0[], d1[], d2[]}, [r0] +{ + uint8x8x3_t v; + __m128i val0, val1, val2; + val0 = LOAD_SI128(ptr); //0,1,2,x, x,x,x,x,x,x,x,x, x,x,x,x + val1 = _mm_unpacklo_epi8(val0,val0); //0,0,1,1,2,2,x,x, x,x,x,x,x,x,x,x, + val1 = _mm_unpacklo_epi16(val1,val1); //0,0,0,0, 1,1,1,1,2,2,2,2,x,x,x,x, + val0 = _mm_unpacklo_epi32(val1,val1); //0,0,0,0, 0,0,0,0,1,1,1,1,1,1,1,1, + val2 = _mm_unpackhi_epi32(val1,val1); // 2,2,2,2,2,2,2,2, x,x,x,x,x,x,x,x, + vst1q_u8(v.val, val0); + _M64(v.val[2], val2); + return v; +} + +uint16x4x3_t vld3_dup_u16(__transfersize(3) uint16_t const * ptr); // VLD3.16 {d0[], d1[], d2[]}, [r0] +_NEON2SSE_INLINE uint16x4x3_t vld3_dup_u16(__transfersize(3) uint16_t const * ptr) // VLD3.16 {d0[], d1[], d2[]}, [r0] +{ + uint16x4x3_t v; + __m128i val0, val1, val2; + val2 = LOAD_SI128(ptr); //0,1,2,x, x,x,x,x + val0 = _mm_shufflelo_epi16(val2, 0); //00 00 00 00 (all 0) + val1 = _mm_shufflelo_epi16(val2, 85); //01 01 01 01 (all 1) + val2 = _mm_shufflelo_epi16(val2, 170); //10 10 10 10 (all 2) + _M64(v.val[0], val0); + _M64(v.val[1], val1); + _M64(v.val[2], val2); + return v; +} + +uint32x2x3_t vld3_dup_u32(__transfersize(3) uint32_t const * ptr); // VLD3.32 {d0[], d1[], d2[]}, [r0] +_NEON2SSE_INLINE uint32x2x3_t vld3_dup_u32(__transfersize(3) uint32_t const * ptr) // VLD3.32 {d0[], d1[], d2[]}, [r0] +{ + uint32x2x3_t v; + __m128i val0, val1, val2; + val2 = LOAD_SI128(ptr); //0,1,2,x + val0 = _mm_shuffle_epi32(val2, 0 | (0 << 2) | (2 << 4) | (2 << 6)); //0,0,2,2 + val1 = _mm_shuffle_epi32(val2, 1 | (1 << 2) | (2 << 4) | (2 << 6)); //1,1,2,2 + val2 = _mm_srli_si128(val0, 8); //2,2,0x0,0x0 + _M64(v.val[0], val0); + _M64(v.val[1], val1); + _M64(v.val[2], val2); + return v; +} + +uint64x1x3_t vld3_dup_u64(__transfersize(3) uint64_t const * ptr); // VLD1.64 {d0, d1, d2}, [r0] +_NEON2SSE_INLINE uint64x1x3_t vld3_dup_u64(__transfersize(3) uint64_t const * ptr) // VLD1.64 {d0, d1, d2}, [r0] +{ + uint64x1x3_t v; + v.val[0].m64_u64[0] = *(ptr); + v.val[1].m64_u64[0] = *(ptr + 1); + v.val[2].m64_u64[0] = *(ptr + 2); + return v; +} + +int8x8x3_t vld3_dup_s8(__transfersize(3) int8_t const * ptr); // VLD3.8 {d0[], d1[], d2[]}, [r0] +#define vld3_dup_s8(ptr) vld3_dup_u8((uint8_t*)ptr) + +int16x4x3_t vld3_dup_s16(__transfersize(3) int16_t const * ptr); // VLD3.16 {d0[], d1[], d2[]}, [r0] +#define vld3_dup_s16(ptr) vld3_dup_u16((uint16_t*)ptr) + +int32x2x3_t 
vld3_dup_s32(__transfersize(3) int32_t const * ptr); // VLD3.32 {d0[], d1[], d2[]}, [r0] +#define vld3_dup_s32(ptr) vld3_dup_u32((uint32_t*)ptr) + +int64x1x3_t vld3_dup_s64(__transfersize(3) int64_t const * ptr); // VLD1.64 {d0, d1, d2}, [r0] +#define vld3_dup_s64(ptr) vld3_dup_u64((uint64_t*)ptr) + + +float16x4x3_t vld3_dup_f16(__transfersize(3) __fp16 const * ptr); // VLD3.16 {d0[], d1[], d2[]}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently, so need to go to 32 bit and then work with two 128bit registers. See vld1q_f16 for example + +float32x2x3_t vld3_dup_f32(__transfersize(3) float32_t const * ptr); // VLD3.32 {d0[], d1[], d2[]}, [r0] +_NEON2SSE_INLINE float32x2x3_t vld3_dup_f32(__transfersize(3) float32_t const * ptr) // VLD3.32 {d0[], d1[], d2[]}, [r0] +{ + float32x2x3_t v; + int i; + for (i = 0; i<3; i++) { + v.val[i].m64_f32[0] = *(ptr + i); + v.val[i].m64_f32[1] = *(ptr + i); + } + return v; +} + +poly8x8x3_t vld3_dup_p8(__transfersize(3) poly8_t const * ptr); // VLD3.8 {d0[], d1[], d2[]}, [r0] +#define vld3_dup_p8 vld3_dup_u8 + +poly16x4x3_t vld3_dup_p16(__transfersize(3) poly16_t const * ptr); // VLD3.16 {d0[], d1[], d2[]}, [r0] +#define vld3_dup_p16 vld3_dup_s16 + + +//************* Duplicate (or propagate) quadruples: ******************* +//*********************************************************************** +//ptr[0] to all val[0] lanes, ptr[1] to all val[1] lanes, ptr[2] to all val[2] lanes and ptr[3] to all val[3] lanes +uint8x8x4_t vld4_dup_u8(__transfersize(4) uint8_t const * ptr); // VLD4.8 {d0[], d1[], d2[], d3[]}, [r0] +_NEON2SSE_INLINE uint8x8x4_t vld4_dup_u8(__transfersize(4) uint8_t const * ptr) // VLD4.8 {d0[], d1[], d2[], d3[]}, [r0] +{ + uint8x8x4_t v; + __m128i val0, val1, val2; + val0 = LOAD_SI128(ptr); //0,1,2,3, x,x,x,x,x,x,x,x, x,x,x,x + val1 = _mm_unpacklo_epi8(val0,val0); //0,0,1,1,2,2,3,3, x,x,x,x,x,x,x,x, + val1 = _mm_unpacklo_epi16(val1,val1); //0,0,0,0, 1,1,1,1,2,2,2,2,3,3,3,3 + val0 = _mm_unpacklo_epi32(val1,val1); //0,0,0,0, 0,0,0,0,1,1,1,1,1,1,1,1, + val2 = _mm_unpackhi_epi32(val1,val1); // 2,2,2,2,2,2,2,2, 3,3,3,3, 3,3,3,3 + vst1q_u8(&v.val[0], val0); + vst1q_u8(&v.val[2], val2); + return v; +} + +uint16x4x4_t vld4_dup_u16(__transfersize(4) uint16_t const * ptr); // VLD4.16 {d0[], d1[], d2[], d3[]}, [r0] +_NEON2SSE_INLINE uint16x4x4_t vld4_dup_u16(__transfersize(4) uint16_t const * ptr) // VLD4.16 {d0[], d1[], d2[], d3[]}, [r0] +{ + uint16x4x4_t v; + __m128i val0, val1, val2, val3; + val3 = LOAD_SI128(ptr); //0,1,2,3, x,x,x,x + val0 = _mm_shufflelo_epi16(val3, 0); //00 00 00 00 (all 0) + val1 = _mm_shufflelo_epi16(val3, 85); //01 01 01 01 (all 1) + val2 = _mm_shufflelo_epi16(val3, 170); //10 10 10 10 (all 2) + val3 = _mm_shufflelo_epi16(val3, 255); //11 11 11 11 (all 3) + _M64(v.val[0], val0); + _M64(v.val[1], val1); + _M64(v.val[2], val2); + _M64(v.val[3], val3); + return v; +} + +uint32x2x4_t vld4_dup_u32(__transfersize(4) uint32_t const * ptr); // VLD4.32 {d0[], d1[], d2[], d3[]}, [r0] +_NEON2SSE_INLINE uint32x2x4_t vld4_dup_u32(__transfersize(4) uint32_t const * ptr) // VLD4.32 {d0[], d1[], d2[], d3[]}, [r0] +{ + uint32x2x4_t v; + __m128i val0, val1, val2, val3; + val3 = LOAD_SI128(ptr); //0,1,2,3 + val0 = _mm_shuffle_epi32(val3, 0 | (0 << 2) | (2 << 4) | (3 << 6)); //0,0,2,3 + val1 = _mm_shuffle_epi32(val3, 1 | (1 << 2) | (2 << 4) | (3 << 6)); //1,1,2,3 + val2 = _mm_shuffle_epi32(val3, 2 | (2 << 2) | (3 << 4) | (3 << 6)); //2,2,3,3 + val3 = _mm_shuffle_epi32(val3, 3 | (3 << 2) | (3 << 4) | (3 << 6)); //3,3,2,2 + _M64(v.val[0], 
val0); + _M64(v.val[1], val1); + _M64(v.val[2], val2); + _M64(v.val[3], val3); + return v; +} + +uint64x1x4_t vld4_dup_u64(__transfersize(4) uint64_t const * ptr); // VLD1.64 {d0, d1, d2, d3}, [r0] +_NEON2SSE_INLINE uint64x1x4_t vld4_dup_u64(__transfersize(4) uint64_t const * ptr) // VLD1.64 {d0, d1, d2, d3}, [r0] +{ + uint64x1x4_t v; + v.val[0].m64_u64[0] = *(ptr); + v.val[1].m64_u64[0] = *(ptr + 1); + v.val[2].m64_u64[0] = *(ptr + 2); + v.val[3].m64_u64[0] = *(ptr + 3); + return v; +} + +int8x8x4_t vld4_dup_s8(__transfersize(4) int8_t const * ptr); // VLD4.8 {d0[], d1[], d2[], d3[]}, [r0] +#define vld4_dup_s8(ptr) vld4_dup_u8((uint8_t*)ptr) + +int16x4x4_t vld4_dup_s16(__transfersize(4) int16_t const * ptr); // VLD4.16 {d0[], d1[], d2[], d3[]}, [r0] +#define vld4_dup_s16(ptr) vld4_dup_u16((uint16_t*)ptr) + +int32x2x4_t vld4_dup_s32(__transfersize(4) int32_t const * ptr); // VLD4.32 {d0[], d1[], d2[], d3[]}, [r0] +#define vld4_dup_s32(ptr) vld4_dup_u32((uint32_t*)ptr) + +int64x1x4_t vld4_dup_s64(__transfersize(4) int64_t const * ptr); // VLD1.64 {d0, d1, d2, d3}, [r0] +#define vld4_dup_s64(ptr) vld4_dup_u64((uint64_t*)ptr) + +float16x4x4_t vld4_dup_f16(__transfersize(4) __fp16 const * ptr); // VLD4.16 {d0[], d1[], d2[], d3[]}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently, so need to go to 32 bit and then work with two 128bit registers. See vld1q_f16 for example + +float32x2x4_t vld4_dup_f32(__transfersize(4) float32_t const * ptr); // VLD4.32 {d0[], d1[], d2[], d3[]}, [r0] +_NEON2SSE_INLINE float32x2x4_t vld4_dup_f32(__transfersize(4) float32_t const * ptr) // VLD4.32 {d0[], d1[], d2[], d3[]}, [r0] +{ + float32x2x4_t v; + int i; + for (i = 0; i<4; i++) { + v.val[i].m64_f32[0] = *(ptr + i); + v.val[i].m64_f32[1] = *(ptr + i); + } + return v; +} + +poly8x8x4_t vld4_dup_p8(__transfersize(4) poly8_t const * ptr); // VLD4.8 {d0[], d1[], d2[], d3[]}, [r0] +#define vld4_dup_p8 vld4_dup_u8 + +poly16x4x4_t vld4_dup_p16(__transfersize(4) poly16_t const * ptr); // VLD4.16 {d0[], d1[], d2[], d3[]}, [r0] +#define vld4_dup_p16 vld4_dup_u16 + + +//********************************************************************************** +//*******************Lane loads for an N-element structures *********************** +//********************************************************************************** +//********************** Lane pairs ************************************************ +//does vld1_lane_xx ptr[0] to src->val[0] at lane positon and ptr[1] to src->val[1] at lane positon +//we assume src is 16 bit aligned + +//!!!!!! 
Microsoft compiler does not allow xxxxxx_2t function arguments resulting in "formal parameter with __declspec(align('16')) won't be aligned" error +//to fix it the all functions below work with xxxxxx_2t pointers and the corresponding original functions are redefined + +//uint16x8x2_t vld2q_lane_u16(__transfersize(2) uint16_t const * ptr, uint16x8x2_t src,__constrange(0,7) int lane);// VLD2.16 {d0[0], d2[0]}, [r0] +_NEON2SSE_INLINE uint16x8x2_t vld2q_lane_u16_ptr(__transfersize(2) uint16_t const * ptr, uint16x8x2_t* src,__constrange(0,7) int lane) // VLD2.16 {d0[0], d2[0]}, [r0] +{ + uint16x8x2_t v; + v.val[0] = vld1q_lane_s16 (ptr, src->val[0], lane); + v.val[1] = vld1q_lane_s16 ((ptr + 1), src->val[1], lane); + return v; +} +#define vld2q_lane_u16(ptr, src, lane) vld2q_lane_u16_ptr(ptr, &src, lane) + +//uint32x4x2_t vld2q_lane_u32(__transfersize(2) uint32_t const * ptr, uint32x4x2_t src,__constrange(0,3) int lane);// VLD2.32 {d0[0], d2[0]}, [r0] +_NEON2SSE_INLINE uint32x4x2_t vld2q_lane_u32_ptr(__transfersize(2) uint32_t const * ptr, uint32x4x2_t* src,__constrange(0,3) int lane) // VLD2.32 {d0[0], d2[0]}, [r0] +{ + uint32x4x2_t v; + v.val[0] = _MM_INSERT_EPI32 (src->val[0], ptr[0], lane); + v.val[1] = _MM_INSERT_EPI32 (src->val[1], ptr[1], lane); + return v; +} +#define vld2q_lane_u32(ptr, src, lane) vld2q_lane_u32_ptr(ptr, &src, lane) + +//int16x8x2_t vld2q_lane_s16(__transfersize(2) int16_t const * ptr, int16x8x2_t src, __constrange(0,7)int lane);// VLD2.16 {d0[0], d2[0]}, [r0] +_NEON2SSE_INLINE int16x8x2_t vld2q_lane_s16_ptr(__transfersize(2) int16_t const * ptr, int16x8x2_t* src, __constrange(0,7) int lane) +{ + int16x8x2_t v; + v.val[0] = vld1q_lane_s16 (ptr, src->val[0], lane); + v.val[1] = vld1q_lane_s16 ((ptr + 1), src->val[1], lane); + return v; +} +#define vld2q_lane_s16(ptr, src, lane) vld2q_lane_s16_ptr(ptr, &src, lane) + +//int32x4x2_t vld2q_lane_s32(__transfersize(2) int32_t const * ptr, int32x4x2_t src, __constrange(0,3)int lane);// VLD2.32 {d0[0], d2[0]}, [r0] +_NEON2SSE_INLINE int32x4x2_t vld2q_lane_s32_ptr(__transfersize(2) int32_t const * ptr, int32x4x2_t* src, __constrange(0,3) int lane) +{ + int32x4x2_t v; + v.val[0] = _MM_INSERT_EPI32 (src->val[0], ptr[0], lane); + v.val[1] = _MM_INSERT_EPI32 (src->val[1], ptr[1], lane); + return v; +} +#define vld2q_lane_s32(ptr, src, lane) vld2q_lane_s32_ptr(ptr, &src, lane) + +//float16x8x2_t vld2q_lane_f16(__transfersize(2) __fp16 const * ptr, float16x8x2_t src, __constrange(0,7)int lane);// VLD2.16 {d0[0], d2[0]}, [r0] +//current IA SIMD doesn't support float16 + +//float32x4x2_t vld2q_lane_f32_ptr(__transfersize(2) float32_t const * ptr, float32x4x2_t src,__constrange(0,3) int lane);// VLD2.32 {d0[0], d2[0]}, [r0] +_NEON2SSE_INLINE float32x4x2_t vld2q_lane_f32_ptr(__transfersize(2) float32_t const * ptr, float32x4x2_t* src,__constrange(0,3) int lane) // VLD2.32 {d0[0], d2[0]}, [r0] +{ + float32x4x2_t v; + v.val[0] = vld1q_lane_f32(ptr, src->val[0], lane); + v.val[1] = vld1q_lane_f32((ptr + 1), src->val[1], lane); + return v; +} +#define vld2q_lane_f32(ptr,src,lane) vld2q_lane_f32_ptr(ptr,&src,lane) + +//poly16x8x2_t vld2q_lane_p16(__transfersize(2) poly16_t const * ptr, poly16x8x2_t src,__constrange(0,7) int lane);// VLD2.16 {d0[0], d2[0]}, [r0] +#define vld2q_lane_p16 vld2q_lane_u16 + +//uint8x8x2_t vld2_lane_u8(__transfersize(2) uint8_t const * ptr, uint8x8x2_t src, __constrange(0,7) int lane);// VLD2.8 {d0[0], d1[0]}, [r0] +_NEON2SSE_INLINE uint8x8x2_t vld2_lane_u8_ptr(__transfersize(2) uint8_t const * ptr, uint8x8x2_t* 
src, __constrange(0,7) int lane) // VLD2.8 {d0[0], d1[0]}, [r0] +{ + uint8x8x2_t v; + v.val[0] = vld1_lane_u8(ptr, src->val[0], lane); + v.val[1] = vld1_lane_u8((ptr + 1), src->val[1], lane); + return v; +} +#define vld2_lane_u8(ptr, src, lane) vld2_lane_u8_ptr(ptr, &src, lane) + +//uint16x4x2_t vld2_lane_u16(__transfersize(2) uint16_t const * ptr, uint16x4x2_t src, __constrange(0,3)int lane);// VLD2.16 {d0[0], d1[0]}, [r0] +_NEON2SSE_INLINE uint16x4x2_t vld2_lane_u16_ptr(__transfersize(2) uint16_t const * ptr, uint16x4x2_t* src, __constrange(0,3) int lane) +{ + uint16x4x2_t v; + v.val[0] = vld1_lane_u16(ptr, src->val[0], lane); + v.val[1] = vld1_lane_u16((ptr + 1), src->val[1], lane); + return v; +} +#define vld2_lane_u16(ptr, src, lane) vld2_lane_u16_ptr(ptr, &src, lane) + +//uint32x2x2_t vld2_lane_u32(__transfersize(2) uint32_t const * ptr, uint32x2x2_t src, __constrange(0,1)int lane);// VLD2.32 {d0[0], d1[0]}, [r0] +_NEON2SSE_INLINE uint32x2x2_t vld2_lane_u32_ptr(__transfersize(2) uint32_t const * ptr, uint32x2x2_t* src, __constrange(0,1) int lane) +{ + uint32x2x2_t v; + v.val[0] = vld1_lane_u32(ptr, src->val[0], lane); + v.val[1] = vld1_lane_u32((ptr + 1), src->val[1], lane); + return v; +} +#define vld2_lane_u32(ptr, src, lane) vld2_lane_u32_ptr(ptr, &src, lane) + +//int8x8x2_t vld2_lane_s8(__transfersize(2) int8_t const * ptr, int8x8x2_t src, __constrange(0,7) int lane);// VLD2.8 {d0[0], d1[0]}, [r0] +int8x8x2_t vld2_lane_s8_ptr(__transfersize(2) int8_t const * ptr, int8x8x2_t * src, __constrange(0,7) int lane); // VLD2.8 {d0[0], d1[0]}, [r0] +#define vld2_lane_s8(ptr, src, lane) vld2_lane_u8(( uint8_t*) ptr, src, lane) + +//int16x4x2_t vld2_lane_s16(__transfersize(2) int16_t const * ptr, int16x4x2_t src, __constrange(0,3) int lane);// VLD2.16 {d0[0], d1[0]}, [r0] +int16x4x2_t vld2_lane_s16_ptr(__transfersize(2) int16_t const * ptr, int16x4x2_t * src, __constrange(0,3) int lane); // VLD2.16 {d0[0], d1[0]}, [r0] +#define vld2_lane_s16(ptr, src, lane) vld2_lane_u16(( uint16_t*) ptr, src, lane) + +//int32x2x2_t vld2_lane_s32(__transfersize(2) int32_t const * ptr, int32x2x2_t src, __constrange(0,1) int lane);// VLD2.32 {d0[0], d1[0]}, [r0] +int32x2x2_t vld2_lane_s32_ptr(__transfersize(2) int32_t const * ptr, int32x2x2_t * src, __constrange(0,1) int lane); // VLD2.32 {d0[0], d1[0]}, [r0] +#define vld2_lane_s32(ptr, src, lane) vld2_lane_u32(( uint32_t*) ptr, src, lane) + +//float16x4x2_t vld2_lane_f16(__transfersize(2) __fp16 const * ptr, float16x4x2_t src, __constrange(0,3) int lane); // VLD2.16 {d0[0], d1[0]}, [r0] +//current IA SIMD doesn't support float16 + +float32x2x2_t vld2_lane_f32_ptr(__transfersize(2) float32_t const * ptr, float32x2x2_t * src,__constrange(0,1) int lane); // VLD2.32 {d0[0], d1[0]}, [r0] +_NEON2SSE_INLINE float32x2x2_t vld2_lane_f32_ptr(__transfersize(2) float32_t const * ptr, float32x2x2_t * src,__constrange(0,1) int lane) +{ + float32x2x2_t v; + v.val[0] = vld1_lane_f32(ptr, src->val[0], lane); + v.val[1] = vld1_lane_f32((ptr + 1), src->val[1], lane); + return v; +} +#define vld2_lane_f32(ptr, src, lane) vld2_lane_f32_ptr(ptr, &src, lane) + +//poly8x8x2_t vld2_lane_p8(__transfersize(2) poly8_t const * ptr, poly8x8x2_t src, __constrange(0,7) int lane);// VLD2.8 {d0[0], d1[0]}, [r0] +poly8x8x2_t vld2_lane_p8_ptr(__transfersize(2) poly8_t const * ptr, poly8x8x2_t * src, __constrange(0,7) int lane); // VLD2.8 {d0[0], d1[0]}, [r0] +#define vld2_lane_p8 vld2_lane_u8 + +//poly16x4x2_t vld2_lane_p16(__transfersize(2) poly16_t const * ptr, poly16x4x2_t src, 
__constrange(0,3)int lane);// VLD2.16 {d0[0], d1[0]}, [r0] +poly16x4x2_t vld2_lane_p16_ptr(__transfersize(2) poly16_t const * ptr, poly16x4x2_t * src, __constrange(0,3) int lane); // VLD2.16 {d0[0], d1[0]}, [r0] +#define vld2_lane_p16 vld2_lane_u16 + +//*********** Lane triplets ********************** +//************************************************* +//does vld1_lane_xx ptr[0] to src->val[0], ptr[1] to src->val[1] and ptr[2] to src->val[2] at lane positon +//we assume src is 16 bit aligned + +//uint16x8x3_t vld3q_lane_u16(__transfersize(3) uint16_t const * ptr, uint16x8x3_t src,__constrange(0,7) int lane);// VLD3.16 {d0[0], d2[0], d4[0]}, [r0] +_NEON2SSE_INLINE uint16x8x3_t vld3q_lane_u16_ptr(__transfersize(3) uint16_t const * ptr, uint16x8x3_t* src,__constrange(0,7) int lane) // VLD3.16 {d0[0], d2[0], d4[0]}, [r0] +{ + uint16x8x3_t v; + v.val[0] = _MM_INSERT_EPI16 ( src->val[0], ptr[0], lane); + v.val[1] = _MM_INSERT_EPI16 ( src->val[1], ptr[1], lane); + v.val[2] = _MM_INSERT_EPI16 ( src->val[2], ptr[2], lane); + return v; +} +#define vld3q_lane_u16(ptr, src, lane) vld3q_lane_u16_ptr(ptr, &src, lane) + +//uint32x4x3_t vld3q_lane_u32(__transfersize(3) uint32_t const * ptr, uint32x4x3_t src,__constrange(0,3) int lane);// VLD3.32 {d0[0], d2[0], d4[0]}, [r0] +_NEON2SSE_INLINE uint32x4x3_t vld3q_lane_u32_ptr(__transfersize(3) uint32_t const * ptr, uint32x4x3_t* src,__constrange(0,3) int lane) // VLD3.32 {d0[0], d2[0], d4[0]}, [r0] +{ + uint32x4x3_t v; + v.val[0] = _MM_INSERT_EPI32 ( src->val[0], ptr[0], lane); + v.val[1] = _MM_INSERT_EPI32 ( src->val[1], ptr[1], lane); + v.val[2] = _MM_INSERT_EPI32 ( src->val[2], ptr[2], lane); + return v; +} +#define vld3q_lane_u32(ptr, src, lane) vld3q_lane_u32_ptr(ptr, &src, lane) + +//int16x8x3_t vld3q_lane_s16(__transfersize(3) int16_t const * ptr, int16x8x3_t src, __constrange(0,7)int lane);// VLD3.16 {d0[0], d2[0], d4[0]}, [r0] +_NEON2SSE_INLINE int16x8x3_t vld3q_lane_s16_ptr(__transfersize(3) int16_t const * ptr, int16x8x3_t* src, __constrange(0,7) int lane) // VLD3.16 {d0[0], d2[0], d4[0]}, [r0] +{ + int16x8x3_t v; + v.val[0] = _MM_INSERT_EPI16 ( src->val[0], ptr[0], lane); + v.val[1] = _MM_INSERT_EPI16 ( src->val[1], ptr[1], lane); + v.val[2] = _MM_INSERT_EPI16 ( src->val[2], ptr[2], lane); + return v; +} +#define vld3q_lane_s16(ptr, src, lane) vld3q_lane_s16_ptr(ptr, &src, lane) + +//int32x4x3_t vld3q_lane_s32(__transfersize(3) int32_t const * ptr, int32x4x3_t src, __constrange(0,3)int lane);// VLD3.32 {d0[0], d2[0], d4[0]}, [r0] +_NEON2SSE_INLINE int32x4x3_t vld3q_lane_s32_ptr(__transfersize(3) int32_t const * ptr, int32x4x3_t* src, __constrange(0,3) int lane) // VLD3.32 {d0[0], d2[0], d4[0]}, [r0] +{ + int32x4x3_t v; + v.val[0] = _MM_INSERT_EPI32 ( src->val[0], ptr[0], lane); + v.val[1] = _MM_INSERT_EPI32 ( src->val[1], ptr[1], lane); + v.val[2] = _MM_INSERT_EPI32 ( src->val[2], ptr[2], lane); + return v; +} +#define vld3q_lane_s32(ptr, src, lane) vld3q_lane_s32_ptr(ptr, &src, lane) + +float16x8x3_t vld3q_lane_f16_ptr(__transfersize(3) __fp16 const * ptr, float16x8x3_t * src, __constrange(0,7) int lane); // VLD3.16 {d0[0], d2[0], d4[0]}, [r0] +//current IA SIMD doesn't support float16 +#define vld3q_lane_f16(ptr, src, lane) vld3q_lane_f16_ptr(ptr, &src, lane) + + +//float32x4x3_t vld3q_lane_f32(__transfersize(3) float32_t const * ptr, float32x4x3_t src,__constrange(0,3) int lane);// VLD3.32 {d0[0], d2[0], d4[0]}, [r0] +_NEON2SSE_INLINE float32x4x3_t vld3q_lane_f32_ptr(__transfersize(3) float32_t const * ptr, float32x4x3_t* 
src,__constrange(0,3) int lane) // VLD3.32 {d0[0], d2[0], d4[0]}, [r0] +{ + float32x4x3_t v; + v.val[0] = vld1q_lane_f32(&ptr[0], src->val[0], lane); + v.val[1] = vld1q_lane_f32(&ptr[1], src->val[1], lane); + v.val[2] = vld1q_lane_f32(&ptr[2], src->val[2], lane); + return v; +} +#define vld3q_lane_f32(ptr,src,lane) vld3q_lane_f32_ptr(ptr,&src,lane) + +poly16x8x3_t vld3q_lane_p16_ptr(__transfersize(3) poly16_t const * ptr, poly16x8x3_t * src,__constrange(0,7) int lane); // VLD3.16 {d0[0], d2[0], d4[0]}, [r0] +#define vld3q_lane_p16 vld3q_lane_u16 + +//uint8x8x3_t vld3_lane_u8(__transfersize(3) uint8_t const * ptr, uint8x8x3_t src, __constrange(0,7) int lane);// VLD3.8 {d0[0], d1[0], d2[0]}, [r0] +_NEON2SSE_INLINE uint8x8x3_t vld3_lane_u8_ptr(__transfersize(3) uint8_t const * ptr, uint8x8x3_t* src, __constrange(0,7) int lane) // VLD3.8 {d0[0], d1[0], d2[0]}, [r0] +{ + uint8x8x3_t v; + v.val[0] = vld1_lane_u8(ptr, src->val[0], lane); + v.val[1] = vld1_lane_u8((ptr + 1), src->val[1], lane); + v.val[2] = vld1_lane_u8((ptr + 2), src->val[2], lane); + return v; +} +#define vld3_lane_u8(ptr, src, lane) vld3_lane_u8_ptr(ptr, &src, lane) + +//uint16x4x3_t vld3_lane_u16(__transfersize(3) uint16_t const * ptr, uint16x4x3_t src, __constrange(0,3)int lane);// VLD3.16 {d0[0], d1[0], d2[0]}, [r0] +_NEON2SSE_INLINE uint16x4x3_t vld3_lane_u16_ptr(__transfersize(3) uint16_t const * ptr, uint16x4x3_t* src, __constrange(0,3) int lane) // VLD3.16 {d0[0], d1[0], d2[0]}, [r0] +{ + uint16x4x3_t v; + v.val[0] = vld1_lane_u16(ptr, src->val[0], lane); + v.val[1] = vld1_lane_u16((ptr + 1), src->val[1], lane); + v.val[2] = vld1_lane_u16((ptr + 2), src->val[2], lane); + return v; +} +#define vld3_lane_u16(ptr, src, lane) vld3_lane_u16_ptr(ptr, &src, lane) + +//uint32x2x3_t vld3_lane_u32(__transfersize(3) uint32_t const * ptr, uint32x2x3_t src, __constrange(0,1)int lane);// VLD3.32 {d0[0], d1[0], d2[0]}, [r0] +_NEON2SSE_INLINE uint32x2x3_t vld3_lane_u32_ptr(__transfersize(3) uint32_t const * ptr, uint32x2x3_t* src, __constrange(0,1) int lane) // VLD3.32 {d0[0], d1[0], d2[0]}, [r0] +{ + //need to merge into 128 bit anyway + uint32x2x3_t v; + v.val[0] = vld1_lane_u32(ptr, src->val[0], lane);; + v.val[1] = vld1_lane_u32((ptr + 1), src->val[1], lane);; + v.val[2] = vld1_lane_u32((ptr + 2), src->val[2], lane);; + return v; +} +#define vld3_lane_u32(ptr, src, lane) vld3_lane_u32_ptr(ptr, &src, lane) + +int8x8x3_t vld3_lane_s8_ptr(__transfersize(3) int8_t const * ptr, int8x8x3_t * src, __constrange(0,7) int lane); // VLD3.8 {d0[0], d1[0], d2[0]}, [r0] +#define vld3_lane_s8(ptr, src, lane) vld3_lane_u8_ptr(( uint8_t*) ptr, &src, lane) + +int16x4x3_t vld3_lane_s16_ptr(__transfersize(3) int16_t const * ptr, int16x4x3_t * src, __constrange(0,3) int lane); // VLD3.16 {d0[0], d1[0], d2[0]}, [r0] +#define vld3_lane_s16(ptr, src, lane) vld3_lane_u16_ptr(( uint16_t*) ptr, &src, lane) + +int32x2x3_t vld3_lane_s32_ptr(__transfersize(3) int32_t const * ptr, int32x2x3_t * src, __constrange(0,1) int lane); // VLD3.32 {d0[0], d1[0], d2[0]}, [r0] +#define vld3_lane_s32(ptr, src, lane) vld3_lane_u32_ptr(( uint32_t*) ptr, &src, lane) + +float16x4x3_t vld3_lane_f16_ptr(__transfersize(3) __fp16 const * ptr, float16x4x3_t * src, __constrange(0,3) int lane); // VLD3.16 {d0[0], d1[0], d2[0]}, [r0] +//current IA SIMD doesn't support float16 + +//float32x2x3_t vld3_lane_f32(__transfersize(3) float32_t const * ptr, float32x2x3_t src,__constrange(0,1) int lane);// VLD3.32 {d0[0], d1[0], d2[0]}, [r0] +_NEON2SSE_INLINE float32x2x3_t 
vld3_lane_f32_ptr(__transfersize(3) float32_t const * ptr, float32x2x3_t* src,__constrange(0,1) int lane) // VLD3.32 {d0[0], d1[0], d2[0]}, [r0] +{ + float32x2x3_t v; + v.val[0] = vld1_lane_f32(ptr, src->val[0], lane); + v.val[1] = vld1_lane_f32((ptr + 1), src->val[1], lane); + v.val[2] = vld1_lane_f32((ptr + 2), src->val[2], lane); + return v; +} +#define vld3_lane_f32(ptr,src,lane) vld3_lane_f32_ptr(ptr,&src,lane) + +//poly8x8x3_t vld3_lane_p8_ptr(__transfersize(3) poly8_t const * ptr, poly8x8x3_t * src, __constrange(0,7) int lane); // VLD3.8 {d0[0], d1[0], d2[0]}, [r0] +#define vld3_lane_p8 vld3_lane_u8 + +//poly16x4x3_t vld3_lane_p16(__transfersize(3) poly16_t const * ptr, poly16x4x3_t * src, __constrange(0,3) int lane); // VLD3.16 {d0[0], d1[0], d2[0]}, [r0] +#define vld3_lane_p16 vld3_lane_u16 + +//******************* Lane Quadruples load *************************** +//********************************************************************* +//does vld1_lane_xx ptr[0] to src->val[0], ptr[1] to src->val[1], ptr[2] to src->val[2] and ptr[3] to src->val[3] at lane positon +//we assume src is 16 bit aligned + +//uint16x8x4_t vld4q_lane_u16(__transfersize(4) uint16_t const * ptr, uint16x8x4_t src,__constrange(0,7) int lane)// VLD4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +_NEON2SSE_INLINE uint16x8x4_t vld4q_lane_u16_ptr(__transfersize(4) uint16_t const * ptr, uint16x8x4_t* src,__constrange(0,7) int lane) +{ + uint16x8x4_t v; + v.val[0] = _MM_INSERT_EPI16 ( src->val[0], ptr[0], lane); + v.val[1] = _MM_INSERT_EPI16 ( src->val[1], ptr[1], lane); + v.val[2] = _MM_INSERT_EPI16 ( src->val[2], ptr[2], lane); + v.val[3] = _MM_INSERT_EPI16 ( src->val[3], ptr[3], lane); + return v; +} +#define vld4q_lane_u16(ptr, src, lane) vld4q_lane_u16_ptr(ptr, &src, lane) + +//uint32x4x4_t vld4q_lane_u32(__transfersize(4) uint32_t const * ptr, uint32x4x4_t src,__constrange(0,3) int lane)// VLD4.32 {d0[0], d2[0], d4[0], d6[0]}, [r0] +_NEON2SSE_INLINE uint32x4x4_t vld4q_lane_u32_ptr(__transfersize(4) uint32_t const * ptr, uint32x4x4_t* src,__constrange(0,3) int lane) +{ + uint32x4x4_t v; + v.val[0] = _MM_INSERT_EPI32 ( src->val[0], ptr[0], lane); + v.val[1] = _MM_INSERT_EPI32 ( src->val[1], ptr[1], lane); + v.val[2] = _MM_INSERT_EPI32 ( src->val[2], ptr[2], lane); + v.val[3] = _MM_INSERT_EPI32 ( src->val[3], ptr[3], lane); + return v; +} +#define vld4q_lane_u32(ptr, src, lane) vld4q_lane_u32_ptr(ptr, &src, lane) + +//int16x8x4_t vld4q_lane_s16(__transfersize(4) int16_t const * ptr, int16x8x4_t src, __constrange(0,7)int lane);// VLD4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +int16x8x4_t vld4q_lane_s16_ptr(__transfersize(4) int16_t const * ptr, int16x8x4_t * src, __constrange(0,7) int lane); // VLD4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +#define vld4q_lane_s16(ptr, src, lane) vld4q_lane_u16(( uint16_t*) ptr, src, lane) + +//int32x4x4_t vld4q_lane_s32(__transfersize(4) int32_t const * ptr, int32x4x4_t src, __constrange(0,3)int lane);// VLD4.32 {d0[0], d2[0], d4[0], d6[0]}, [r0] +int32x4x4_t vld4q_lane_s32_ptr(__transfersize(4) int32_t const * ptr, int32x4x4_t * src, __constrange(0,3) int lane); // VLD4.32 {d0[0], d2[0], d4[0], d6[0]}, [r0] +#define vld4q_lane_s32(ptr, src, lane) vld4q_lane_u32(( uint32_t*) ptr, src, lane) + +//float16x8x4_t vld4q_lane_f16(__transfersize(4) __fp16 const * ptr, float16x8x4_t src, __constrange(0,7)int lane);// VLD4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +float16x8x4_t vld4q_lane_f16_ptr(__transfersize(4) __fp16 const * ptr, float16x8x4_t * src, __constrange(0,7) int lane); // VLD4.16 {d0[0], 
d2[0], d4[0], d6[0]}, [r0] +//current IA SIMD doesn't support float16 + +//float32x4x4_t vld4q_lane_f32(__transfersize(4) float32_t const * ptr, float32x4x4_t src,__constrange(0,3) int lane)// VLD4.32 {d0[0], d2[0], d4[0], d6[0]}, [r0] +_NEON2SSE_INLINE float32x4x4_t vld4q_lane_f32_ptr(__transfersize(4) float32_t const * ptr, float32x4x4_t* src,__constrange(0,3) int lane) +{ + float32x4x4_t v; + v.val[0] = vld1q_lane_f32(&ptr[0], src->val[0], lane); + v.val[1] = vld1q_lane_f32(&ptr[1], src->val[1], lane); + v.val[2] = vld1q_lane_f32(&ptr[2], src->val[2], lane); + v.val[3] = vld1q_lane_f32(&ptr[3], src->val[3], lane); + return v; +} +#define vld4q_lane_f32(ptr,val,lane) vld4q_lane_f32_ptr(ptr,&val,lane) + +//poly16x8x4_t vld4q_lane_p16(__transfersize(4) poly16_t const * ptr, poly16x8x4_t src,__constrange(0,7) int lane);// VLD4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +poly16x8x4_t vld4q_lane_p16_ptr(__transfersize(4) poly16_t const * ptr, poly16x8x4_t * src,__constrange(0,7) int lane); // VLD4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +#define vld4q_lane_p16 vld4q_lane_u16 + +//uint8x8x4_t vld4_lane_u8(__transfersize(4) uint8_t const * ptr, uint8x8x4_t src, __constrange(0,7) int lane)// VLD4.8 {d0[0], d1[0], d2[0], d3[0]}, [r0] +_NEON2SSE_INLINE uint8x8x4_t vld4_lane_u8_ptr(__transfersize(4) uint8_t const * ptr, uint8x8x4_t* src, __constrange(0,7) int lane) +{ + uint8x8x4_t v; + v.val[0] = vld1_lane_u8(ptr, src->val[0], lane); + v.val[1] = vld1_lane_u8((ptr + 1), src->val[1], lane); + v.val[2] = vld1_lane_u8((ptr + 2), src->val[2], lane); + v.val[3] = vld1_lane_u8((ptr + 3), src->val[3], lane); + return v; +} +#define vld4_lane_u8(ptr, src, lane) vld4_lane_u8_ptr(ptr, &src, lane) + +//uint16x4x4_t vld4_lane_u16(__transfersize(4) uint16_t const * ptr, uint16x4x4_t src, __constrange(0,3)int lane)// VLD4.16 {d0[0], d1[0], d2[0], d3[0]}, [r0] +_NEON2SSE_INLINE uint16x4x4_t vld4_lane_u16_ptr(__transfersize(4) uint16_t const * ptr, uint16x4x4_t* src, __constrange(0,3) int lane) +{ + uint16x4x4_t v; + v.val[0] = vld1_lane_u16(ptr, src->val[0], lane); + v.val[1] = vld1_lane_u16((ptr + 1), src->val[1], lane); + v.val[2] = vld1_lane_u16((ptr + 2), src->val[2], lane); + v.val[3] = vld1_lane_u16((ptr + 3), src->val[3], lane); + return v; +} +#define vld4_lane_u16(ptr, src, lane) vld4_lane_u16_ptr(ptr, &src, lane) + +//uint32x2x4_t vld4_lane_u32(__transfersize(4) uint32_t const * ptr, uint32x2x4_t src, __constrange(0,1)int lane)// VLD4.32 {d0[0], d1[0], d2[0], d3[0]}, [r0] +_NEON2SSE_INLINE uint32x2x4_t vld4_lane_u32_ptr(__transfersize(4) uint32_t const * ptr, uint32x2x4_t* src, __constrange(0,1) int lane) +{ + uint32x2x4_t v; + v.val[0] = vld1_lane_u32(ptr, src->val[0], lane); + v.val[1] = vld1_lane_u32((ptr + 1), src->val[1], lane); + v.val[2] = vld1_lane_u32((ptr + 2), src->val[2], lane); + v.val[3] = vld1_lane_u32((ptr + 3), src->val[3], lane); + return v; +} +#define vld4_lane_u32(ptr, src, lane) vld4_lane_u32_ptr(ptr, &src, lane) + +//int8x8x4_t vld4_lane_s8(__transfersize(4) int8_t const * ptr, int8x8x4_t src, __constrange(0,7) int lane);// VLD4.8 {d0[0], d1[0], d2[0], d3[0]}, [r0] +int8x8x4_t vld4_lane_s8_ptr(__transfersize(4) int8_t const * ptr, int8x8x4_t * src, __constrange(0,7) int lane); +#define vld4_lane_s8(ptr,src,lane) vld4_lane_u8((uint8_t*)ptr,src,lane) + +//int16x4x4_t vld4_lane_s16(__transfersize(4) int16_t const * ptr, int16x4x4_t src, __constrange(0,3) int lane);// VLD4.16 {d0[0], d1[0], d2[0], d3[0]}, [r0] +int16x4x4_t vld4_lane_s16_ptr(__transfersize(4) int16_t const * ptr, int16x4x4_t 
* src, __constrange(0,3) int lane);
+#define vld4_lane_s16(ptr,src,lane) vld4_lane_u16((uint16_t*)ptr,src,lane)
+
+//int32x2x4_t vld4_lane_s32(__transfersize(4) int32_t const * ptr, int32x2x4_t src, __constrange(0,1) int lane);// VLD4.32 {d0[0], d1[0], d2[0], d3[0]}, [r0]
+int32x2x4_t vld4_lane_s32_ptr(__transfersize(4) int32_t const * ptr, int32x2x4_t * src, __constrange(0,1) int lane);
+#define vld4_lane_s32(ptr,src,lane) vld4_lane_u32((uint32_t*)ptr,src,lane)
+
+//float16x4x4_t vld4_lane_f16(__transfersize(4) __fp16 const * ptr, float16x4x4_t src, __constrange(0,3)int lane);// VLD4.16 {d0[0], d1[0], d2[0], d3[0]}, [r0]
+float16x4x4_t vld4_lane_f16_ptr(__transfersize(4) __fp16 const * ptr, float16x4x4_t * src, __constrange(0,3) int lane);
+//current IA SIMD doesn't support float16
+
+//float32x2x4_t vld4_lane_f32(__transfersize(4) float32_t const * ptr, float32x2x4_t src,__constrange(0,1) int lane)// VLD4.32 {d0[0], d1[0], d2[0], d3[0]}, [r0]
+_NEON2SSE_INLINE float32x2x4_t vld4_lane_f32_ptr(__transfersize(4) float32_t const * ptr, float32x2x4_t* src,__constrange(0,1) int lane)
+{
+ //serial solution may be faster
+ float32x2x4_t v;
+ v.val[0] = vld1_lane_f32(ptr, src->val[0], lane);
+ v.val[1] = vld1_lane_f32((ptr + 1), src->val[1], lane);
+ v.val[2] = vld1_lane_f32((ptr + 2), src->val[2], lane);
+ v.val[3] = vld1_lane_f32((ptr + 3), src->val[3], lane);
+ return v;
+}
+#define vld4_lane_f32(ptr,src,lane) vld4_lane_f32_ptr(ptr,&src,lane)
+
+//poly8x8x4_t vld4_lane_p8(__transfersize(4) poly8_t const * ptr, poly8x8x4_t src, __constrange(0,7) int lane);// VLD4.8 {d0[0], d1[0], d2[0], d3[0]}, [r0]
+poly8x8x4_t vld4_lane_p8_ptr(__transfersize(4) poly8_t const * ptr, poly8x8x4_t * src, __constrange(0,7) int lane);
+#define vld4_lane_p8 vld4_lane_u8
+
+//poly16x4x4_t vld4_lane_p16(__transfersize(4) poly16_t const * ptr, poly16x4x4_t src, __constrange(0,3)int lane);// VLD4.16 {d0[0], d1[0], d2[0], d3[0]}, [r0]
+poly16x4x4_t vld4_lane_p16_ptr(__transfersize(4) poly16_t const * ptr, poly16x4x4_t * src, __constrange(0,3) int lane);
+#define vld4_lane_p16 vld4_lane_u16
+
+//******************* Store duplets *********************************************
+//********************************************************************************
+//here we assume the ptr is 16-byte aligned. If not we need to use _mm_storeu_si128 as shown in the vst1q_u8 function.
+//If necessary you need to modify all store functions accordingly.
See more comments to "Store single" functions +//void vst2q_u8(__transfersize(32) uint8_t * ptr, uint8x16x2_t val)// VST2.8 {d0, d2}, [r0] +_NEON2SSE_INLINE void vst2q_u8_ptr(__transfersize(32) uint8_t * ptr, uint8x16x2_t* val) +{ + uint8x16x2_t v; + v.val[0] = _mm_unpacklo_epi8(val->val[0], val->val[1]); + v.val[1] = _mm_unpackhi_epi8(val->val[0], val->val[1]); + vst1q_u8 (ptr, v.val[0]); + vst1q_u8 ((ptr + 16), v.val[1]); +} +#define vst2q_u8(ptr, val) vst2q_u8_ptr(ptr, &val) + +//void vst2q_u16(__transfersize(16) uint16_t * ptr, uint16x8x2_t val)// VST2.16 {d0, d2}, [r0] +_NEON2SSE_INLINE void vst2q_u16_ptr(__transfersize(16) uint16_t * ptr, uint16x8x2_t* val) +{ + uint16x8x2_t v; + v.val[0] = _mm_unpacklo_epi16(val->val[0], val->val[1]); + v.val[1] = _mm_unpackhi_epi16(val->val[0], val->val[1]); + vst1q_u16 (ptr, v.val[0]); + vst1q_u16 ((ptr + 8), v.val[1]); +} +#define vst2q_u16(ptr, val) vst2q_u16_ptr(ptr, &val) + +//void vst2q_u32(__transfersize(8) uint32_t * ptr, uint32x4x2_t val)// VST2.32 {d0, d2}, [r0] +_NEON2SSE_INLINE void vst2q_u32_ptr(__transfersize(8) uint32_t* ptr, uint32x4x2_t* val) +{ + uint32x4x2_t v; + v.val[0] = _mm_unpacklo_epi32(val->val[0], val->val[1]); + v.val[1] = _mm_unpackhi_epi32(val->val[0], val->val[1]); + vst1q_u32 (ptr, v.val[0]); + vst1q_u32 ((ptr + 4), v.val[1]); +} +#define vst2q_u32(ptr, val) vst2q_u32_ptr(ptr, &val) + +//void vst2q_s8(__transfersize(32) int8_t * ptr, int8x16x2_t val); // VST2.8 {d0, d2}, [r0] +void vst2q_s8_ptr(__transfersize(32) int8_t * ptr, int8x16x2_t * val); +#define vst2q_s8(ptr, val) vst2q_u8((uint8_t*)(ptr), val) + +//void vst2q_s16(__transfersize(16) int16_t * ptr, int16x8x2_t val);// VST2.16 {d0, d2}, [r0] +void vst2q_s16_ptr(__transfersize(16) int16_t * ptr, int16x8x2_t * val); +#define vst2q_s16(ptr, val) vst2q_u16((uint16_t*)(ptr), val) + +//void vst2q_s32(__transfersize(8) int32_t * ptr, int32x4x2_t val);// VST2.32 {d0, d2}, [r0] +void vst2q_s32_ptr(__transfersize(8) int32_t * ptr, int32x4x2_t * val); +#define vst2q_s32(ptr, val) vst2q_u32((uint32_t*)(ptr), val) + +//void vst2q_f16(__transfersize(16) __fp16 * ptr, float16x8x2_t val);// VST2.16 {d0, d2}, [r0] +void vst2q_f16_ptr(__transfersize(16) __fp16 * ptr, float16x8x2_t * val); +// IA32 SIMD doesn't work with 16bit floats currently + +//void vst2q_f32(__transfersize(8) float32_t * ptr, float32x4x2_t val)// VST2.32 {d0, d2}, [r0] +_NEON2SSE_INLINE void vst2q_f32_ptr(__transfersize(8) float32_t* ptr, float32x4x2_t* val) +{ + float32x4x2_t v; + v.val[0] = _mm_unpacklo_ps(val->val[0], val->val[1]); + v.val[1] = _mm_unpackhi_ps(val->val[0], val->val[1]); + vst1q_f32 (ptr, v.val[0]); + vst1q_f32 ((ptr + 4), v.val[1]); +} +#define vst2q_f32(ptr, val) vst2q_f32_ptr(ptr, &val) + +//void vst2q_p8(__transfersize(32) poly8_t * ptr, poly8x16x2_t val);// VST2.8 {d0, d2}, [r0] +void vst2q_p8_ptr(__transfersize(32) poly8_t * ptr, poly8x16x2_t * val); +#define vst2q_p8 vst2q_u8 + +//void vst2q_p16(__transfersize(16) poly16_t * ptr, poly16x8x2_t val);// VST2.16 {d0, d2}, [r0] +void vst2q_p16_ptr(__transfersize(16) poly16_t * ptr, poly16x8x2_t * val); +#define vst2q_p16 vst2q_u16 + +//void vst2_u8(__transfersize(16) uint8_t * ptr, uint8x8x2_t val);// VST2.8 {d0, d1}, [r0] +_NEON2SSE_INLINE void vst2_u8_ptr(__transfersize(16) uint8_t * ptr, uint8x8x2_t* val) +{ + __m128i v0; + v0 = _mm_unpacklo_epi8(_pM128i(val->val[0]), _pM128i(val->val[1])); + vst1q_u8 (ptr, v0); +} +#define vst2_u8(ptr, val) vst2_u8_ptr(ptr, &val) + +//void vst2_u16(__transfersize(8) uint16_t * ptr, uint16x4x2_t 
val);// VST2.16 {d0, d1}, [r0] +_NEON2SSE_INLINE void vst2_u16_ptr(__transfersize(8) uint16_t * ptr, uint16x4x2_t* val) +{ + __m128i v0; + v0 = _mm_unpacklo_epi16(_pM128i(val->val[0]), _pM128i(val->val[1])); + vst1q_u16 (ptr, v0); +} +#define vst2_u16(ptr, val) vst2_u16_ptr(ptr, &val) + +//void vst2_u32(__transfersize(4) uint32_t * ptr, uint32x2x2_t val);// VST2.32 {d0, d1}, [r0] +_NEON2SSE_INLINE void vst2_u32_ptr(__transfersize(4) uint32_t * ptr, uint32x2x2_t* val) +{ + __m128i v0; + v0 = _mm_unpacklo_epi32(_pM128i(val->val[0]), _pM128i(val->val[1])); + vst1q_u32 (ptr, v0); +} +#define vst2_u32(ptr, val) vst2_u32_ptr(ptr, &val) + + +//void vst2_u64(__transfersize(2) uint64_t * ptr, uint64x1x2_t val);// VST1.64 {d0, d1}, [r0] +void vst2_u64_ptr(__transfersize(2) uint64_t * ptr, uint64x1x2_t * val); +_NEON2SSE_INLINE void vst2_u64_ptr(__transfersize(2) uint64_t * ptr, uint64x1x2_t* val) +{ + *(ptr) = val->val[0].m64_u64[0]; + *(ptr + 1) = val->val[1].m64_u64[0]; +} +#define vst2_u64(ptr, val) vst2_u64_ptr(ptr, &val) + +//void vst2_s8(__transfersize(16) int8_t * ptr, int8x8x2_t val);// VST2.8 {d0, d1}, [r0] +#define vst2_s8(ptr, val) vst2_u8((uint8_t*) ptr, val) + +//void vst2_s16(__transfersize(8) int16_t * ptr, int16x4x2_t val); // VST2.16 {d0, d1}, [r0] +#define vst2_s16(ptr,val) vst2_u16((uint16_t*) ptr, val) + +//void vst2_s32(__transfersize(4) int32_t * ptr, int32x2x2_t val); // VST2.32 {d0, d1}, [r0] +#define vst2_s32(ptr,val) vst2_u32((uint32_t*) ptr, val) + +//void vst2_s64(__transfersize(2) int64_t * ptr, int64x1x2_t val); +#define vst2_s64(ptr,val) vst2_u64((uint64_t*) ptr,val) + +//void vst2_f16(__transfersize(8) __fp16 * ptr, float16x4x2_t val); // VST2.16 {d0, d1}, [r0] +//current IA SIMD doesn't support float16 + +//void vst2_f32(__transfersize(4) float32_t * ptr, float32x2x2_t val); // VST2.32 {d0, d1}, [r0] +_NEON2SSE_INLINE void vst2_f32_ptr(__transfersize(4) float32_t* ptr, float32x2x2_t* val) +{ + *(ptr) = val->val[0].m64_f32[0]; + *(ptr + 1) = val->val[1].m64_f32[0]; + *(ptr + 2) = val->val[0].m64_f32[1]; + *(ptr + 3) = val->val[1].m64_f32[1]; +} +#define vst2_f32(ptr, val) vst2_f32_ptr(ptr, &val) + +//void vst2_p8_ptr(__transfersize(16) poly8_t * ptr, poly8x8x2_t * val); // VST2.8 {d0, d1}, [r0] +#define vst2_p8 vst2_u8 + +//void vst2_p16_ptr(__transfersize(8) poly16_t * ptr, poly16x4x2_t * val); // VST2.16 {d0, d1}, [r0] +#define vst2_p16 vst2_u16 + +//******************** Triplets store ***************************************** +//****************************************************************************** +//void vst3q_u8(__transfersize(48) uint8_t * ptr, uint8x16x3_t val)// VST3.8 {d0, d2, d4}, [r0] +_NEON2SSE_INLINE void vst3q_u8_ptr(__transfersize(48) uint8_t * ptr, uint8x16x3_t* val) +{ + uint8x16x3_t v; + __m128i v0,v1,v2, cff, bldmask; + _NEON2SSE_ALIGN_16 uint8_t mask0[16] = {0, 1, 0xff, 2, 3,0xff, 4, 5,0xff, 6,7,0xff, 8,9,0xff, 10}; + _NEON2SSE_ALIGN_16 uint8_t mask1[16] = {0, 0xff, 1, 2, 0xff, 3, 4, 0xff, 5, 6, 0xff, 7,8,0xff, 9,10}; + _NEON2SSE_ALIGN_16 uint8_t mask2[16] = {0xff, 6, 7, 0xff, 8, 9,0xff, 10, 11,0xff, 12,13,0xff, 14,15,0xff}; + _NEON2SSE_ALIGN_16 uint8_t mask2lo[16] = {0xff,0xff, 0, 0xff,0xff, 1, 0xff,0xff, 2, 0xff,0xff, 3, 0xff,0xff, 4, 0xff}; + _NEON2SSE_ALIGN_16 uint8_t mask2med[16] = {0xff, 5, 0xff, 0xff, 6, 0xff,0xff, 7, 0xff,0xff, 8, 0xff,0xff, 9, 0xff, 0xff}; + _NEON2SSE_ALIGN_16 uint8_t mask2hi[16] = {10, 0xff,0xff, 11, 0xff,0xff, 12, 0xff,0xff, 13, 0xff,0xff, 14, 0xff, 0xff, 15}; + + v0 = _mm_unpacklo_epi8(val->val[0], 
val->val[1]); //0,1, 3,4, 6,7, 9,10, 12,13, 15,16, 18,19, 21,22 + v2 = _mm_unpackhi_epi8(val->val[0], val->val[1]); //24,25, 27,28, 30,31, 33,34, 36,37, 39,40, 42,43, 45,46 + v1 = _mm_alignr_epi8(v2, v0, 11); //12,13, 15,16, 18,19, 21,22, 24,25, 27,28, 30,31, 33,34 + v.val[0] = _mm_shuffle_epi8(v0, *(__m128i*)mask0); //make holes for the v.val[2] data embedding + v.val[2] = _mm_shuffle_epi8(val->val[2], *(__m128i*)mask2lo); //make plugs for the v.val[2] data embedding + cff = _mm_cmpeq_epi8(v0, v0); //all ff + bldmask = _mm_cmpeq_epi8(*(__m128i*)mask0, cff); + v.val[0] = _MM_BLENDV_EPI8(v.val[0], v.val[2], bldmask); + vst1q_u8(ptr, v.val[0]); + v.val[0] = _mm_shuffle_epi8(v1, *(__m128i*)mask1); //make holes for the v.val[2] data embedding + v.val[2] = _mm_shuffle_epi8(val->val[2], *(__m128i*)mask2med); //make plugs for the v.val[2] data embedding + bldmask = _mm_cmpeq_epi8(*(__m128i*)mask1, cff); + v.val[1] = _MM_BLENDV_EPI8(v.val[0],v.val[2], bldmask); + vst1q_u8((ptr + 16), v.val[1]); + v.val[0] = _mm_shuffle_epi8(v2, *(__m128i*)mask2); //make holes for the v.val[2] data embedding + v.val[2] = _mm_shuffle_epi8(val->val[2], *(__m128i*)mask2hi); //make plugs for the v.val[2] data embedding + bldmask = _mm_cmpeq_epi8(*(__m128i*)mask2, cff); + v.val[2] = _MM_BLENDV_EPI8(v.val[0],v.val[2], bldmask ); + vst1q_u8((ptr + 32), v.val[2]); +} +#define vst3q_u8(ptr, val) vst3q_u8_ptr(ptr, &val) + +//void vst3q_u16(__transfersize(24) uint16_t * ptr, uint16x8x3_t val)// VST3.16 {d0, d2, d4}, [r0] +_NEON2SSE_INLINE void vst3q_u16_ptr(__transfersize(24) uint16_t * ptr, uint16x8x3_t* val) +{ + uint16x8x3_t v; + __m128i v0,v1,v2, cff, bldmask; + _NEON2SSE_ALIGN_16 uint8_t mask0[16] = {0,1, 2,3, 0xff,0xff, 4,5, 6,7,0xff,0xff, 8,9,10,11}; + _NEON2SSE_ALIGN_16 uint8_t mask1[16] = {0xff, 0xff, 0,1, 2,3, 0xff,0xff, 4,5, 6,7, 0xff,0xff, 8,9}; + _NEON2SSE_ALIGN_16 uint8_t mask2[16] = {6,7,0xff,0xff, 8,9,10,11, 0xff, 0xff, 12,13,14,15, 0xff, 0xff}; + _NEON2SSE_ALIGN_16 uint8_t mask2lo[16] = {0xff,0xff, 0xff,0xff, 0,1, 0xff,0xff, 0xff,0xff, 2,3, 0xff,0xff, 0xff,0xff}; + _NEON2SSE_ALIGN_16 uint8_t mask2med[16] = {4,5, 0xff,0xff,0xff,0xff, 6,7, 0xff, 0xff,0xff,0xff, 8,9, 0xff, 0xff}; + _NEON2SSE_ALIGN_16 uint8_t mask2hi[16] = {0xff, 0xff, 10,11, 0xff, 0xff, 0xff, 0xff, 12,13, 0xff, 0xff, 0xff, 0xff,14,15}; + + v0 = _mm_unpacklo_epi16(val->val[0], val->val[1]); //0,1, 3,4, 6,7, 9,10 + v2 = _mm_unpackhi_epi16(val->val[0], val->val[1]); //12,13, 15,16, 18,19, 21,22, + v1 = _mm_alignr_epi8(v2, v0, 12); //9,10, 12,13, 15,16, 18,19 + v.val[0] = _mm_shuffle_epi8(v0, *(__m128i*)mask0); //make holes for the v.val[2] data embedding + v.val[2] = _mm_shuffle_epi8(val->val[2], *(__m128i*)mask2lo); //make plugs for the v.val[2] data embedding + cff = _mm_cmpeq_epi16(v0, v0); //all ff + bldmask = _mm_cmpeq_epi16(*(__m128i*)mask0, cff); + v.val[0] = _MM_BLENDV_EPI8(v.val[0], v.val[2], bldmask); + vst1q_u16(ptr, v.val[0]); + v.val[0] = _mm_shuffle_epi8(v1, *(__m128i*)mask1); //make holes for the v.val[2] data embedding + v.val[2] = _mm_shuffle_epi8(val->val[2], *(__m128i*)mask2med); //make plugs for the v.val[2] data embedding + bldmask = _mm_cmpeq_epi16(*(__m128i*)mask1, cff); + v.val[1] = _MM_BLENDV_EPI8(v.val[0],v.val[2], bldmask); + vst1q_u16((ptr + 8), v.val[1]); + v.val[0] = _mm_shuffle_epi8(v2, *(__m128i*)mask2); //make holes for the v.val[2] data embedding + v.val[2] = _mm_shuffle_epi8(val->val[2], *(__m128i*)mask2hi); //make plugs for the v.val[2] data embedding + bldmask = _mm_cmpeq_epi16(*(__m128i*)mask2, cff); + v.val[2] 
= _MM_BLENDV_EPI8(v.val[0],v.val[2], bldmask ); + vst1q_u16((ptr + 16), v.val[2]); +} +#define vst3q_u16(ptr, val) vst3q_u16_ptr(ptr, &val) + +//void vst3q_u32(__transfersize(12) uint32_t * ptr, uint32x4x3_t val)// VST3.32 {d0, d2, d4}, [r0] +_NEON2SSE_INLINE void vst3q_u32_ptr(__transfersize(12) uint32_t * ptr, uint32x4x3_t* val) +{ + //a0,a1,a2,a3, b0,b1,b2,b3, c0,c1,c2,c3 -> a0,b0,c0,a1, b1,c1,a2,b2, c2,a3,b3,c3 + uint32x4x3_t v; + __m128i tmp0, tmp1,tmp2; + tmp0 = _mm_unpacklo_epi32(val->val[0], val->val[1]); //a0,b0,a1,b1 + tmp1 = _mm_unpackhi_epi32(val->val[0], val->val[1]); //a2,b2,a3,b3 + tmp2 = _mm_unpacklo_epi32(val->val[1], val->val[2]); //b0,c0,b1,c1 + v.val[1] = _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(tmp2),_mm_castsi128_ps(tmp1), _MM_SHUFFLE(1,0,3,2))); //b1,c1,a2,b2, + v.val[2] = _mm_unpackhi_epi64(tmp1, val->val[2]); //a3,b3, c2,c3 + v.val[2] = _mm_shuffle_epi32(v.val[2], 2 | (0 << 2) | (1 << 4) | (3 << 6)); //c2,a3,b3,c3 + tmp1 = _mm_unpacklo_epi32(tmp2,val->val[0]); //b0,a0,c0,a1 + v.val[0] = _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(tmp0),_mm_castsi128_ps(tmp1), _MM_SHUFFLE(3,2,1,0))); //a0,b0,c0,a1, + + vst1q_u32(ptr, v.val[0]); + vst1q_u32((ptr + 4), v.val[1]); + vst1q_u32((ptr + 8), v.val[2]); +} +#define vst3q_u32(ptr, val) vst3q_u32_ptr(ptr, &val) + +//void vst3q_s8(__transfersize(48) int8_t * ptr, int8x16x3_t val); +void vst3q_s8_ptr(__transfersize(48) int8_t * ptr, int8x16x3_t * val); +#define vst3q_s8(ptr, val) vst3q_u8((uint8_t*)(ptr), val) + +//void vst3q_s16(__transfersize(24) int16_t * ptr, int16x8x3_t val); +void vst3q_s16_ptr(__transfersize(24) int16_t * ptr, int16x8x3_t * val); +#define vst3q_s16(ptr, val) vst3q_u16((uint16_t*)(ptr), val) + +//void vst3q_s32(__transfersize(12) int32_t * ptr, int32x4x3_t val); +void vst3q_s32_ptr(__transfersize(12) int32_t * ptr, int32x4x3_t * val); +#define vst3q_s32(ptr, val) vst3q_u32((uint32_t*)(ptr), val) + +//void vst3q_f16(__transfersize(24) __fp16 * ptr, float16x8x3_t val);// VST3.16 {d0, d2, d4}, [r0] +void vst3q_f16_ptr(__transfersize(24) __fp16 * ptr, float16x8x3_t * val); +// IA32 SIMD doesn't work with 16bit floats currently + +//void vst3q_f32(__transfersize(12) float32_t * ptr, float32x4x3_t val)// VST3.32 {d0, d2, d4}, [r0] +_NEON2SSE_INLINE void vst3q_f32_ptr(__transfersize(12) float32_t * ptr, float32x4x3_t* val) +{ + float32x4x3_t v; + __m128 tmp0, tmp1,tmp2; + tmp0 = _mm_unpacklo_ps(val->val[0], val->val[1]); //a0,b0,a1,b1 + tmp1 = _mm_unpackhi_ps(val->val[0], val->val[1]); //a2,b2,a3,b3 + tmp2 = _mm_unpacklo_ps(val->val[1], val->val[2]); //b0,c0,b1,c1 + v.val[1] = _mm_shuffle_ps(tmp2,tmp1, _MM_SHUFFLE(1,0,3,2)); //b1,c1,a2,b2, + v.val[2] = _mm_movehl_ps(val->val[2],tmp1); //a3,b3, c2,c3 + v.val[2] = _mm_shuffle_ps(v.val[2],v.val[2], _MM_SHUFFLE(3,1,0,2)); //c2,a3,b3,c3 + tmp1 = _mm_unpacklo_ps(tmp2,val->val[0]); //b0,a0,c0,a1 + v.val[0] = _mm_shuffle_ps(tmp0,tmp1, _MM_SHUFFLE(3,2,1,0)); //a0,b0,c0,a1, + + vst1q_f32( ptr, v.val[0]); + vst1q_f32( (ptr + 4), v.val[1]); + vst1q_f32( (ptr + 8), v.val[2]); +} +#define vst3q_f32(ptr, val) vst3q_f32_ptr(ptr, &val) + +//void vst3q_p8(__transfersize(48) poly8_t * ptr, poly8x16x3_t val);// VST3.8 {d0, d2, d4}, [r0] +void vst3q_p8_ptr(__transfersize(48) poly8_t * ptr, poly8x16x3_t * val); +#define vst3q_p8 vst3q_u8 + +//void vst3q_p16(__transfersize(24) poly16_t * ptr, poly16x8x3_t val);// VST3.16 {d0, d2, d4}, [r0] +void vst3q_p16_ptr(__transfersize(24) poly16_t * ptr, poly16x8x3_t * val); +#define vst3q_p16 vst3q_u16 + +//void 
vst3_u8(__transfersize(24) uint8_t * ptr, uint8x8x3_t val)// VST3.8 {d0, d1, d2}, [r0]
+_NEON2SSE_INLINE void vst3_u8_ptr(__transfersize(24) uint8_t * ptr, uint8x8x3_t* val)
+{
+ __m128i tmp, sh0, sh1, val0, val2;
+ _NEON2SSE_ALIGN_16 int8_t mask0[16] = { 0, 8, 16, 1, 9, 17, 2, 10, 18, 3, 11, 19, 4, 12, 20, 5};
+ _NEON2SSE_ALIGN_16 int8_t mask1[16] = {13, 21, 6, 14, 22, 7, 15, 23, 0,0,0,0,0,0,0,0};
+ _NEON2SSE_ALIGN_16 int8_t mask0_sel[16] = {0, 0, 0xff, 0, 0, 0xff, 0, 0, 0xff, 0, 0, 0xff, 0, 0, 0xff, 0};
+ _NEON2SSE_ALIGN_16 int8_t mask1_sel[16] = {0, 0xff, 0, 0, 0xff, 0, 0, 0xff, 0,0,0,0,0,0,0,0};
+ tmp = _mm_unpacklo_epi64(_pM128i(val->val[0]), _pM128i(val->val[1]) );
+ sh0 = _mm_shuffle_epi8(tmp, *(__m128i*)mask0); //for bi>15 bi is wrapped (bi-=16)
+ val2 = _pM128i(val->val[2]);
+ sh1 = _mm_shuffle_epi8(val2, *(__m128i*)mask0);
+ val0 = _MM_BLENDV_EPI8(sh0, sh1, *(__m128i*)mask0_sel);
+ vst1q_u8(ptr, val0); //store as 128 bit structure
+ sh0 = _mm_shuffle_epi8(tmp, *(__m128i*)mask1); //for bi>15 bi is wrapped (bi-=16)
+ sh1 = _mm_shuffle_epi8(val2, *(__m128i*)mask1);
+ val2 = _MM_BLENDV_EPI8(sh0, sh1, *(__m128i*)mask1_sel);
+ _M64((*(__m64_128*)(ptr + 16)), val2); //need it to fit into *ptr memory
+}
+#define vst3_u8(ptr, val) vst3_u8_ptr(ptr, &val)
+
+//void vst3_u16(__transfersize(12) uint16_t * ptr, uint16x4x3_t val)// VST3.16 {d0, d1, d2}, [r0]
+_NEON2SSE_INLINE void vst3_u16_ptr(__transfersize(12) uint16_t * ptr, uint16x4x3_t* val)
+{
+ __m128i tmp, val0, val1, val2;
+ _NEON2SSE_ALIGN_16 int8_t mask0[16] = {0,1, 8,9, 16,17, 2,3, 10,11, 18,19, 4,5, 12,13};
+ _NEON2SSE_ALIGN_16 int8_t mask1[16] = {20,21, 6,7, 14,15, 22,23, 0,0,0,0,0,0,0,0};
+ _NEON2SSE_ALIGN_16 uint16_t mask0f[8] = {0xffff, 0xffff, 0, 0xffff, 0xffff, 0, 0xffff, 0xffff}; //if all ones we take the result from v.val[0] otherwise from v.val[1]
+ _NEON2SSE_ALIGN_16 uint16_t mask1f[8] = {0xffff, 0, 0, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff}; //if all ones we take the result from v.val[1] otherwise from v.val[0]
+ tmp = _mm_unpacklo_epi64(_pM128i(val->val[0]), _pM128i(val->val[1]));
+ val0 = _mm_shuffle_epi8(tmp, *(__m128i*)mask0);
+ val2 = _pM128i(val->val[2]);
+ val1 = _mm_shuffle_epi8(val2, *(__m128i*)mask0);
+ val0 = _MM_BLENDV_EPI8(val1, val0, *(__m128i*)mask0f);
+ vst1q_u16(ptr, val0); //store as 128 bit structure
+ val0 = _mm_shuffle_epi8(tmp, *(__m128i*)mask1);
+ val1 = _mm_shuffle_epi8(val2, *(__m128i*)mask1);
+ val1 = _MM_BLENDV_EPI8(val0, val1, *(__m128i*)mask1f); //change the operands order
+ _M64((*(__m64_128*)(ptr + 8)), val1); //need it to fit into *ptr memory
+}
+#define vst3_u16(ptr, val) vst3_u16_ptr(ptr, &val)
+
+//void vst3_u32(__transfersize(6) uint32_t * ptr, uint32x2x3_t val)// VST3.32 {d0, d1, d2}, [r0]
+_NEON2SSE_INLINE void vst3_u32_ptr(__transfersize(6) uint32_t * ptr, uint32x2x3_t* val)
+{
+ //val->val[0]:0,3,val->val[1]:1,4; val->val[2]:2,5,x,x;
+ __m128i val0, val1;
+ val0 = _mm_unpacklo_epi64(_pM128i(val->val[1]), _pM128i(val->val[2])); //val[0]: 1,4,2,5
+ val0 = _mm_shuffle_epi32(val0, 0 | (2 << 2) | (1 << 4) | (3 << 6)); //1,2,4,5
+ val1 = _mm_srli_si128(val0, 8); //4,5, x,x
+ _M64((*(__m64_128*)(ptr + 4)), val1);
+ val0 = _mm_unpacklo_epi32(_pM128i(val->val[0]), val0); //0,1,3,2
+ val0 = _mm_shuffle_epi32(val0, 0 | (1 << 2) | (3 << 4) | (2 << 6)); //0,1,2, 3
+ vst1q_u32(ptr, val0); //store as 128 bit structure
+}
+#define vst3_u32(ptr, val) vst3_u32_ptr(ptr, &val)
+
+//void vst3_u64(__transfersize(3) uint64_t * ptr, uint64x1x3_t val)// VST1.64 {d0, d1, d2}, [r0]
+_NEON2SSE_INLINE void
vst3_u64_ptr(__transfersize(3) uint64_t * ptr, uint64x1x3_t* val) +{ + *(ptr) = val->val[0].m64_u64[0]; + *(ptr + 1) = val->val[1].m64_u64[0]; + *(ptr + 2) = val->val[2].m64_u64[0]; +} +#define vst3_u64(ptr, val) vst3_u64_ptr(ptr, &val) + +//void vst3_s8(__transfersize(24) int8_t * ptr, int8x8x3_t val) // VST3.8 {d0, d1, d2}, [r0] +#define vst3_s8(ptr, val) vst3_u8_ptr((uint8_t*)ptr, &val) + +//void vst3_s16(__transfersize(12) int16_t * ptr, int16x4x3_t val) // VST3.16 {d0, d1, d2}, [r0] +#define vst3_s16(ptr, val) vst3_u16_ptr((uint16_t*)ptr, &val) + +//void vst3_s32(__transfersize(6) int32_t * ptr, int32x2x3_t val); // VST3.32 {d0, d1, d2}, [r0] +#define vst3_s32(ptr, val) vst3_u32_ptr((uint32_t*)ptr, &val) + +//void vst3_s64(__transfersize(3) int64_t * ptr, int64x1x3_t val) // VST1.64 {d0, d1, d2}, [r0] +#define vst3_s64(ptr, val) vst3_u64_ptr((uint64_t*)ptr, &val) + +//void vst3_f16(__transfersize(12) __fp16 * ptr, float16x4x3_t val);// VST3.16 {d0, d1, d2}, [r0] +void vst3_f16_ptr(__transfersize(12) __fp16 * ptr, float16x4x3_t * val); // VST3.16 {d0, d1, d2}, [r0] +// IA32 SIMD doesn't work with 16bit floats currently, so need to go to 32 bit and then work with two 128bit registers. See vld1q_f16 for example + +//void vst3_f32(__transfersize(6) float32_t * ptr, float32x2x3_t val)// VST3.32 {d0, d1, d2}, [r0] +_NEON2SSE_INLINE void vst3_f32_ptr(__transfersize(6) float32_t * ptr, float32x2x3_t* val) +{ + //val->val[0]:0,3,val->val[1]:1,4; val->val[2]:2,5,x,x; -> 0,2, 4,1, 3,5 + *(ptr) = val->val[0].m64_f32[0]; + *(ptr + 1) = val->val[1].m64_f32[0]; + *(ptr + 2) = val->val[2].m64_f32[0]; + *(ptr + 3) = val->val[0].m64_f32[1]; + *(ptr + 4) = val->val[1].m64_f32[1]; + *(ptr + 5) = val->val[2].m64_f32[1]; +} +#define vst3_f32(ptr, val) vst3_f32_ptr(ptr, &val) + +//void vst3_p8(__transfersize(24) poly8_t * ptr, poly8x8x3_t val);// VST3.8 {d0, d1, d2}, [r0] +void vst3_p8_ptr(__transfersize(24) poly8_t * ptr, poly8x8x3_t * val); +#define vst3_p8 vst3_u8 + +//void vst3_p16(__transfersize(12) poly16_t * ptr, poly16x4x3_t val);// VST3.16 {d0, d1, d2}, [r0] +void vst3_p16_ptr(__transfersize(12) poly16_t * ptr, poly16x4x3_t * val); +#define vst3_p16 vst3_s16 + +//*************** Quadruples store ******************************** +//********************************************************************* +//void vst4q_u8(__transfersize(64) uint8_t * ptr, uint8x16x4_t val)// VST4.8 {d0, d2, d4, d6}, [r0] +_NEON2SSE_INLINE void vst4q_u8_ptr(__transfersize(64) uint8_t * ptr, uint8x16x4_t* val) +{ + __m128i tmp1, tmp2, res; + tmp1 = _mm_unpacklo_epi8(val->val[0], val->val[1]); // 0,1, 4,5, 8,9, 12,13, 16,17, 20,21, 24,25, 28,29 + tmp2 = _mm_unpacklo_epi8(val->val[2], val->val[3]); // 2,3, 6,7, 10,11, 14,15, 18,19, 22,23, 26,27, 30,31 + res = _mm_unpacklo_epi16(tmp1, tmp2); //0,1, 2,3, 4,5, 6,7, 8,9, 10,11, 12,13, 14,15 + vst1q_u8(ptr, res); + res = _mm_unpackhi_epi16(tmp1, tmp2); //16,17, 18,19, 20,21, 22,23, 24,25, 26,27, 28,29, 30,31 + vst1q_u8((ptr + 16), res); + tmp1 = _mm_unpackhi_epi8(val->val[0], val->val[1]); // + tmp2 = _mm_unpackhi_epi8(val->val[2], val->val[3]); // + res = _mm_unpacklo_epi16(tmp1, tmp2); // + vst1q_u8((ptr + 32), res); + res = _mm_unpackhi_epi16(tmp1, tmp2); // + vst1q_u8((ptr + 48), res); +} +#define vst4q_u8(ptr, val) vst4q_u8_ptr(ptr, &val) + +//void vst4q_u16(__transfersize(32) uint16_t * ptr, uint16x8x4_t val)// VST4.16 {d0, d2, d4, d6}, [r0] +_NEON2SSE_INLINE void vst4q_u16_ptr(__transfersize(32) uint16_t * ptr, uint16x8x4_t* val) +{ + uint16x8x4_t v; + __m128i tmp1, tmp2; 
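+ //two-stage interleave: the epi16 unpacks pair elements of val[0]/val[1] and
+ //val[2]/val[3], then the epi32 unpacks merge those pairs into the
+ //a0,b0,c0,d0, a1,b1,c1,d1,... order that VST4.16 expects in memory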
+ tmp1 = _mm_unpacklo_epi16(val->val[0], val->val[1]); //0,1, 4,5, 8,9, 12,13 + tmp2 = _mm_unpacklo_epi16(val->val[2], val->val[3]); //2,3, 6,7 , 10,11, 14,15 + v.val[0] = _mm_unpacklo_epi32(tmp1, tmp2); + v.val[1] = _mm_unpackhi_epi32(tmp1, tmp2); + tmp1 = _mm_unpackhi_epi16(val->val[0], val->val[1]); //0,1, 4,5, 8,9, 12,13 + tmp2 = _mm_unpackhi_epi16(val->val[2], val->val[3]); //2,3, 6,7 , 10,11, 14,15 + v.val[2] = _mm_unpacklo_epi32(tmp1, tmp2); + v.val[3] = _mm_unpackhi_epi32(tmp1, tmp2); + vst1q_u16(ptr, v.val[0]); + vst1q_u16((ptr + 8), v.val[1]); + vst1q_u16((ptr + 16),v.val[2]); + vst1q_u16((ptr + 24), v.val[3]); +} +#define vst4q_u16(ptr, val) vst4q_u16_ptr(ptr, &val) + +//void vst4q_u32(__transfersize(16) uint32_t * ptr, uint32x4x4_t val)// VST4.32 {d0, d2, d4, d6}, [r0] +_NEON2SSE_INLINE void vst4q_u32_ptr(__transfersize(16) uint32_t * ptr, uint32x4x4_t* val) +{ + uint16x8x4_t v; + __m128i tmp1, tmp2; + tmp1 = _mm_unpacklo_epi32(val->val[0], val->val[1]); //0,1, 4,5, 8,9, 12,13 + tmp2 = _mm_unpacklo_epi32(val->val[2], val->val[3]); //2,3, 6,7 , 10,11, 14,15 + v.val[0] = _mm_unpacklo_epi64(tmp1, tmp2); + v.val[1] = _mm_unpackhi_epi64(tmp1, tmp2); + tmp1 = _mm_unpackhi_epi32(val->val[0], val->val[1]); //0,1, 4,5, 8,9, 12,13 + tmp2 = _mm_unpackhi_epi32(val->val[2], val->val[3]); //2,3, 6,7 , 10,11, 14,15 + v.val[2] = _mm_unpacklo_epi64(tmp1, tmp2); + v.val[3] = _mm_unpackhi_epi64(tmp1, tmp2); + vst1q_u32(ptr, v.val[0]); + vst1q_u32((ptr + 4), v.val[1]); + vst1q_u32((ptr + 8), v.val[2]); + vst1q_u32((ptr + 12), v.val[3]); +} +#define vst4q_u32(ptr, val) vst4q_u32_ptr(ptr, &val) + +//void vst4q_s8(__transfersize(64) int8_t * ptr, int8x16x4_t val); +void vst4q_s8_ptr(__transfersize(64) int8_t * ptr, int8x16x4_t * val); +#define vst4q_s8(ptr, val) vst4q_u8((uint8_t*)(ptr), val) + +//void vst4q_s16(__transfersize(32) int16_t * ptr, int16x8x4_t val); +void vst4q_s16_ptr(__transfersize(32) int16_t * ptr, int16x8x4_t * val); +#define vst4q_s16(ptr, val) vst4q_u16((uint16_t*)(ptr), val) + +//void vst4q_s32(__transfersize(16) int32_t * ptr, int32x4x4_t val); +void vst4q_s32_ptr(__transfersize(16) int32_t * ptr, int32x4x4_t * val); +#define vst4q_s32(ptr, val) vst4q_u32((uint32_t*)(ptr), val) + +//void vst4q_f16(__transfersize(32) __fp16 * ptr, float16x8x4_t val);// VST4.16 {d0, d2, d4, d6}, [r0] +void vst4q_f16_ptr(__transfersize(32) __fp16 * ptr, float16x8x4_t * val); +// IA32 SIMD doesn't work with 16bit floats currently + +//void vst4q_f32(__transfersize(16) float32_t * ptr, float32x4x4_t val)// VST4.32 {d0, d2, d4, d6}, [r0] +_NEON2SSE_INLINE void vst4q_f32_ptr(__transfersize(16) float32_t * ptr, float32x4x4_t* val) +{ + __m128 tmp3, tmp2, tmp1, tmp0; + float32x4x4_t v; + tmp0 = _mm_unpacklo_ps(val->val[0], val->val[1]); + tmp2 = _mm_unpacklo_ps(val->val[2], val->val[3]); + tmp1 = _mm_unpackhi_ps(val->val[0], val->val[1]); + tmp3 = _mm_unpackhi_ps(val->val[2], val->val[3]); + v.val[0] = _mm_movelh_ps(tmp0, tmp2); + v.val[1] = _mm_movehl_ps(tmp2, tmp0); + v.val[2] = _mm_movelh_ps(tmp1, tmp3); + v.val[3] = _mm_movehl_ps(tmp3, tmp1); + vst1q_f32(ptr, v.val[0]); + vst1q_f32((ptr + 4), v.val[1]); + vst1q_f32((ptr + 8), v.val[2]); + vst1q_f32((ptr + 12), v.val[3]); +} +#define vst4q_f32(ptr, val) vst4q_f32_ptr(ptr, &val) + +//void vst4q_p8(__transfersize(64) poly8_t * ptr, poly8x16x4_t val);// VST4.8 {d0, d2, d4, d6}, [r0] +void vst4q_p8_ptr(__transfersize(64) poly8_t * ptr, poly8x16x4_t * val); +#define vst4q_p8 vst4q_u8 + +//void vst4q_p16(__transfersize(32) poly16_t * ptr, poly16x8x4_t 
val);// VST4.16 {d0, d2, d4, d6}, [r0] +void vst4q_p16_ptr(__transfersize(32) poly16_t * ptr, poly16x8x4_t * val); +#define vst4q_p16 vst4q_s16 + +//void vst4_u8(__transfersize(32) uint8_t * ptr, uint8x8x4_t val)// VST4.8 {d0, d1, d2, d3}, [r0] +_NEON2SSE_INLINE void vst4_u8_ptr(__transfersize(32) uint8_t * ptr, uint8x8x4_t* val) +{ + __m128i sh0, sh1, val0, val2; + sh0 = _mm_unpacklo_epi8(_pM128i(val->val[0]),_pM128i(val->val[1])); // a0,b0,a1,b1,a2,b2,a3,b3,a4,b4,a5,b5, a6,b6,a7,b7, + sh1 = _mm_unpacklo_epi8(_pM128i(val->val[2]),_pM128i(val->val[3])); // c0,d0,c1,d1,c2,d2,c3,d3, c4,d4,c5,d5,c6,d6,c7,d7 + val0 = _mm_unpacklo_epi16(sh0,sh1); // a0,b0,c0,d0,a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3, + val2 = _mm_unpackhi_epi16(sh0,sh1); //a4,b4,c4,d4,a5,b5,c5,d5, a6,b6,c6,d6,a7,b7,c7,d7 + vst1q_u8(ptr, val0); + vst1q_u8((ptr + 16), val2); +} +#define vst4_u8(ptr, val) vst4_u8_ptr(ptr, &val) + +//void vst4_u16(__transfersize(16) uint16_t * ptr, uint16x4x4_t val)// VST4.16 {d0, d1, d2, d3}, [r0] +_NEON2SSE_INLINE void vst4_u16_ptr(__transfersize(16) uint16_t * ptr, uint16x4x4_t* val) +{ + __m128i sh0, sh1, val0, val2; + sh0 = _mm_unpacklo_epi16(_pM128i(val->val[0]),_pM128i(val->val[1])); //a0,a1,b0,b1,c0,c1,d0,d1, + sh1 = _mm_unpacklo_epi16(_pM128i(val->val[2]),_pM128i(val->val[3])); //a2,a3,b2,b3,c2,c3,d2,d3 + val0 = _mm_unpacklo_epi32(sh0,sh1); // a0,a1,a2,a3,b0,b1,b2,b3 + val2 = _mm_unpackhi_epi32(sh0,sh1); // c0,c1,c2,c3,d0,d1,d2,d3 + vst1q_u16(ptr, val0); //store as 128 bit structure + vst1q_u16((ptr + 8), val2); +} +#define vst4_u16(ptr, val) vst4_u16_ptr(ptr, &val) + +//void vst4_u32(__transfersize(8) uint32_t * ptr, uint32x2x4_t val)// VST4.32 {d0, d1, d2, d3}, [r0] +_NEON2SSE_INLINE void vst4_u32_ptr(__transfersize(8) uint32_t * ptr, uint32x2x4_t* val) +{ + //0,4, 1,5, 2,6, 3,7 + __m128i sh0, sh1, val0, val1; + sh0 = _mm_unpacklo_epi32(_pM128i(val->val[0]), _pM128i(val->val[1])); //0,1,4,5 + sh1 = _mm_unpacklo_epi32(_pM128i(val->val[2]), _pM128i(val->val[3])); //2,3,6,7 + val0 = _mm_unpacklo_epi64(sh0,sh1); // + val1 = _mm_unpackhi_epi64(sh0,sh1); // + vst1q_u32(ptr, val0); //store as 128 bit structure + vst1q_u32((ptr + 4), val1); +} +#define vst4_u32(ptr, val) vst4_u32_ptr(ptr, &val) + +//void vst4_u64(__transfersize(4) uint64_t * ptr, uint64x1x4_t val)// VST1.64 {d0, d1, d2, d3}, [r0] +_NEON2SSE_INLINE void vst4_u64_ptr(__transfersize(4) uint64_t * ptr, uint64x1x4_t* val) +{ + *(ptr) = val->val[0].m64_u64[0]; + *(ptr + 1) = val->val[1].m64_u64[0]; + *(ptr + 2) = val->val[2].m64_u64[0]; + *(ptr + 3) = val->val[3].m64_u64[0]; +} +#define vst4_u64(ptr, val) vst4_u64_ptr(ptr, &val) + +//void vst4_s8(__transfersize(32) int8_t * ptr, int8x8x4_t val) //VST4.8 {d0, d1, d2, d3}, [r0] +#define vst4_s8(ptr, val) vst4_u8((uint8_t*)ptr, val) + +//void vst4_s16(__transfersize(16) int16_t * ptr, int16x4x4_t val) // VST4.16 {d0, d1, d2, d3}, [r0] +#define vst4_s16(ptr, val) vst4_u16((uint16_t*)ptr, val) + +//void vst4_s32(__transfersize(8) int32_t * ptr, int32x2x4_t val) // VST4.32 {d0, d1, d2, d3}, [r0] +#define vst4_s32(ptr, val) vst4_u32((uint32_t*)ptr, val) + +//void vst4_s64(__transfersize(4) int64_t * ptr, int64x1x4_t val); // VST1.64 {d0, d1, d2, d3}, [r0] +void vst4_s64_ptr(__transfersize(4) int64_t * ptr, int64x1x4_t * val); +#define vst4_s64(ptr, val) vst4_u64((uint64_t*)ptr, val) + +//void vst4_f16(__transfersize(16) __fp16 * ptr, float16x4x4_t val);// VST4.16 {d0, d1, d2, d3}, [r0] +void vst4_f16_ptr(__transfersize(16) __fp16 * ptr, float16x4x4_t * val); +// IA32 SIMD doesn't work with 
16bit floats currently, so need to go to 32 bit and then work with two 128bit registers. See vld1q_f16 for example + +//void vst4_f32(__transfersize(8) float32_t * ptr, float32x2x4_t val)// VST4.32 {d0, d1, d2, d3}, [r0] +_NEON2SSE_INLINE void vst4_f32_ptr(__transfersize(8) float32_t * ptr, float32x2x4_t* val) +{ + //0,4, 1,5, 2,6, 3,7 -> 0,1, 2,3, 4,5, 6,7 + *(ptr) = val->val[0].m64_f32[0]; + *(ptr + 1) = val->val[1].m64_f32[0]; + *(ptr + 2) = val->val[2].m64_f32[0]; + *(ptr + 3) = val->val[3].m64_f32[0]; + *(ptr + 4) = val->val[0].m64_f32[1]; + *(ptr + 5) = val->val[1].m64_f32[1]; + *(ptr + 6) = val->val[2].m64_f32[1]; + *(ptr + 7) = val->val[3].m64_f32[1]; +} +#define vst4_f32(ptr, val) vst4_f32_ptr(ptr, &val) + +//void vst4_p8(__transfersize(32) poly8_t * ptr, poly8x8x4_t val);// VST4.8 {d0, d1, d2, d3}, [r0] +void vst4_p8_ptr(__transfersize(32) poly8_t * ptr, poly8x8x4_t * val); +#define vst4_p8 vst4_u8 + +//void vst4_p16(__transfersize(16) poly16_t * ptr, poly16x4x4_t val);// VST4.16 {d0, d1, d2, d3}, [r0] +void vst4_p16_ptr(__transfersize(16) poly16_t * ptr, poly16x4x4_t * val); +#define vst4_p16 vst4_u16 + +//*********** Store a lane of a vector into memory (extract given lane) for a couple of vectors ********************* +//******************************************************************************************************************** +//void vst2q_lane_u16(__transfersize(2) uint16_t * ptr, uint16x8x2_t val, __constrange(0,7) int lane)// VST2.16 {d0[0], d2[0]}, [r0] +_NEON2SSE_INLINE void vst2q_lane_u16_ptr(__transfersize(2) uint16_t * ptr, uint16x8x2_t* val, __constrange(0,7) int lane) +{ + vst1q_lane_s16(ptr, val->val[0], lane); + vst1q_lane_s16((ptr + 1), val->val[1], lane); +} +#define vst2q_lane_u16(ptr, val, lane) vst2q_lane_u16_ptr(ptr, &val, lane) + +//void vst2q_lane_u32(__transfersize(2) uint32_t * ptr, uint32x4x2_t val, __constrange(0,3) int lane)// VST2.32 {d0[0], d2[0]}, [r0] +_NEON2SSE_INLINE void vst2q_lane_u32_ptr(__transfersize(2) uint32_t* ptr, uint32x4x2_t* val, __constrange(0,3) int lane) +{ + vst1q_lane_u32(ptr, val->val[0], lane); + vst1q_lane_u32((ptr + 1), val->val[1], lane); +} +#define vst2q_lane_u32(ptr, val, lane) vst2q_lane_u32_ptr(ptr, &val, lane) + +//void vst2q_lane_s16(__transfersize(2) int16_t * ptr, int16x8x2_t val, __constrange(0,7) int lane);// VST2.16 {d0[0], d2[0]}, [r0] +void vst2q_lane_s16_ptr(__transfersize(2) int16_t * ptr, int16x8x2_t * val, __constrange(0,7) int lane); +#define vst2q_lane_s16(ptr, val, lane) vst2q_lane_u16((uint16_t*)ptr, val, lane) + +//void vst2q_lane_s32(__transfersize(2) int32_t * ptr, int32x4x2_t val, __constrange(0,3) int lane);// VST2.32 {d0[0], d2[0]}, [r0] +void vst2q_lane_s32_ptr(__transfersize(2) int32_t * ptr, int32x4x2_t * val, __constrange(0,3) int lane); +#define vst2q_lane_s32(ptr, val, lane) vst2q_lane_u32((uint32_t*)ptr, val, lane) + +//void vst2q_lane_f16(__transfersize(2) __fp16 * ptr, float16x8x2_t val, __constrange(0,7) int lane);// VST2.16 {d0[0], d2[0]}, [r0] +void vst2q_lane_f16_ptr(__transfersize(2) __fp16 * ptr, float16x8x2_t * val, __constrange(0,7) int lane); +//current IA SIMD doesn't support float16 + +//void vst2q_lane_f32(__transfersize(2) float32_t * ptr, float32x4x2_t val, __constrange(0,3) int lane)// VST2.32 {d0[0], d2[0]}, [r0] +_NEON2SSE_INLINE void vst2q_lane_f32_ptr(__transfersize(2) float32_t* ptr, float32x4x2_t* val, __constrange(0,3) int lane) +{ + vst1q_lane_f32(ptr, val->val[0], lane); + vst1q_lane_f32((ptr + 1), val->val[1], lane); +} +#define 
vst2q_lane_f32(ptr,src,lane) vst2q_lane_f32_ptr(ptr,&src,lane) + +//void vst2q_lane_p16(__transfersize(2) poly16_t * ptr, poly16x8x2_t val, __constrange(0,7) int lane);// VST2.16 {d0[0], d2[0]}, [r0] +void vst2q_lane_p16_ptr(__transfersize(2) poly16_t * ptr, poly16x8x2_t * val, __constrange(0,7) int lane); +#define vst2q_lane_p16 vst2q_lane_s16 + +//void vst2_lane_u8(__transfersize(2) uint8_t * ptr, uint8x8x2_t val, __constrange(0,7) int lane);// VST2.8 {d0[0], d1[0]}, [r0] +void vst2_lane_u8_ptr(__transfersize(2) uint8_t * ptr, uint8x8x2_t * val, __constrange(0,7) int lane); // VST2.8 {d0[0], d1[0]}, [r0] +_NEON2SSE_INLINE void vst2_lane_u8_ptr(__transfersize(2) uint8_t * ptr, uint8x8x2_t* val, __constrange(0,7) int lane) // VST2.8 {d0[0], d1[0]}, [r0] +{ + *(ptr) = val->val[0].m64_u8[lane]; + *(ptr + 1) = val->val[1].m64_u8[lane]; +} +#define vst2_lane_u8(ptr, val, lane) vst2_lane_u8_ptr(ptr, &val, lane) + +//void vst2_lane_u16(__transfersize(2) uint16_t * ptr, uint16x4x2_t val, __constrange(0,3) int lane);// VST2.16 {d0[0], d1[0]}, [r0] +void vst2_lane_u16_ptr(__transfersize(2) uint16_t * ptr, uint16x4x2_t * val, __constrange(0,3) int lane); // VST2.16 {d0[0], d1[0]}, [r0] +_NEON2SSE_INLINE void vst2_lane_u16_ptr(__transfersize(2) uint16_t * ptr, uint16x4x2_t * val, __constrange(0,3) int lane) +{ + *(ptr) = val->val[0].m64_u16[lane]; + *(ptr + 1) = val->val[1].m64_u16[lane]; +} +#define vst2_lane_u16(ptr, val, lane) vst2_lane_u16_ptr(ptr, &val, lane) + +//void vst2_lane_u32(__transfersize(2) uint32_t * ptr, uint32x2x2_t val, __constrange(0,1) int lane);// VST2.32 {d0[0], d1[0]}, [r0] +void vst2_lane_u32_ptr(__transfersize(2) uint32_t * ptr, uint32x2x2_t * val, __constrange(0,1) int lane); // VST2.32 {d0[0], d1[0]}, [r0] +_NEON2SSE_INLINE void vst2_lane_u32_ptr(__transfersize(2) uint32_t * ptr, uint32x2x2_t * val, __constrange(0,1) int lane) +{ + *(ptr) = val->val[0].m64_u32[lane]; + *(ptr + 1) = val->val[1].m64_u32[lane]; +} +#define vst2_lane_u32(ptr, val, lane) vst2_lane_u32_ptr(ptr, &val, lane) + +//void vst2_lane_s8(__transfersize(2) int8_t * ptr, int8x8x2_t val, __constrange(0,7) int lane);// VST2.8 {d0[0], d1[0]}, [r0] +void vst2_lane_s8_ptr(__transfersize(2) int8_t * ptr, int8x8x2_t * val, __constrange(0,7) int lane); +#define vst2_lane_s8(ptr, val, lane) vst2_lane_u8((uint8_t*)ptr, val, lane) + +//void vst2_lane_s16(__transfersize(2) int16_t * ptr, int16x4x2_t val, __constrange(0,3) int lane);// VST2.16 {d0[0], d1[0]}, [r0] +void vst2_lane_s16_ptr(__transfersize(2) int16_t * ptr, int16x4x2_t * val, __constrange(0,3) int lane); +#define vst2_lane_s16(ptr, val, lane) vst2_lane_u16((uint16_t*)ptr, val, lane) + +//void vst2_lane_s32(__transfersize(2) int32_t * ptr, int32x2x2_t val, __constrange(0,1) int lane);// VST2.32 {d0[0], d1[0]}, [r0] +void vst2_lane_s32_ptr(__transfersize(2) int32_t * ptr, int32x2x2_t * val, __constrange(0,1) int lane); +#define vst2_lane_s32(ptr, val, lane) vst2_lane_u32((uint32_t*)ptr, val, lane) + +//void vst2_lane_f16(__transfersize(2) __fp16 * ptr, float16x4x2_t val, __constrange(0,3) int lane); // VST2.16 {d0[0], d1[0]}, [r0] +//current IA SIMD doesn't support float16 + +void vst2_lane_f32_ptr(__transfersize(2) float32_t * ptr, float32x2x2_t * val, __constrange(0,1) int lane); // VST2.32 {d0[0], d1[0]}, [r0] +_NEON2SSE_INLINE void vst2_lane_f32_ptr(__transfersize(2) float32_t * ptr, float32x2x2_t * val, __constrange(0,1) int lane) +{ + *(ptr) = val->val[0].m64_f32[lane]; + *(ptr + 1) = val->val[1].m64_f32[lane]; +} +#define vst2_lane_f32(ptr,src,lane) 
vst2_lane_f32_ptr(ptr,&src,lane) + +//void vst2_lane_p8(__transfersize(2) poly8_t * ptr, poly8x8x2_t val, __constrange(0,7) int lane);// VST2.8 {d0[0], d1[0]}, [r0] +#define vst2_lane_p8 vst2_lane_u8 + +//void vst2_lane_p16(__transfersize(2) poly16_t * ptr, poly16x4x2_t val, __constrange(0,3) int lane);// VST2.16 {d0[0], d1[0]}, [r0] +#define vst2_lane_p16 vst2_lane_u16 + +//************************* Triple lanes stores ******************************************************* +//******************************************************************************************************* +//void vst3q_lane_u16(__transfersize(3) uint16_t * ptr, uint16x8x3_t val, __constrange(0,7) int lane)// VST3.16 {d0[0], d2[0], d4[0]}, [r0] +_NEON2SSE_INLINE void vst3q_lane_u16_ptr(__transfersize(3) uint16_t * ptr, uint16x8x3_t* val, __constrange(0,7) int lane) +{ + vst2q_lane_u16_ptr(ptr, (uint16x8x2_t*)val, lane); + vst1q_lane_u16((ptr + 2), val->val[2], lane); +} +#define vst3q_lane_u16(ptr, val, lane) vst3q_lane_u16_ptr(ptr, &val, lane) + +//void vst3q_lane_u32(__transfersize(3) uint32_t * ptr, uint32x4x3_t val, __constrange(0,3) int lane)// VST3.32 {d0[0], d2[0], d4[0]}, [r0] +_NEON2SSE_INLINE void vst3q_lane_u32_ptr(__transfersize(3) uint32_t * ptr, uint32x4x3_t* val, __constrange(0,3) int lane) +{ + vst2q_lane_u32_ptr(ptr, (uint32x4x2_t*)val, lane); + vst1q_lane_u32((ptr + 2), val->val[2], lane); +} +#define vst3q_lane_u32(ptr, val, lane) vst3q_lane_u32_ptr(ptr, &val, lane) + +//void vst3q_lane_s16(__transfersize(3) int16_t * ptr, int16x8x3_t val, __constrange(0,7) int lane);// VST3.16 {d0[0], d2[0], d4[0]}, [r0] +void vst3q_lane_s16_ptr(__transfersize(3) int16_t * ptr, int16x8x3_t * val, __constrange(0,7) int lane); +#define vst3q_lane_s16(ptr, val, lane) vst3q_lane_u16((uint16_t *)ptr, val, lane) + +//void vst3q_lane_s32(__transfersize(3) int32_t * ptr, int32x4x3_t val, __constrange(0,3) int lane);// VST3.32 {d0[0], d2[0], d4[0]}, [r0] +void vst3q_lane_s32_ptr(__transfersize(3) int32_t * ptr, int32x4x3_t * val, __constrange(0,3) int lane); +#define vst3q_lane_s32(ptr, val, lane) vst3q_lane_u32((uint32_t *)ptr, val, lane) + +//void vst3q_lane_f16(__transfersize(3) __fp16 * ptr, float16x8x3_t val, __constrange(0,7) int lane);// VST3.16 {d0[0], d2[0], d4[0]}, [r0] +void vst3q_lane_f16_ptr(__transfersize(3) __fp16 * ptr, float16x8x3_t * val, __constrange(0,7) int lane); +//current IA SIMD doesn't support float16 + +//void vst3q_lane_f32(__transfersize(3) float32_t * ptr, float32x4x3_t val, __constrange(0,3) int lane)// VST3.32 {d0[0], d2[0], d4[0]}, [r0] +_NEON2SSE_INLINE void vst3q_lane_f32_ptr(__transfersize(3) float32_t * ptr, float32x4x3_t* val, __constrange(0,3) int lane) +{ + vst1q_lane_f32(ptr, val->val[0], lane); + vst1q_lane_f32((ptr + 1), val->val[1], lane); + vst1q_lane_f32((ptr + 2), val->val[2], lane); +} +#define vst3q_lane_f32(ptr,val,lane) vst3q_lane_f32_ptr(ptr,&val,lane) + +//void vst3q_lane_p16(__transfersize(3) poly16_t * ptr, poly16x8x3_t val, __constrange(0,7) int lane);// VST3.16 {d0[0], d2[0], d4[0]}, [r0] +void vst3q_lane_p16_ptr(__transfersize(3) poly16_t * ptr, poly16x8x3_t * val, __constrange(0,7) int lane); +#define vst3q_lane_p16 vst3q_lane_s16 + +//void vst3_lane_u8(__transfersize(3) uint8_t * ptr, uint8x8x3_t val, __constrange(0,7) int lane)// VST3.8 {d0[0], d1[0], d2[0]}, [r0] +_NEON2SSE_INLINE void vst3_lane_u8_ptr(__transfersize(3) uint8_t * ptr, uint8x8x3_t* val, __constrange(0,7) int lane) +{ + *(ptr) = val->val[0].m64_u8[lane]; + *(ptr + 1) = val->val[1].m64_u8[lane]; + 
*(ptr + 2) = val->val[2].m64_u8[lane]; +} +#define vst3_lane_u8(ptr, val, lane) vst3_lane_u8_ptr(ptr, &val, lane) + +//void vst3_lane_u16(__transfersize(3) uint16_t * ptr, uint16x4x3_t val, __constrange(0,3) int lane)// VST3.16 {d0[0], d1[0], d2[0]}, [r0] +_NEON2SSE_INLINE void vst3_lane_u16_ptr(__transfersize(3) uint16_t * ptr, uint16x4x3_t* val, __constrange(0,3) int lane) +{ + *(ptr) = val->val[0].m64_u16[lane]; + *(ptr + 1) = val->val[1].m64_u16[lane]; + *(ptr + 2) = val->val[2].m64_u16[lane]; +} +#define vst3_lane_u16(ptr, val, lane) vst3_lane_u16_ptr(ptr, &val, lane) + +//void vst3_lane_u32(__transfersize(3) uint32_t * ptr, uint32x2x3_t val, __constrange(0,1) int lane)// VST3.32 {d0[0], d1[0], d2[0]}, [r0] +_NEON2SSE_INLINE void vst3_lane_u32_ptr(__transfersize(3) uint32_t * ptr, uint32x2x3_t* val, __constrange(0,1) int lane) +{ + *(ptr) = val->val[0].m64_u32[lane]; + *(ptr + 1) = val->val[1].m64_u32[lane]; + *(ptr + 2) = val->val[2].m64_u32[lane]; +} +#define vst3_lane_u32(ptr, val, lane) vst3_lane_u32_ptr(ptr, &val, lane) + +//void vst3_lane_s8(__transfersize(3) int8_t * ptr, int8x8x3_t val, __constrange(0,7) int lane);// VST3.8 {d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_s8_ptr(__transfersize(3) int8_t * ptr, int8x8x3_t * val, __constrange(0,7) int lane); +#define vst3_lane_s8(ptr, val, lane) vst3_lane_u8((uint8_t *)ptr, val, lane) + +//void vst3_lane_s16(__transfersize(3) int16_t * ptr, int16x4x3_t val, __constrange(0,3) int lane);// VST3.16 {d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_s16_ptr(__transfersize(3) int16_t * ptr, int16x4x3_t * val, __constrange(0,3) int lane); +#define vst3_lane_s16(ptr, val, lane) vst3_lane_u16((uint16_t *)ptr, val, lane) + +//void vst3_lane_s32(__transfersize(3) int32_t * ptr, int32x2x3_t val, __constrange(0,1) int lane);// VST3.32 {d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_s32_ptr(__transfersize(3) int32_t * ptr, int32x2x3_t * val, __constrange(0,1) int lane); +#define vst3_lane_s32(ptr, val, lane) vst3_lane_u32((uint32_t *)ptr, val, lane) + +//void vst3_lane_f16(__transfersize(3) __fp16 * ptr, float16x4x3_t val, __constrange(0,3) int lane);// VST3.16 {d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_f16_ptr(__transfersize(3) __fp16 * ptr, float16x4x3_t * val, __constrange(0,3) int lane); +//current IA SIMD doesn't support float16 + +//void vst3_lane_f32(__transfersize(3) float32_t * ptr, float32x2x3_t val, __constrange(0,1) int lane)// VST3.32 {d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_f32_ptr(__transfersize(3) float32_t * ptr, float32x2x3_t * val, __constrange(0,1) int lane); +_NEON2SSE_INLINE void vst3_lane_f32_ptr(__transfersize(3) float32_t * ptr, float32x2x3_t * val, __constrange(0,1) int lane) +{ + *(ptr) = val->val[0].m64_f32[lane]; + *(ptr + 1) = val->val[1].m64_f32[lane]; + *(ptr + 2) = val->val[2].m64_f32[lane]; +} +#define vst3_lane_f32(ptr,val,lane) vst3_lane_f32_ptr(ptr,&val,lane) + +//void vst3_lane_p8(__transfersize(3) poly8_t * ptr, poly8x8x3_t val, __constrange(0,7) int lane);// VST3.8 {d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_p8_ptr(__transfersize(3) poly8_t * ptr, poly8x8x3_t * val, __constrange(0,7) int lane); +#define vst3_lane_p8 vst3_lane_u8 + +//void vst3_lane_p16(__transfersize(3) poly16_t * ptr, poly16x4x3_t val, __constrange(0,3) int lane);// VST3.16 {d0[0], d1[0], d2[0]}, [r0] +void vst3_lane_p16_ptr(__transfersize(3) poly16_t * ptr, poly16x4x3_t * val, __constrange(0,3) int lane); +#define vst3_lane_p16 vst3_lane_s16 + +//******************************** Quadruple lanes stores 
*********************************************** +//******************************************************************************************************* +//void vst4q_lane_u16(__transfersize(4) uint16_t * ptr, uint16x8x4_t val, __constrange(0,7) int lane)// VST4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +_NEON2SSE_INLINE void vst4q_lane_u16_ptr(__transfersize(4) uint16_t * ptr, uint16x8x4_t* val4, __constrange(0,7) int lane) +{ + vst2q_lane_u16_ptr(ptr, (uint16x8x2_t*)val4->val, lane); + vst2q_lane_u16_ptr((ptr + 2),((uint16x8x2_t*)val4->val + 1), lane); +} +#define vst4q_lane_u16(ptr, val, lane) vst4q_lane_u16_ptr(ptr, &val, lane) + +//void vst4q_lane_u32(__transfersize(4) uint32_t * ptr, uint32x4x4_t val, __constrange(0,3) int lane)// VST4.32 {d0[0], d2[0], d4[0], d6[0]}, [r0] +_NEON2SSE_INLINE void vst4q_lane_u32_ptr(__transfersize(4) uint32_t * ptr, uint32x4x4_t* val4, __constrange(0,3) int lane) +{ + vst2q_lane_u32_ptr(ptr, (uint32x4x2_t*)val4->val, lane); + vst2q_lane_u32_ptr((ptr + 2), ((uint32x4x2_t*)val4->val + 1), lane); +} +#define vst4q_lane_u32(ptr, val, lane) vst4q_lane_u32_ptr(ptr, &val, lane) + +//void vst4q_lane_s16(__transfersize(4) int16_t * ptr, int16x8x4_t val, __constrange(0,7) int lane);// VST4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +void vst4q_lane_s16_ptr(__transfersize(4) int16_t * ptr, int16x8x4_t * val, __constrange(0,7) int lane); +#define vst4q_lane_s16(ptr,val,lane) vst4q_lane_u16((uint16_t *)ptr,val,lane) + +//void vst4q_lane_s32(__transfersize(4) int32_t * ptr, int32x4x4_t val, __constrange(0,3) int lane);// VST4.32 {d0[0], d2[0], d4[0], d6[0]}, [r0] +void vst4q_lane_s32_ptr(__transfersize(4) int32_t * ptr, int32x4x4_t * val, __constrange(0,3) int lane); +#define vst4q_lane_s32(ptr,val,lane) vst4q_lane_u32((uint32_t *)ptr,val,lane) + +//void vst4q_lane_f16(__transfersize(4) __fp16 * ptr, float16x8x4_t val, __constrange(0,7) int lane);// VST4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +void vst4q_lane_f16_ptr(__transfersize(4) __fp16 * ptr, float16x8x4_t * val, __constrange(0,7) int lane); +//current IA SIMD doesn't support float16 + +//void vst4q_lane_f32(__transfersize(4) float32_t * ptr, float32x4x4_t val, __constrange(0,3) int lane)// VST4.32 {d0[0], d2[0], d4[0], d6[0]}, [r0] +_NEON2SSE_INLINE void vst4q_lane_f32_ptr(__transfersize(4) float32_t * ptr, float32x4x4_t* val, __constrange(0,3) int lane) +{ + vst1q_lane_f32(ptr, val->val[0], lane); + vst1q_lane_f32((ptr + 1), val->val[1], lane); + vst1q_lane_f32((ptr + 2), val->val[2], lane); + vst1q_lane_f32((ptr + 3), val->val[3], lane); +} +#define vst4q_lane_f32(ptr,val,lane) vst4q_lane_f32_ptr(ptr,&val,lane) + +//void vst4q_lane_p16(__transfersize(4) poly16_t * ptr, poly16x8x4_t val, __constrange(0,7) int lane);// VST4.16 {d0[0], d2[0], d4[0], d6[0]}, [r0] +void vst4q_lane_p16_ptr(__transfersize(4) poly16_t * ptr, poly16x8x4_t * val, __constrange(0,7) int lane); +#define vst4q_lane_p16 vst4q_lane_u16 + +//void vst4_lane_u8(__transfersize(4) uint8_t * ptr, uint8x8x4_t val, __constrange(0,7) int lane)// VST4.8 {d0[0], d1[0], d2[0], d3[0]}, [r0] +_NEON2SSE_INLINE void vst4_lane_u8_ptr(__transfersize(4) uint8_t * ptr, uint8x8x4_t* val, __constrange(0,7) int lane) +{ + *(ptr) = val->val[0].m64_u8[lane]; + *(ptr + 1) = val->val[1].m64_u8[lane]; + *(ptr + 2) = val->val[2].m64_u8[lane]; + *(ptr + 3) = val->val[3].m64_u8[lane]; +} +#define vst4_lane_u8(ptr, val, lane) vst4_lane_u8_ptr(ptr, &val, lane) + +//void vst4_lane_u16(__transfersize(4) uint16_t * ptr, uint16x4x4_t val, __constrange(0,3) int lane)// VST4.16 {d0[0], 
d1[0], d2[0], d3[0]}, [r0] +_NEON2SSE_INLINE void vst4_lane_u16_ptr(__transfersize(4) uint16_t * ptr, uint16x4x4_t* val, __constrange(0,3) int lane) +{ + *(ptr) = val->val[0].m64_u16[lane]; + *(ptr + 1) = val->val[1].m64_u16[lane]; + *(ptr + 2) = val->val[2].m64_u16[lane]; + *(ptr + 3) = val->val[3].m64_u16[lane]; +} +#define vst4_lane_u16(ptr, val, lane) vst4_lane_u16_ptr(ptr, &val, lane) + +//void vst4_lane_u32(__transfersize(4) uint32_t * ptr, uint32x2x4_t val, __constrange(0,1) int lane)// VST4.32 {d0[0], d1[0], d2[0], d3[0]}, [r0] +_NEON2SSE_INLINE void vst4_lane_u32_ptr(__transfersize(4) uint32_t * ptr, uint32x2x4_t* val, __constrange(0,1) int lane) +{ + *(ptr) = val->val[0].m64_u32[lane]; + *(ptr + 1) = val->val[1].m64_u32[lane]; + *(ptr + 2) = val->val[2].m64_u32[lane]; + *(ptr + 3) = val->val[3].m64_u32[lane]; +} +#define vst4_lane_u32(ptr, val, lane) vst4_lane_u32_ptr(ptr, &val, lane) + +//void vst4_lane_s8(__transfersize(4) int8_t * ptr, int8x8x4_t val, __constrange(0,7) int lane)// VST4.8 {d0[0], d1[0], d2[0], d3[0]}, [r0] +#define vst4_lane_s8(ptr, val, lane) vst4_lane_u8((uint8_t*)ptr, val, lane) + +//void vst4_lane_s16(__transfersize(4) int16_t * ptr, int16x4x4_t val, __constrange(0,3) int lane)// VST4.16 {d0[0], d1[0], d2[0], d3[0]}, [r0] +#define vst4_lane_s16(ptr, val, lane) vst4_lane_u16((uint16_t*)ptr, val, lane) + +//void vst4_lane_s32(__transfersize(4) int32_t * ptr, int32x2x4_t val, __constrange(0,1) int lane)// VST4.32 {d0[0], d1[0], d2[0], d3[0]}, [r0] +#define vst4_lane_s32(ptr, val, lane) vst4_lane_u32((uint32_t*)ptr, val, lane) + +//void vst4_lane_f16(__transfersize(4) __fp16 * ptr, float16x4x4_t val, __constrange(0,3) int lane);// VST4.16 {d0[0], d1[0], d2[0], d3[0]}, [r0] +void vst4_lane_f16_ptr(__transfersize(4) __fp16 * ptr, float16x4x4_t * val, __constrange(0,3) int lane); +//current IA SIMD doesn't support float16 + +void vst4_lane_f32_ptr(__transfersize(4) float32_t * ptr, float32x2x4_t * val, __constrange(0,1) int lane); // VST4.32 {d0[0], d1[0], d2[0], d3[0]}, [r0] +_NEON2SSE_INLINE void vst4_lane_f32_ptr(__transfersize(4) float32_t * ptr, float32x2x4_t* val, __constrange(0,1) int lane) +{ + *(ptr) = val->val[0].m64_f32[lane]; + *(ptr + 1) = val->val[1].m64_f32[lane]; + *(ptr + 2) = val->val[2].m64_f32[lane]; + *(ptr + 3) = val->val[3].m64_f32[lane]; +} +#define vst4_lane_f32(ptr,val,lane) vst4_lane_f32_ptr(ptr,&val,lane) + +//void vst4_lane_p8(__transfersize(4) poly8_t * ptr, poly8x8x4_t val, __constrange(0,7) int lane);// VST4.8 {d0[0], d1[0], d2[0], d3[0]}, [r0] +void vst4_lane_p8_ptr(__transfersize(4) poly8_t * ptr, poly8x8x4_t * val, __constrange(0,7) int lane); +#define vst4_lane_p8 vst4_lane_u8 + +//void vst4_lane_p16(__transfersize(4) poly16_t * ptr, poly16x4x4_t val, __constrange(0,3) int lane);// VST4.16 {d0[0], d1[0], d2[0], d3[0]}, [r0] +void vst4_lane_p16_ptr(__transfersize(4) poly16_t * ptr, poly16x4x4_t * val, __constrange(0,3) int lane); +#define vst4_lane_p16 vst4_lane_u16 + +//************************************************************************************************** +//************************ Extract lanes from a vector ******************************************** +//************************************************************************************************** +//These intrinsics extract a single lane (element) from a vector. 
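+//Usage sketch (illustrative only, not part of the header; "v", "q", "b" and "f" are hypothetical variables):
+//    uint8x8_t v = vdup_n_u8(7);
+//    uint8_t b = vget_lane_u8(v, 3); //b == 7; the lane index must be a compile-time constant
+//    float32x4_t q = vdupq_n_f32(1.5f);
+//    float32_t f = vgetq_lane_f32(q, 2); //f == 1.5f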
+uint8_t vget_lane_u8(uint8x8_t vec, __constrange(0,7) int lane); // VMOV.U8 r0, d0[0] +#define vget_lane_u8(vec, lane) vec.m64_u8[lane] + +uint16_t vget_lane_u16(uint16x4_t vec, __constrange(0,3) int lane); // VMOV.s16 r0, d0[0] +#define vget_lane_u16(vec, lane) vec.m64_u16[lane] + + +uint32_t vget_lane_u32(uint32x2_t vec, __constrange(0,1) int lane); // VMOV.32 r0, d0[0] +#define vget_lane_u32(vec, lane) vec.m64_u32[lane] + +int8_t vget_lane_s8(int8x8_t vec, __constrange(0,7) int lane); // VMOV.S8 r0, d0[0] +#define vget_lane_s8(vec, lane) vec.m64_i8[lane] + +int16_t vget_lane_s16(int16x4_t vec, __constrange(0,3) int lane); // VMOV.S16 r0, d0[0] +#define vget_lane_s16(vec, lane) vec.m64_i16[lane] + +int32_t vget_lane_s32(int32x2_t vec, __constrange(0,1) int lane); // VMOV.32 r0, d0[0] +#define vget_lane_s32(vec, lane) vec.m64_i32[lane] + +poly8_t vget_lane_p8(poly8x8_t vec, __constrange(0,7) int lane); // VMOV.U8 r0, d0[0] +#define vget_lane_p8 vget_lane_u8 + +poly16_t vget_lane_p16(poly16x4_t vec, __constrange(0,3) int lane); // VMOV.s16 r0, d0[0] +#define vget_lane_p16 vget_lane_u16 + +float32_t vget_lane_f32(float32x2_t vec, __constrange(0,1) int lane); // VMOV.32 r0, d0[0] +#define vget_lane_f32(vec, lane) vec.m64_f32[lane] + +uint8_t vgetq_lane_u8(uint8x16_t vec, __constrange(0,15) int lane); // VMOV.U8 r0, d0[0] +#define vgetq_lane_u8 _MM_EXTRACT_EPI8 + +uint16_t vgetq_lane_u16(uint16x8_t vec, __constrange(0,7) int lane); // VMOV.s16 r0, d0[0] +#define vgetq_lane_u16 _MM_EXTRACT_EPI16 + +uint32_t vgetq_lane_u32(uint32x4_t vec, __constrange(0,3) int lane); // VMOV.32 r0, d0[0] +#define vgetq_lane_u32 _MM_EXTRACT_EPI32 + +int8_t vgetq_lane_s8(int8x16_t vec, __constrange(0,15) int lane); // VMOV.S8 r0, d0[0] +#define vgetq_lane_s8 vgetq_lane_u8 + +int16_t vgetq_lane_s16(int16x8_t vec, __constrange(0,7) int lane); // VMOV.S16 r0, d0[0] +#define vgetq_lane_s16 vgetq_lane_u16 + +int32_t vgetq_lane_s32(int32x4_t vec, __constrange(0,3) int lane); // VMOV.32 r0, d0[0] +#define vgetq_lane_s32 vgetq_lane_u32 + +poly8_t vgetq_lane_p8(poly8x16_t vec, __constrange(0,15) int lane); // VMOV.U8 r0, d0[0] +#define vgetq_lane_p8 vgetq_lane_u8 + +poly16_t vgetq_lane_p16(poly16x8_t vec, __constrange(0,7) int lane); // VMOV.s16 r0, d0[0] +#define vgetq_lane_p16 vgetq_lane_u16 + +float32_t vgetq_lane_f32(float32x4_t vec, __constrange(0,3) int lane); // VMOV.32 r0, d0[0] +_NEON2SSE_INLINE float32_t vgetq_lane_f32(float32x4_t vec, __constrange(0,3) int lane) +{ + int32_t ilane; + ilane = _MM_EXTRACT_PS(vec,lane); + return *(float*)&ilane; +} + +int64_t vget_lane_s64(int64x1_t vec, __constrange(0,0) int lane); // VMOV r0,r0,d0 +#define vget_lane_s64(vec, lane) vec.m64_i64[0] + +uint64_t vget_lane_u64(uint64x1_t vec, __constrange(0,0) int lane); // VMOV r0,r0,d0 +#define vget_lane_u64(vec, lane) vec.m64_u64[0] + + +int64_t vgetq_lane_s64(int64x2_t vec, __constrange(0,1) int lane); // VMOV r0,r0,d0 +#define vgetq_lane_s64 (int64_t) vgetq_lane_u64 + +uint64_t vgetq_lane_u64(uint64x2_t vec, __constrange(0,1) int lane); // VMOV r0,r0,d0 +#define vgetq_lane_u64 _MM_EXTRACT_EPI64 + +// ***************** Set lanes within a vector ******************************************** +// ************************************************************************************** +//These intrinsics set a single lane (element) within a vector. +//same functions as vld1_lane_xx ones, but take the value to be set directly. 
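+//Usage sketch (illustrative only; "acc" is a hypothetical variable):
+//    int32x4_t acc = vdupq_n_s32(0);
+//    acc = vsetq_lane_s32(42, acc, 1); //acc now holds {0, 42, 0, 0}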
+ +uint8x8_t vset_lane_u8(uint8_t value, uint8x8_t vec, __constrange(0,7) int lane); // VMOV.8 d0[0],r0 +_NEON2SSE_INLINE uint8x8_t vset_lane_u8(uint8_t value, uint8x8_t vec, __constrange(0,7) int lane) +{ + uint8_t val; + val = value; + return vld1_lane_u8(&val, vec, lane); +} + +uint16x4_t vset_lane_u16(uint16_t value, uint16x4_t vec, __constrange(0,3) int lane); // VMOV.16 d0[0],r0 +_NEON2SSE_INLINE uint16x4_t vset_lane_u16(uint16_t value, uint16x4_t vec, __constrange(0,3) int lane) +{ + uint16_t val; + val = value; + return vld1_lane_u16(&val, vec, lane); +} + +uint32x2_t vset_lane_u32(uint32_t value, uint32x2_t vec, __constrange(0,1) int lane); // VMOV.32 d0[0],r0 +_NEON2SSE_INLINE uint32x2_t vset_lane_u32(uint32_t value, uint32x2_t vec, __constrange(0,1) int lane) +{ + uint32_t val; + val = value; + return vld1_lane_u32(&val, vec, lane); +} + +int8x8_t vset_lane_s8(int8_t value, int8x8_t vec, __constrange(0,7) int lane); // VMOV.8 d0[0],r0 +_NEON2SSE_INLINE int8x8_t vset_lane_s8(int8_t value, int8x8_t vec, __constrange(0,7) int lane) +{ + int8_t val; + val = value; + return vld1_lane_s8(&val, vec, lane); +} + +int16x4_t vset_lane_s16(int16_t value, int16x4_t vec, __constrange(0,3) int lane); // VMOV.16 d0[0],r0 +_NEON2SSE_INLINE int16x4_t vset_lane_s16(int16_t value, int16x4_t vec, __constrange(0,3) int lane) +{ + int16_t val; + val = value; + return vld1_lane_s16(&val, vec, lane); +} + +int32x2_t vset_lane_s32(int32_t value, int32x2_t vec, __constrange(0,1) int lane); // VMOV.32 d0[0],r0 +_NEON2SSE_INLINE int32x2_t vset_lane_s32(int32_t value, int32x2_t vec, __constrange(0,1) int lane) +{ + int32_t val; + val = value; + return vld1_lane_s32(&val, vec, lane); +} + +poly8x8_t vset_lane_p8(poly8_t value, poly8x8_t vec, __constrange(0,7) int lane); // VMOV.8 d0[0],r0 +#define vset_lane_p8 vset_lane_u8 + +poly16x4_t vset_lane_p16(poly16_t value, poly16x4_t vec, __constrange(0,3) int lane); // VMOV.16 d0[0],r0 +#define vset_lane_p16 vset_lane_u16 + +float32x2_t vset_lane_f32(float32_t value, float32x2_t vec, __constrange(0,1) int lane); // VMOV.32 d0[0],r0 +_NEON2SSE_INLINE float32x2_t vset_lane_f32(float32_t value, float32x2_t vec, __constrange(0,1) int lane) +{ + float32_t val; + val = value; + return vld1_lane_f32(&val, vec, lane); +} + +uint8x16_t vsetq_lane_u8(uint8_t value, uint8x16_t vec, __constrange(0,15) int lane); // VMOV.8 d0[0],r0 +_NEON2SSE_INLINE uint8x16_t vsetq_lane_u8(uint8_t value, uint8x16_t vec, __constrange(0,15) int lane) +{ + uint8_t val; + val = value; + return vld1q_lane_u8(&val, vec, lane); +} + +uint16x8_t vsetq_lane_u16(uint16_t value, uint16x8_t vec, __constrange(0,7) int lane); // VMOV.16 d0[0],r0 +_NEON2SSE_INLINE uint16x8_t vsetq_lane_u16(uint16_t value, uint16x8_t vec, __constrange(0,7) int lane) +{ + uint16_t val; + val = value; + return vld1q_lane_u16(&val, vec, lane); +} + +uint32x4_t vsetq_lane_u32(uint32_t value, uint32x4_t vec, __constrange(0,3) int lane); // VMOV.32 d0[0],r0 +_NEON2SSE_INLINE uint32x4_t vsetq_lane_u32(uint32_t value, uint32x4_t vec, __constrange(0,3) int lane) +{ + uint32_t val; + val = value; + return vld1q_lane_u32(&val, vec, lane); +} + +int8x16_t vsetq_lane_s8(int8_t value, int8x16_t vec, __constrange(0,15) int lane); // VMOV.8 d0[0],r0 +_NEON2SSE_INLINE int8x16_t vsetq_lane_s8(int8_t value, int8x16_t vec, __constrange(0,15) int lane) +{ + int8_t val; + val = value; + return vld1q_lane_s8(&val, vec, lane); +} + +int16x8_t vsetq_lane_s16(int16_t value, int16x8_t vec, __constrange(0,7) int lane); // VMOV.16 d0[0],r0 
+_NEON2SSE_INLINE int16x8_t vsetq_lane_s16(int16_t value, int16x8_t vec, __constrange(0,7) int lane)
+{
+    int16_t val;
+    val = value;
+    return vld1q_lane_s16(&val, vec, lane);
+}
+
+int32x4_t vsetq_lane_s32(int32_t value, int32x4_t vec, __constrange(0,3) int lane); // VMOV.32 d0[0],r0
+_NEON2SSE_INLINE int32x4_t vsetq_lane_s32(int32_t value, int32x4_t vec, __constrange(0,3) int lane)
+{
+    int32_t val;
+    val = value;
+    return vld1q_lane_s32(&val, vec, lane);
+}
+
+poly8x16_t vsetq_lane_p8(poly8_t value, poly8x16_t vec, __constrange(0,15) int lane); // VMOV.8 d0[0],r0
+#define vsetq_lane_p8 vsetq_lane_u8
+
+poly16x8_t vsetq_lane_p16(poly16_t value, poly16x8_t vec, __constrange(0,7) int lane); // VMOV.16 d0[0],r0
+#define vsetq_lane_p16 vsetq_lane_u16
+
+float32x4_t vsetq_lane_f32(float32_t value, float32x4_t vec, __constrange(0,3) int lane); // VMOV.32 d0[0],r0
+_NEON2SSE_INLINE float32x4_t vsetq_lane_f32(float32_t value, float32x4_t vec, __constrange(0,3) int lane)
+{
+    float32_t val;
+    val = value;
+    return vld1q_lane_f32(&val, vec, lane);
+}
+
+int64x1_t vset_lane_s64(int64_t value, int64x1_t vec, __constrange(0,0) int lane); // VMOV d0,r0,r0
+_NEON2SSE_INLINE int64x1_t vset_lane_s64(int64_t value, int64x1_t vec, __constrange(0,0) int lane)
+{
+    int64_t val;
+    val = value;
+    return vld1_lane_s64(&val, vec, lane);
+}
+
+uint64x1_t vset_lane_u64(uint64_t value, uint64x1_t vec, __constrange(0,0) int lane); // VMOV d0,r0,r0
+_NEON2SSE_INLINE uint64x1_t vset_lane_u64(uint64_t value, uint64x1_t vec, __constrange(0,0) int lane)
+{
+    uint64_t val;
+    val = value;
+    return vld1_lane_u64(&val, vec, lane);
+}
+
+int64x2_t vsetq_lane_s64(int64_t value, int64x2_t vec, __constrange(0,1) int lane); // VMOV d0,r0,r0
+_NEON2SSE_INLINE int64x2_t vsetq_lane_s64(int64_t value, int64x2_t vec, __constrange(0,1) int lane)
+{
+    int64_t val;
+    val = value;
+    return vld1q_lane_s64(&val, vec, lane);
+}
+
+uint64x2_t vsetq_lane_u64(uint64_t value, uint64x2_t vec, __constrange(0,1) int lane); // VMOV d0,r0,r0
+#define vsetq_lane_u64 vsetq_lane_s64
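+
+//Usage sketch for the vget_lane/vset_lane intrinsics above (illustration only, not
+//part of the original header; the NEON2SSE_USAGE_EXAMPLES guard and the helper name
+//are hypothetical). A set followed by a get of the same lane returns the value just
+//written, while all other lanes keep their previous contents.
+#ifdef NEON2SSE_USAGE_EXAMPLES
+static float32_t example_lane_roundtrip(float32x4_t q)
+{
+    q = vsetq_lane_f32(1.5f, q, 2); //overwrite lane 2 only; lanes 0, 1 and 3 are unchanged
+    return vgetq_lane_f32(q, 2);    //reads back 1.5f
+}
+#endif
+
+// *******************************************************************************
+// **************** Initialize a vector from bit pattern ***************************
+// *******************************************************************************
+//These intrinsics create a vector from a literal bit pattern.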
+int8x8_t vcreate_s8(uint64_t a); // VMOV d0,r0,r0
+#define vcreate_s8(a) (*(__m64_128*)&(a))
+
+
+int16x4_t vcreate_s16(uint64_t a); // VMOV d0,r0,r0
+#define vcreate_s16 vcreate_s8
+
+int32x2_t vcreate_s32(uint64_t a); // VMOV d0,r0,r0
+#define vcreate_s32 vcreate_s8
+
+float16x4_t vcreate_f16(uint64_t a); // VMOV d0,r0,r0
+//no IA32 SIMD available
+
+float32x2_t vcreate_f32(uint64_t a); // VMOV d0,r0,r0
+#define vcreate_f32(a) (*(__m64_128*)&(a))
+
+uint8x8_t vcreate_u8(uint64_t a); // VMOV d0,r0,r0
+#define vcreate_u8 vcreate_s8
+
+uint16x4_t vcreate_u16(uint64_t a); // VMOV d0,r0,r0
+#define vcreate_u16 vcreate_s16
+
+uint32x2_t vcreate_u32(uint64_t a); // VMOV d0,r0,r0
+#define vcreate_u32 vcreate_s32
+
+uint64x1_t vcreate_u64(uint64_t a); // VMOV d0,r0,r0
+#define vcreate_u64 vcreate_s8
+
+
+poly8x8_t vcreate_p8(uint64_t a); // VMOV d0,r0,r0
+#define vcreate_p8 vcreate_u8
+
+poly16x4_t vcreate_p16(uint64_t a); // VMOV d0,r0,r0
+#define vcreate_p16 vcreate_u16
+
+int64x1_t vcreate_s64(uint64_t a); // VMOV d0,r0,r0
+#define vcreate_s64 vcreate_u64
+
+//********************* Set all lanes to same value ********************************
+//*********************************************************************************
+//These intrinsics set all lanes to the same value.
+uint8x8_t vdup_n_u8(uint8_t value); // VDUP.8 d0,r0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint8x8_t vdup_n_u8(uint8_t value), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+    uint8x8_t res;
+    int i;
+    for (i = 0; i<8; i++) {
+        res.m64_u8[i] = value;
+    }
+    return res;
+}
+
+uint16x4_t vdup_n_u16(uint16_t value); // VDUP.16 d0,r0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint16x4_t vdup_n_u16(uint16_t value), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+    uint16x4_t res;
+    int i;
+    for (i = 0; i<4; i++) {
+        res.m64_u16[i] = value;
+    }
+    return res;
+}
+
+uint32x2_t vdup_n_u32(uint32_t value); // VDUP.32 d0,r0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(uint32x2_t vdup_n_u32(uint32_t value), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+    uint32x2_t res;
+    res.m64_u32[0] = value;
+    res.m64_u32[1] = value;
+    return res;
+}
+
+int8x8_t vdup_n_s8(int8_t value); // VDUP.8 d0,r0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int8x8_t vdup_n_s8(int8_t value), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+    int8x8_t res;
+    int i;
+    for (i = 0; i<8; i++) {
+        res.m64_i8[i] = value;
+    }
+    return res;
+}
+
+int16x4_t vdup_n_s16(int16_t value); // VDUP.16 d0,r0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int16x4_t vdup_n_s16(int16_t value), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+    int16x4_t res;
+    int i;
+    for (i = 0; i<4; i++) {
+        res.m64_i16[i] = value;
+    }
+    return res;
+}
+
+int32x2_t vdup_n_s32(int32_t value); // VDUP.32 d0,r0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x2_t vdup_n_s32(int32_t value), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+    int32x2_t res;
+    res.m64_i32[0] = value;
+    res.m64_i32[1] = value;
+    return res;
+}
+
+poly8x8_t vdup_n_p8(poly8_t value); // VDUP.8 d0,r0
+#define vdup_n_p8 vdup_n_u8
+
+poly16x4_t vdup_n_p16(poly16_t value); // VDUP.16 d0,r0
+#define vdup_n_p16 vdup_n_s16
+
+float32x2_t vdup_n_f32(float32_t value); // VDUP.32 d0,r0
+_NEON2SSE_INLINE float32x2_t vdup_n_f32(float32_t value)
+{
+    float32x2_t res;
+    res.m64_f32[0] = value;
+    res.m64_f32[1] = value;
+    return res;
+}
+
+uint8x16_t vdupq_n_u8(uint8_t value); // VDUP.8 q0,r0
+#define vdupq_n_u8(value) _mm_set1_epi8((uint8_t) (value))
+
+uint16x8_t vdupq_n_u16(uint16_t value); // VDUP.16 q0,r0
+#define vdupq_n_u16(value) _mm_set1_epi16((uint16_t) (value))
+
+uint32x4_t
vdupq_n_u32(uint32_t value); // VDUP.32 q0,r0
+#define vdupq_n_u32(value) _mm_set1_epi32((uint32_t) (value))
+
+int8x16_t vdupq_n_s8(int8_t value); // VDUP.8 q0,r0
+#define vdupq_n_s8 _mm_set1_epi8
+
+int16x8_t vdupq_n_s16(int16_t value); // VDUP.16 q0,r0
+#define vdupq_n_s16 _mm_set1_epi16
+
+int32x4_t vdupq_n_s32(int32_t value); // VDUP.32 q0,r0
+#define vdupq_n_s32 _mm_set1_epi32
+
+poly8x16_t vdupq_n_p8(poly8_t value); // VDUP.8 q0,r0
+#define vdupq_n_p8 vdupq_n_u8
+
+poly16x8_t vdupq_n_p16(poly16_t value); // VDUP.16 q0,r0
+#define vdupq_n_p16 vdupq_n_u16
+
+float32x4_t vdupq_n_f32(float32_t value); // VDUP.32 q0,r0
+#define vdupq_n_f32 _mm_set1_ps
+
+int64x1_t vdup_n_s64(int64_t value); // VMOV d0,r0,r0
+_NEON2SSE_INLINE int64x1_t vdup_n_s64(int64_t value)
+{
+    int64x1_t res;
+    res.m64_i64[0] = value;
+    return res;
+}
+
+uint64x1_t vdup_n_u64(uint64_t value); // VMOV d0,r0,r0
+_NEON2SSE_INLINE uint64x1_t vdup_n_u64(uint64_t value)
+{
+    uint64x1_t res;
+    res.m64_u64[0] = value;
+    return res;
+}
+
+int64x2_t vdupq_n_s64(int64_t value); // VMOV d0,r0,r0
+_NEON2SSE_INLINE int64x2_t vdupq_n_s64(int64_t value)
+{
+    _NEON2SSE_ALIGN_16 int64_t value2[2] = {value, value}; //value may be an immediate
+    return LOAD_SI128(value2);
+}
+
+uint64x2_t vdupq_n_u64(uint64_t value); // VMOV d0,r0,r0
+_NEON2SSE_INLINE uint64x2_t vdupq_n_u64(uint64_t value)
+{
+    _NEON2SSE_ALIGN_16 uint64_t val[2] = {value, value}; //value may be an immediate
+    return LOAD_SI128(val);
+}
+
+//**** Set all lanes to same value ************************
+//Same functions as above - just aliases.********************
+//Probably they reflect the fact that the 128-bit function versions use the VMOV instruction **********
+uint8x8_t vmov_n_u8(uint8_t value); // VDUP.8 d0,r0
+#define vmov_n_u8 vdup_n_s8
+
+uint16x4_t vmov_n_u16(uint16_t value); // VDUP.16 d0,r0
+#define vmov_n_u16 vdup_n_s16
+
+uint32x2_t vmov_n_u32(uint32_t value); // VDUP.32 d0,r0
+#define vmov_n_u32 vdup_n_u32
+
+int8x8_t vmov_n_s8(int8_t value); // VDUP.8 d0,r0
+#define vmov_n_s8 vdup_n_s8
+
+int16x4_t vmov_n_s16(int16_t value); // VDUP.16 d0,r0
+#define vmov_n_s16 vdup_n_s16
+
+int32x2_t vmov_n_s32(int32_t value); // VDUP.32 d0,r0
+#define vmov_n_s32 vdup_n_s32
+
+poly8x8_t vmov_n_p8(poly8_t value); // VDUP.8 d0,r0
+#define vmov_n_p8 vdup_n_u8
+
+poly16x4_t vmov_n_p16(poly16_t value); // VDUP.16 d0,r0
+#define vmov_n_p16 vdup_n_s16
+
+float32x2_t vmov_n_f32(float32_t value); // VDUP.32 d0,r0
+#define vmov_n_f32 vdup_n_f32
+
+uint8x16_t vmovq_n_u8(uint8_t value); // VDUP.8 q0,r0
+#define vmovq_n_u8 vdupq_n_u8
+
+uint16x8_t vmovq_n_u16(uint16_t value); // VDUP.16 q0,r0
+#define vmovq_n_u16 vdupq_n_s16
+
+uint32x4_t vmovq_n_u32(uint32_t value); // VDUP.32 q0,r0
+#define vmovq_n_u32 vdupq_n_u32
+
+int8x16_t vmovq_n_s8(int8_t value); // VDUP.8 q0,r0
+#define vmovq_n_s8 vdupq_n_s8
+
+int16x8_t vmovq_n_s16(int16_t value); // VDUP.16 q0,r0
+#define vmovq_n_s16 vdupq_n_s16
+
+int32x4_t vmovq_n_s32(int32_t value); // VDUP.32 q0,r0
+#define vmovq_n_s32 vdupq_n_s32
+
+poly8x16_t vmovq_n_p8(poly8_t value); // VDUP.8 q0,r0
+#define vmovq_n_p8 vdupq_n_u8
+
+poly16x8_t vmovq_n_p16(poly16_t value); // VDUP.16 q0,r0
+#define vmovq_n_p16 vdupq_n_s16
+
+float32x4_t vmovq_n_f32(float32_t value); // VDUP.32 q0,r0
+#define vmovq_n_f32 vdupq_n_f32
+
+int64x1_t vmov_n_s64(int64_t value); // VMOV d0,r0,r0
+#define vmov_n_s64 vdup_n_s64
+
+uint64x1_t vmov_n_u64(uint64_t value); // VMOV d0,r0,r0
+#define vmov_n_u64 vdup_n_u64
+
+int64x2_t vmovq_n_s64(int64_t value); // VMOV d0,r0,r0
+#define
vmovq_n_s64 vdupq_n_s64 + +uint64x2_t vmovq_n_u64(uint64_t value); // VMOV d0,r0,r0 +#define vmovq_n_u64 vdupq_n_u64 + +//**************Set all lanes to the value of one lane of a vector ************* +//**************************************************************************** +//here shuffle is better solution than lane extraction followed by set1 function +uint8x8_t vdup_lane_u8(uint8x8_t vec, __constrange(0,7) int lane); // VDUP.8 d0,d0[0] +_NEON2SSE_INLINE uint8x8_t vdup_lane_u8(uint8x8_t vec, __constrange(0,7) int lane) +{ + uint8x8_t res; + uint8_t valane; + int i = 0; + valane = vec.m64_u8[lane]; + for (i = 0; i<8; i++) { + res.m64_u8[i] = valane; + } + return res; +} + +uint16x4_t vdup_lane_u16(uint16x4_t vec, __constrange(0,3) int lane); // VDUP.16 d0,d0[0] +_NEON2SSE_INLINE uint16x4_t vdup_lane_u16(uint16x4_t vec, __constrange(0,3) int lane) +{ + uint16x4_t res; + uint16_t valane; + valane = vec.m64_u16[lane]; + res.m64_u16[0] = valane; + res.m64_u16[1] = valane; + res.m64_u16[2] = valane; + res.m64_u16[3] = valane; + return res; +} + +uint32x2_t vdup_lane_u32(uint32x2_t vec, __constrange(0,1) int lane); // VDUP.32 d0,d0[0] +_NEON2SSE_INLINE uint32x2_t vdup_lane_u32(uint32x2_t vec, __constrange(0,1) int lane) +{ + uint32x2_t res; + res.m64_u32[0] = vec.m64_u32[lane]; + res.m64_u32[1] = res.m64_u32[0]; + return res; +} + +int8x8_t vdup_lane_s8(int8x8_t vec, __constrange(0,7) int lane); // VDUP.8 d0,d0[0] +#define vdup_lane_s8 vdup_lane_u8 + +int16x4_t vdup_lane_s16(int16x4_t vec, __constrange(0,3) int lane); // VDUP.16 d0,d0[0] +#define vdup_lane_s16 vdup_lane_u16 + +int32x2_t vdup_lane_s32(int32x2_t vec, __constrange(0,1) int lane); // VDUP.32 d0,d0[0] +#define vdup_lane_s32 vdup_lane_u32 + +poly8x8_t vdup_lane_p8(poly8x8_t vec, __constrange(0,7) int lane); // VDUP.8 d0,d0[0] +#define vdup_lane_p8 vdup_lane_u8 + +poly16x4_t vdup_lane_p16(poly16x4_t vec, __constrange(0,3) int lane); // VDUP.16 d0,d0[0] +#define vdup_lane_p16 vdup_lane_s16 + +float32x2_t vdup_lane_f32(float32x2_t vec, __constrange(0,1) int lane); // VDUP.32 d0,d0[0] +_NEON2SSE_INLINE float32x2_t vdup_lane_f32(float32x2_t vec, __constrange(0,1) int lane) +{ + float32x2_t res; + res.m64_f32[0] = vec.m64_f32[lane]; + res.m64_f32[1] = res.m64_f32[0]; + return res; +} + +uint8x16_t vdupq_lane_u8(uint8x8_t vec, __constrange(0,7) int lane); // VDUP.8 q0,d0[0] +_NEON2SSE_INLINE uint8x16_t vdupq_lane_u8(uint8x8_t vec, __constrange(0,7) int lane) // VDUP.8 q0,d0[0] +{ + _NEON2SSE_ALIGN_16 int8_t lanemask8[16] = {lane, lane, lane, lane, lane, lane, lane, lane, lane, lane, lane, lane, lane, lane, lane, lane}; + return _mm_shuffle_epi8 (_pM128i(vec), *(__m128i*) lanemask8); +} + +uint16x8_t vdupq_lane_u16(uint16x4_t vec, __constrange(0,3) int lane); // VDUP.16 q0,d0[0] +_NEON2SSE_INLINE uint16x8_t vdupq_lane_u16(uint16x4_t vec, __constrange(0,3) int lane) // VDUP.16 q0,d0[0] +{ + //we could use 8bit shuffle for 16 bit as well + const int8_t lane16 = ((int8_t) lane) << 1; + _NEON2SSE_ALIGN_16 int8_t lanemask_e16[16] = {lane16, lane16 + 1, lane16, lane16 + 1, lane16, lane16 + 1, lane16, lane16 + 1, + lane16, lane16 + 1, lane16, lane16 + 1, lane16, lane16 + 1, lane16, lane16 + 1}; + return _mm_shuffle_epi8 (_pM128i(vec), *(__m128i*)lanemask_e16); +} + +uint32x4_t vdupq_lane_u32(uint32x2_t vec, __constrange(0,1) int lane); // VDUP.32 q0,d0[0] +#define vdupq_lane_u32(vec, lane) _mm_shuffle_epi32 (_pM128i(vec), lane | (lane << 2) | (lane << 4) | (lane << 6)) + +int8x16_t vdupq_lane_s8(int8x8_t vec, __constrange(0,7) int lane); // 
VDUP.8 q0,d0[0] +#define vdupq_lane_s8 vdupq_lane_u8 + +int16x8_t vdupq_lane_s16(int16x4_t vec, __constrange(0,3) int lane); // VDUP.16 q0,d0[0] +#define vdupq_lane_s16 vdupq_lane_u16 + +int32x4_t vdupq_lane_s32(int32x2_t vec, __constrange(0,1) int lane); // VDUP.32 q0,d0[0] +#define vdupq_lane_s32 vdupq_lane_u32 + +poly8x16_t vdupq_lane_p8(poly8x8_t vec, __constrange(0,7) int lane); // VDUP.8 q0,d0[0] +#define vdupq_lane_p8 vdupq_lane_u8 + +poly16x8_t vdupq_lane_p16(poly16x4_t vec, __constrange(0,3) int lane); // VDUP.16 q0,d0[0] +#define vdupq_lane_p16 vdupq_lane_s16 + +float32x4_t vdupq_lane_f32(float32x2_t vec, __constrange(0,1) int lane); // VDUP.32 q0,d0[0] +#define vdupq_lane_f32(vec, lane) _mm_load1_ps((vec.m64_f32 + lane)) + +int64x1_t vdup_lane_s64(int64x1_t vec, __constrange(0,0) int lane); // VMOV d0,d0 +#define vdup_lane_s64(vec,lane) vec + +uint64x1_t vdup_lane_u64(uint64x1_t vec, __constrange(0,0) int lane); // VMOV d0,d0 +#define vdup_lane_u64(vec,lane) vec + +int64x2_t vdupq_lane_s64(int64x1_t vec, __constrange(0,0) int lane); // VMOV q0,q0 +_NEON2SSE_INLINE int64x2_t vdupq_lane_s64(int64x1_t vec, __constrange(0,0) int lane) +{ + __m128i vec128; + vec128 = _pM128i(vec); + return _mm_unpacklo_epi64(vec128,vec128); +} + +uint64x2_t vdupq_lane_u64(uint64x1_t vec, __constrange(0,0) int lane); // VMOV q0,q0 +#define vdupq_lane_u64 vdupq_lane_s64 + +// ******************************************************************** +// ******************** Combining vectors ***************************** +// ******************************************************************** +//These intrinsics join two 64 bit vectors into a single 128bit vector. +int8x16_t vcombine_s8(int8x8_t low, int8x8_t high); // VMOV d0,d0 +#define vcombine_s8(low, high) _mm_unpacklo_epi64 (_pM128i(low), _pM128i(high) ) + +int16x8_t vcombine_s16(int16x4_t low, int16x4_t high); // VMOV d0,d0 +#define vcombine_s16(low, high) _mm_unpacklo_epi64 (_pM128i(low), _pM128i(high) ) + +int32x4_t vcombine_s32(int32x2_t low, int32x2_t high); // VMOV d0,d0 +#define vcombine_s32(low, high) _mm_unpacklo_epi64 (_pM128i(low), _pM128i(high) ) + +int64x2_t vcombine_s64(int64x1_t low, int64x1_t high); // VMOV d0,d0 +#define vcombine_s64(low, high) _mm_unpacklo_epi64 (_pM128i(low), _pM128i(high) ) + +float16x8_t vcombine_f16(float16x4_t low, float16x4_t high); // VMOV d0,d0 +//current IA SIMD doesn't support float16 + +float32x4_t vcombine_f32(float32x2_t low, float32x2_t high); // VMOV d0,d0 +_NEON2SSE_INLINE float32x4_t vcombine_f32(float32x2_t low, float32x2_t high) +{ + __m128i res; + res = _mm_unpacklo_epi64(_pM128i(low), _pM128i(high) ); + return _M128(res); +} + +uint8x16_t vcombine_u8(uint8x8_t low, uint8x8_t high); // VMOV d0,d0 +#define vcombine_u8 vcombine_s8 + +uint16x8_t vcombine_u16(uint16x4_t low, uint16x4_t high); // VMOV d0,d0 +#define vcombine_u16 vcombine_s16 + +uint32x4_t vcombine_u32(uint32x2_t low, uint32x2_t high); // VMOV d0,d0 +#define vcombine_u32 vcombine_s32 + +uint64x2_t vcombine_u64(uint64x1_t low, uint64x1_t high); // VMOV d0,d0 +#define vcombine_u64 vcombine_s64 + +poly8x16_t vcombine_p8(poly8x8_t low, poly8x8_t high); // VMOV d0,d0 +#define vcombine_p8 vcombine_u8 + +poly16x8_t vcombine_p16(poly16x4_t low, poly16x4_t high); // VMOV d0,d0 +#define vcombine_p16 vcombine_u16 + +//********************************************************************** +//************************* Splitting vectors ************************** +//********************************************************************** 
+//**************** Get high part ****************************************** +//These intrinsics split a 128 bit vector into 2 component 64 bit vectors +int8x8_t vget_high_s8(int8x16_t a); // VMOV d0,d0 +_NEON2SSE_INLINE int8x8_t vget_high_s8(int8x16_t a) +{ + int8x8_t res64; + __m128i res; + res = _mm_unpackhi_epi64(a,a); //SSE2 + return64(res); +} + +int16x4_t vget_high_s16(int16x8_t a); // VMOV d0,d0 +_NEON2SSE_INLINE int16x4_t vget_high_s16(int16x8_t a) +{ + int16x4_t res64; + __m128i res; + res = _mm_unpackhi_epi64(a,a); //SSE2 + return64(res); +} + +int32x2_t vget_high_s32(int32x4_t a); // VMOV d0,d0 +_NEON2SSE_INLINE int32x2_t vget_high_s32(int32x4_t a) +{ + int32x2_t res64; + __m128i res; + res = _mm_unpackhi_epi64(a,a); //SSE2 + return64(res); +} + +int64x1_t vget_high_s64(int64x2_t a); // VMOV d0,d0 +_NEON2SSE_INLINE int64x1_t vget_high_s64(int64x2_t a) +{ + int64x1_t res64; + __m128i res; + res = _mm_unpackhi_epi64(a,a); //SSE2 + return64(res); +} + +float16x4_t vget_high_f16(float16x8_t a); // VMOV d0,d0 +// IA32 SIMD doesn't work with 16bit floats currently + +float32x2_t vget_high_f32(float32x4_t a); // VMOV d0,d0 +_NEON2SSE_INLINE float32x2_t vget_high_f32(float32x4_t a) +{ + __m128i res; + __m64_128 res64; + res = _mm_unpackhi_epi64(_M128i(a),_M128i(a)); + return64(res); +} + +uint8x8_t vget_high_u8(uint8x16_t a); // VMOV d0,d0 +#define vget_high_u8 vget_high_s8 + +uint16x4_t vget_high_u16(uint16x8_t a); // VMOV d0,d0 +#define vget_high_u16 vget_high_s16 + +uint32x2_t vget_high_u32(uint32x4_t a); // VMOV d0,d0 +#define vget_high_u32 vget_high_s32 + +uint64x1_t vget_high_u64(uint64x2_t a); // VMOV d0,d0 +#define vget_high_u64 vget_high_s64 + +poly8x8_t vget_high_p8(poly8x16_t a); // VMOV d0,d0 +#define vget_high_p8 vget_high_u8 + +poly16x4_t vget_high_p16(poly16x8_t a); // VMOV d0,d0 +#define vget_high_p16 vget_high_u16 + +//********************** Get low part ********************** +//********************************************************** +int8x8_t vget_low_s8(int8x16_t a); // VMOV d0,d0 +_NEON2SSE_INLINE int8x8_t vget_low_s8(int8x16_t a) // VMOV d0,d0 +{ + int16x4_t res64; + return64(a); +} + +int16x4_t vget_low_s16(int16x8_t a); // VMOV d0,d0 +_NEON2SSE_INLINE int16x4_t vget_low_s16(int16x8_t a) // VMOV d0,d0 +{ + int16x4_t res64; + return64(a); +} + +int32x2_t vget_low_s32(int32x4_t a); // VMOV d0,d0 +_NEON2SSE_INLINE int32x2_t vget_low_s32(int32x4_t a) // VMOV d0,d0 +{ + int32x2_t res64; + return64(a); +} + +int64x1_t vget_low_s64(int64x2_t a); // VMOV d0,d0 +_NEON2SSE_INLINE int64x1_t vget_low_s64(int64x2_t a) // VMOV d0,d0 +{ + int64x1_t res64; + return64 (a); +} + +float16x4_t vget_low_f16(float16x8_t a); // VMOV d0,d0 +// IA32 SIMD doesn't work with 16bit floats currently + +float32x2_t vget_low_f32(float32x4_t a); // VMOV d0,d0 +_NEON2SSE_INLINE float32x2_t vget_low_f32(float32x4_t a) +{ + float32x2_t res64; + _M64f(res64, a); + return res64; +} + +uint8x8_t vget_low_u8(uint8x16_t a); // VMOV d0,d0 +#define vget_low_u8 vget_low_s8 + +uint16x4_t vget_low_u16(uint16x8_t a); // VMOV d0,d0 +#define vget_low_u16 vget_low_s16 + +uint32x2_t vget_low_u32(uint32x4_t a); // VMOV d0,d0 +#define vget_low_u32 vget_low_s32 + +uint64x1_t vget_low_u64(uint64x2_t a); // VMOV d0,d0 +#define vget_low_u64 vget_low_s64 + +poly8x8_t vget_low_p8(poly8x16_t a); // VMOV d0,d0 +#define vget_low_p8 vget_low_u8 + +poly16x4_t vget_low_p16(poly16x8_t a); // VMOV d0,d0 +#define vget_low_p16 vget_low_s16 + +//************************************************************************** 
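+
+//Usage sketch for the vdup/vcombine/vget_low/vget_high intrinsics above
+//(illustration only, not part of the original header; the NEON2SSE_USAGE_EXAMPLES
+//guard and the helper name are hypothetical). Two broadcast 64-bit halves are glued
+//into a 128-bit vector, then split back into the same halves.
+#ifdef NEON2SSE_USAGE_EXAMPLES
+static uint8x16_t example_combine_split(void)
+{
+    uint8x8_t lo = vdup_n_u8(0x11);      //8 lanes of 0x11 (serial loop on x86, see the performance warning)
+    uint8x8_t hi = vdup_n_u8(0xee);      //8 lanes of 0xee
+    uint8x16_t q = vcombine_u8(lo, hi);  //one _mm_unpacklo_epi64: [lo | hi]
+    uint8x8_t lo2 = vget_low_u8(q);      //lanes 0..7 again, essentially free
+    uint8x8_t hi2 = vget_high_u8(q);     //lanes 8..15, one _mm_unpackhi_epi64
+    (void)lo2; (void)hi2;
+    return q;
+}
+#endif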
+//************************ Converting vectors ********************************** +//************************************************************************** +//************* Convert from float *************************************** +// need to set _MM_SET_ROUNDING_MODE ( x) accordingly +int32x2_t vcvt_s32_f32(float32x2_t a); // VCVT.S32.F32 d0, d0 +_NEON2SSE_INLINE int32x2_t vcvt_s32_f32(float32x2_t a) +{ + int32x2_t res64; + __m128i res; + res = _mm_cvttps_epi32(_pM128(a)); //use low 64 bits of result only + return64(res); +} + +uint32x2_t vcvt_u32_f32(float32x2_t a); // VCVT.U32.F32 d0, d0 +_NEON2SSE_INLINE uint32x2_t vcvt_u32_f32(float32x2_t a) +{ + //may be not effective compared with a serial SIMD solution + uint32x2_t res64; + __m128i res; + res = vcvtq_u32_f32(_pM128(a)); + return64(res); +} + +int32x4_t vcvtq_s32_f32(float32x4_t a); // VCVT.S32.F32 q0, q0 +#define vcvtq_s32_f32 _mm_cvttps_epi32 + +uint32x4_t vcvtq_u32_f32(float32x4_t a); // VCVT.U32.F32 q0, q0 +_NEON2SSE_INLINE uint32x4_t vcvtq_u32_f32(float32x4_t a) // VCVT.U32.F32 q0, q0 +{ + //No single instruction SSE solution but we could implement it as following: + __m128i resi; + __m128 zero, mask, a_pos, mask_f_max_si, res; + _NEON2SSE_ALIGN_16 int32_t c7fffffff[4] = {0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff}; + zero = _mm_setzero_ps(); + mask = _mm_cmpgt_ps(a, zero); + a_pos = _mm_and_ps(a, mask); + mask_f_max_si = _mm_cmpgt_ps(a_pos,*(__m128*)c7fffffff); + res = _mm_sub_ps(a_pos, mask_f_max_si); //if the input fits to signed we don't subtract anything + resi = _mm_cvttps_epi32(res); + return _mm_add_epi32(resi, *(__m128i*)&mask_f_max_si); +} + +// ***** Convert to the fixed point with the number of fraction bits specified by b *********** +//************************************************************************************************* +int32x2_t vcvt_n_s32_f32(float32x2_t a, __constrange(1,32) int b); // VCVT.S32.F32 d0, d0, #32 +_NEON2SSE_INLINE int32x2_t vcvt_n_s32_f32(float32x2_t a, __constrange(1,32) int b) +{ + int32x2_t res64; + return64(vcvtq_n_s32_f32(_pM128(a),b)); +} + +uint32x2_t vcvt_n_u32_f32(float32x2_t a, __constrange(1,32) int b); // VCVT.U32.F32 d0, d0, #32 +_NEON2SSE_INLINE uint32x2_t vcvt_n_u32_f32(float32x2_t a, __constrange(1,32) int b) +{ + uint32x2_t res; + float convconst; + convconst = (float)((uint32_t)1 << b); + res.m64_u32[0] = (uint32_t) (a.m64_f32[0] * convconst); + res.m64_u32[1] = (uint32_t) (a.m64_f32[1] * convconst); + return res; +} + +int32x4_t vcvtq_n_s32_f32(float32x4_t a, __constrange(1,32) int b); // VCVT.S32.F32 q0, q0, #32 +_NEON2SSE_INLINE int32x4_t vcvtq_n_s32_f32(float32x4_t a, __constrange(1,32) int b) +{ + float convconst; + _NEON2SSE_ALIGN_16 uint32_t cmask[] = {0x80000000, 0x80000000, 0x80000000, 0x80000000}; + __m128 cconst128; + __m128i mask, res; + convconst = (float)(1 << b); + cconst128 = vdupq_n_f32(convconst); + res = _mm_cvttps_epi32(_mm_mul_ps(a,cconst128)); + mask = _mm_cmpeq_epi32 (res, *(__m128i*)cmask); + return _mm_xor_si128 (res, mask); //res saturated for 0x80000000 +} + +uint32x4_t vcvtq_n_u32_f32(float32x4_t a, __constrange(1,32) int b); // VCVT.U32.F32 q0, q0, #32 +_NEON2SSE_INLINE uint32x4_t vcvtq_n_u32_f32(float32x4_t a, __constrange(1,32) int b) +{ + float convconst; + __m128 cconst128; + convconst = (float)(1 << b); + cconst128 = vdupq_n_f32(convconst); + return vcvtq_u32_f32(_mm_mul_ps(a,cconst128)); +} + +//***************** Convert to float ************************* +//************************************************************* 
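+
+//Before the float-producing conversions below, a usage sketch for the
+//float-to-integer conversions above (illustration only, not part of the original
+//header; the NEON2SSE_USAGE_EXAMPLES guard and the helper name are hypothetical).
+//The plain vcvt* variants truncate toward zero via _mm_cvttps_epi32; the vcvt_n_*
+//variants convert to fixed point with b fraction bits, i.e. they return the
+//truncation of value * 2^b.
+#ifdef NEON2SSE_USAGE_EXAMPLES
+static void example_float_to_int(void)
+{
+    float32x2_t v = vdup_n_f32(1.75f);
+    int32x2_t i = vcvt_s32_f32(v);       //{1, 1}: truncation toward zero
+    int32x2_t q8 = vcvt_n_s32_f32(v, 8); //{448, 448}: 1.75 * 2^8, i.e. 1.75 in Q8 fixed point
+    (void)i; (void)q8;
+}
+#endif
+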
+float32x2_t vcvt_f32_s32(int32x2_t a); // VCVT.F32.S32 d0, d0 +_NEON2SSE_INLINE float32x2_t vcvt_f32_s32(int32x2_t a) //use low 64 bits +{ + float32x2_t res; + res.m64_f32[0] = (float) a.m64_i32[0]; + res.m64_f32[1] = (float) a.m64_i32[1]; + return res; +} + +float32x2_t vcvt_f32_u32(uint32x2_t a); // VCVT.F32.U32 d0, d0 +_NEON2SSE_INLINE float32x2_t vcvt_f32_u32(uint32x2_t a) +{ + float32x2_t res; + res.m64_f32[0] = (float) a.m64_u32[0]; + res.m64_f32[1] = (float) a.m64_u32[1]; + return res; +} + +float32x4_t vcvtq_f32_s32(int32x4_t a); // VCVT.F32.S32 q0, q0 +#define vcvtq_f32_s32(a) _mm_cvtepi32_ps(a) + +float32x4_t vcvtq_f32_u32(uint32x4_t a); // VCVT.F32.U32 q0, q0 +_NEON2SSE_INLINE float32x4_t vcvtq_f32_u32(uint32x4_t a) // VCVT.F32.U32 q0, q0 +{ + //solution may be not optimal + __m128 two16, fHi, fLo; + __m128i hi, lo; + two16 = _mm_set1_ps((float)0x10000); //2^16 + // Avoid double rounding by doing two exact conversions + // of high and low 16-bit segments + hi = _mm_srli_epi32(a, 16); + lo = _mm_srli_epi32(_mm_slli_epi32(a, 16), 16); + fHi = _mm_mul_ps(_mm_cvtepi32_ps(hi), two16); + fLo = _mm_cvtepi32_ps(lo); + // do single rounding according to current rounding mode + return _mm_add_ps(fHi, fLo); +} + +// ***** Convert to the float from fixed point with the number of fraction bits specified by b *********** +float32x2_t vcvt_n_f32_s32(int32x2_t a, __constrange(1,32) int b); // VCVT.F32.S32 d0, d0, #32 +_NEON2SSE_INLINE float32x2_t vcvt_n_f32_s32(int32x2_t a, __constrange(1,32) int b) +{ + float32x2_t res; + float convconst; + convconst = (float)(1. / ((uint32_t)1 << b)); + res.m64_f32[0] = a.m64_i32[0] * convconst; + res.m64_f32[1] = a.m64_i32[1] * convconst; + return res; +} + +float32x2_t vcvt_n_f32_u32(uint32x2_t a, __constrange(1,32) int b); // VCVT.F32.U32 d0, d0, #32 +_NEON2SSE_INLINE float32x2_t vcvt_n_f32_u32(uint32x2_t a, __constrange(1,32) int b) // VCVT.F32.U32 d0, d0, #32 +{ + float32x2_t res; + float convconst; + convconst = (float)(1. / ((uint32_t)1 << b)); + res.m64_f32[0] = a.m64_u32[0] * convconst; + res.m64_f32[1] = a.m64_u32[1] * convconst; + return res; +} + +float32x4_t vcvtq_n_f32_s32(int32x4_t a, __constrange(1,32) int b); // VCVT.F32.S32 q0, q0, #32 +_NEON2SSE_INLINE float32x4_t vcvtq_n_f32_s32(int32x4_t a, __constrange(1,32) int b) +{ + float convconst; + __m128 cconst128, af; + convconst = (float)(1. / ((uint32_t)1 << b)); + af = _mm_cvtepi32_ps(a); + cconst128 = vdupq_n_f32(convconst); + return _mm_mul_ps(af,cconst128); +} + +float32x4_t vcvtq_n_f32_u32(uint32x4_t a, __constrange(1,32) int b); // VCVT.F32.U32 q0, q0, #32 +_NEON2SSE_INLINE float32x4_t vcvtq_n_f32_u32(uint32x4_t a, __constrange(1,32) int b) +{ + float convconst; + __m128 cconst128, af; + convconst = (float)(1. 
/ ((uint32_t)1 << b));
+    af = vcvtq_f32_u32(a);
+    cconst128 = vdupq_n_f32(convconst);
+    return _mm_mul_ps(af,cconst128);
+}
+
+//**************Convert between floats ***********************
+//************************************************************
+float16x4_t vcvt_f16_f32(float32x4_t a); // VCVT.F16.F32 d0, q0
+//Intel SIMD doesn't support 16-bit floats currently
+
+float32x4_t vcvt_f32_f16(float16x4_t a); // VCVT.F32.F16 q0, d0
+//Intel SIMD doesn't support 16-bit floats currently, the only solution is to store 16-bit floats and load as 32 bits
+
+//************Vector narrow integer conversion (truncation) ******************
+//****************************************************************************
+int8x8_t vmovn_s16(int16x8_t a); // VMOVN.I16 d0,q0
+_NEON2SSE_INLINE int8x8_t vmovn_s16(int16x8_t a) // VMOVN.I16 d0,q0
+{
+    int8x8_t res64;
+    __m128i res;
+    _NEON2SSE_ALIGN_16 int8_t mask8_16_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15 };
+    res = _mm_shuffle_epi8 (a, *(__m128i*) mask8_16_even_odd); //use 64 low bits only
+    return64(res);
+}
+
+int16x4_t vmovn_s32(int32x4_t a); // VMOVN.I32 d0,q0
+_NEON2SSE_INLINE int16x4_t vmovn_s32(int32x4_t a) // VMOVN.I32 d0,q0
+{
+    int16x4_t res64;
+    __m128i res;
+    _NEON2SSE_ALIGN_16 int8_t mask8_32_even_odd[16] = { 0,1, 4,5, 8,9, 12,13, 2,3, 6,7,10,11,14,15};
+    res = _mm_shuffle_epi8 (a, *(__m128i*) mask8_32_even_odd); //use 64 low bits only
+    return64(res);
+}
+
+int32x2_t vmovn_s64(int64x2_t a); // VMOVN.I64 d0,q0
+_NEON2SSE_INLINE int32x2_t vmovn_s64(int64x2_t a)
+{
+    //may be not effective compared with a serial implementation
+    int32x2_t res64;
+    __m128i res;
+    res = _mm_shuffle_epi32 (a, 0 | (2 << 2) | (1 << 4) | (3 << 6)); //use 64 low bits only, _MM_SHUFFLE(3, 1, 2, 0)
+    return64(res);
+}
+
+uint8x8_t vmovn_u16(uint16x8_t a); // VMOVN.I16 d0,q0
+#define vmovn_u16 vmovn_s16
+
+uint16x4_t vmovn_u32(uint32x4_t a); // VMOVN.I32 d0,q0
+#define vmovn_u32 vmovn_s32
+
+uint32x2_t vmovn_u64(uint64x2_t a); // VMOVN.I64 d0,q0
+#define vmovn_u64 vmovn_s64
+
+//**************** Vector long move ***********************
+//***********************************************************
+int16x8_t vmovl_s8(int8x8_t a); // VMOVL.S8 q0,d0
+#define vmovl_s8(a) _MM_CVTEPI8_EPI16(_pM128i(a)) //SSE4.1
+
+int32x4_t vmovl_s16(int16x4_t a); // VMOVL.S16 q0,d0
+#define vmovl_s16(a) _MM_CVTEPI16_EPI32(_pM128i(a)) //SSE4.1
+
+int64x2_t vmovl_s32(int32x2_t a); // VMOVL.S32 q0,d0
+#define vmovl_s32(a) _MM_CVTEPI32_EPI64(_pM128i(a)) //SSE4.1
+
+uint16x8_t vmovl_u8(uint8x8_t a); // VMOVL.U8 q0,d0
+#define vmovl_u8(a) _MM_CVTEPU8_EPI16(_pM128i(a)) //SSE4.1
+
+uint32x4_t vmovl_u16(uint16x4_t a); // VMOVL.s16 q0,d0
+#define vmovl_u16(a) _MM_CVTEPU16_EPI32(_pM128i(a)) //SSE4.1
+
+uint64x2_t vmovl_u32(uint32x2_t a); // VMOVL.U32 q0,d0
+#define vmovl_u32(a) _MM_CVTEPU32_EPI64(_pM128i(a)) //SSE4.1
+
+//*************Vector saturating narrow integer*****************
+//**************************************************************
+int8x8_t vqmovn_s16(int16x8_t a); // VQMOVN.S16 d0,q0
+_NEON2SSE_INLINE int8x8_t vqmovn_s16(int16x8_t a)
+{
+    int8x8_t res64;
+    __m128i res;
+    res = _mm_packs_epi16(a, a);
+    return64(res);
+}
+
+int16x4_t vqmovn_s32(int32x4_t a); // VQMOVN.S32 d0,q0
+_NEON2SSE_INLINE int16x4_t vqmovn_s32(int32x4_t a)
+{
+    int16x4_t res64;
+    __m128i res;
+    res = _mm_packs_epi32(a, a);
+    return64(res);
+}
+
+int32x2_t vqmovn_s64(int64x2_t a); // VQMOVN.S64 d0,q0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x2_t vqmovn_s64(int64x2_t
a),_NEON2SSE_REASON_SLOW_SERIAL) //no effective SIMD solution
+{
+    int32x2_t res;
+    _NEON2SSE_ALIGN_16 int64_t atmp[2];
+    _mm_store_si128((__m128i*)atmp, a);
+    if(atmp[0]>SINT_MAX) atmp[0] = SINT_MAX;
+    if(atmp[0]<SINT_MIN) atmp[0] = SINT_MIN;
+    if(atmp[1]>SINT_MAX) atmp[1] = SINT_MAX;
+    if(atmp[1]<SINT_MIN) atmp[1] = SINT_MIN;
+    res.m64_i32[0] = (int32_t)atmp[0];
+    res.m64_i32[1] = (int32_t)atmp[1];
+    return res;
+}
+
+uint8x8_t vqmovn_u16(uint16x8_t a); // VQMOVN.U16 d0,q0
+_NEON2SSE_INLINE uint8x8_t vqmovn_u16(uint16x8_t a) // VQMOVN.U16 d0,q0
+{
+    //no uint16 to uint8 saturating conversion in SSE, need to truncate to the max signed first
+    uint8x8_t res64;
+    __m128i c7fff, a_trunc, mask_trunc;
+    c7fff = _mm_set1_epi16 (0x7fff); // 15-th bit set to zero
+    a_trunc = _mm_and_si128(a, c7fff); // a truncated to the max signed
+    mask_trunc = _mm_cmpgt_epi16(a_trunc, a); //true only for the lanes with the 15-th bit set
+    mask_trunc = _mm_and_si128(mask_trunc, c7fff); //such lanes become 0x7fff after the or below
+    a_trunc = _mm_or_si128(a_trunc, mask_trunc); //and saturate to 0xff in the pack
+    a_trunc = _mm_packus_epi16 (a_trunc, a_trunc); //use low 64 bits only
+    return64(a_trunc);
+}
+
+uint16x4_t vqmovn_u32(uint32x4_t a); // VQMOVN.U32 d0,q0
+_NEON2SSE_INLINE uint16x4_t vqmovn_u32(uint32x4_t a) // VQMOVN.U32 d0,q0
+{
+    //no uint32 to uint16 saturating conversion in SSE, need to truncate to the max signed first
+    uint16x4_t res64;
+    __m128i c7fffffff, a_trunc, mask_trunc;
+    c7fffffff = _mm_set1_epi32 (0x7fffffff); // 31-st bit set to zero
+    a_trunc = _mm_and_si128(a, c7fffffff); // a truncated to the max signed
+    mask_trunc = _mm_cmpgt_epi32(a_trunc, a); //true only for the lanes with the 31-st bit set
+    mask_trunc = _mm_and_si128(mask_trunc, c7fffffff); //such lanes become 0x7fffffff after the or below
+    a_trunc = _mm_or_si128(a_trunc, mask_trunc); //and saturate to 0xffff in the pack
+    a_trunc = _MM_PACKUS1_EPI32 (a_trunc); //use low 64 bits only
+    return64(a_trunc);
+}
+
+uint32x2_t vqmovn_u64(uint64x2_t a); // VQMOVN.U64 d0,q0
+_NEON2SSE_INLINE uint32x2_t vqmovn_u64(uint64x2_t a)
+{
+    //serial solution may be faster
+    uint32x2_t res64;
+    __m128i res_hi, mask;
+    mask = _mm_setzero_si128();
+    res_hi = _mm_srli_epi64(a, 32);
+    res_hi = _mm_cmpeq_epi32(res_hi, mask); //all ones where the high 32 bits are zero
+    mask = _mm_cmpeq_epi32(mask, mask); //all fff
+    mask = _mm_andnot_si128(res_hi, mask); //all ones for the lanes holding >32 bits numbers
+    res_hi = _mm_or_si128(a, mask);
+    res_hi = _mm_shuffle_epi32(res_hi, 0 | (2 << 2) | (1 << 4) | (3 << 6)); //shuffle the data to get 2 32-bits
+    return64(res_hi);
+}
+//************* Vector saturating narrow integer signed->unsigned **************
+//*****************************************************************************
+uint8x8_t vqmovun_s16(int16x8_t a); // VQMOVUN.S16 d0,q0
+_NEON2SSE_INLINE uint8x8_t vqmovun_s16(int16x8_t a)
+{
+    uint8x8_t res64;
+    __m128i res;
+    res = _mm_packus_epi16(a, a); //use low 64bits only
+    return64(res);
+}
+
+uint16x4_t vqmovun_s32(int32x4_t a); // VQMOVUN.S32 d0,q0
+_NEON2SSE_INLINE uint16x4_t vqmovun_s32(int32x4_t a)
+{
+    uint16x4_t res64;
+    __m128i res;
+    res = _MM_PACKUS1_EPI32(a); //use low 64bits only
+    return64(res);
+}
+
+uint32x2_t vqmovun_s64(int64x2_t a); // VQMOVUN.S64 d0,q0
+_NEON2SSE_INLINE uint32x2_t vqmovun_s64(int64x2_t a)
+{
+    uint32x2_t res64;
+    __m128i res_hi,res_lo, zero, cmp;
+    zero = _mm_setzero_si128();
+    res_hi = _mm_srli_epi64(a, 32);
+    cmp = _mm_cmpgt_epi32(zero, res_hi); //if cmp<0 the result should be zero
+    res_lo = _mm_andnot_si128(cmp,a); //if cmp zero - do nothing, otherwise cmp <0 and the result is 0
+    cmp = _mm_cmpgt_epi32(res_hi,zero); //if cmp positive
+    res_lo = _mm_or_si128(res_lo, cmp); //if cmp positive we are out of 32bits, need to saturate to 0xffffffff
+    res_lo = _mm_shuffle_epi32(res_lo, 0 | (2 << 2) | (1 << 4) | (3 << 6)); //shuffle the data to get 2 32-bits
+    return64(res_lo);
+}
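+
+//Usage sketch for the narrowing intrinsics above (illustration only, not part of
+//the original header; the NEON2SSE_USAGE_EXAMPLES guard and the helper name are
+//hypothetical). vmovn_* simply drops the high half of each element, vqmovn_*
+//saturates to the narrow signed range, and vqmovun_* saturates signed input to the
+//narrow unsigned range.
+#ifdef NEON2SSE_USAGE_EXAMPLES
+static void example_narrowing(void)
+{
+    int16x8_t wide = vdupq_n_s16(300); //0x012C in every lane
+    int8x8_t t = vmovn_s16(wide);      //truncated: 0x2C = 44
+    int8x8_t s = vqmovn_s16(wide);     //saturated to 127
+    uint8x8_t u = vqmovun_s16(wide);   //saturated to 255
+    (void)t; (void)s; (void)u;
+}
+#endif
+
+// ********************************************************
+// **************** Table look up **************************
+// ********************************************************
+//VTBL (Vector Table Lookup) uses byte indexes in a control vector to look up byte values
+//in a table and generate a new vector. Indexes out of range return 0.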
+//for Intel SIMD we need to set the MSB to 1 for zero return +uint8x8_t vtbl1_u8(uint8x8_t a, uint8x8_t b); // VTBL.8 d0, {d0}, d0 +_NEON2SSE_INLINE uint8x8_t vtbl1_u8(uint8x8_t a, uint8x8_t b) +{ + uint8x8_t res64; + __m128i c7, maskgt, bmask, b128; + c7 = _mm_set1_epi8 (7); + b128 = _pM128i(b); + maskgt = _mm_cmpgt_epi8(b128,c7); + bmask = _mm_or_si128(b128,maskgt); + bmask = _mm_shuffle_epi8(_pM128i(a),bmask); + return64(bmask); +} + +int8x8_t vtbl1_s8(int8x8_t a, int8x8_t b); // VTBL.8 d0, {d0}, d0 +#define vtbl1_s8 vtbl1_u8 + +poly8x8_t vtbl1_p8(poly8x8_t a, uint8x8_t b); // VTBL.8 d0, {d0}, d0 +#define vtbl1_p8 vtbl1_u8 + +//Special trick to avoid __declspec(align('8')) won't be aligned" error +//uint8x8_t vtbl2_u8(uint8x8x2_t a, uint8x8_t b); // VTBL.8 d0, {d0, d1}, d0 +uint8x8_t vtbl2_u8_ptr(uint8x8x2_t* a, uint8x8_t b); // VTBL.8 d0, {d0, d1}, d0 +_NEON2SSE_INLINE uint8x8_t vtbl2_u8_ptr(uint8x8x2_t* a, uint8x8_t b) +{ + uint8x8_t res64; + __m128i c15, a01, maskgt15, bmask, b128; + c15 = _mm_set1_epi8 (15); + b128 = _pM128i(b); + maskgt15 = _mm_cmpgt_epi8(b128,c15); + bmask = _mm_or_si128(b128, maskgt15); + a01 = _mm_unpacklo_epi64(_pM128i(a->val[0]), _pM128i(a->val[1])); + a01 = _mm_shuffle_epi8(a01, bmask); + return64(a01); +} +#define vtbl2_u8(a, b) vtbl2_u8_ptr(&a, b) + +//int8x8_t vtbl2_s8(int8x8x2_t a, int8x8_t b); // VTBL.8 d0, {d0, d1}, d0 +#define vtbl2_s8 vtbl2_u8 + +//poly8x8_t vtbl2_p8(poly8x8x2_t a, uint8x8_t b); // VTBL.8 d0, {d0, d1}, d0 +#define vtbl2_p8 vtbl2_u8 + +//Special trick to avoid __declspec(align('16')) won't be aligned" error +//uint8x8_t vtbl3_u8(uint8x8x3_t a, uint8x8_t b); // VTBL.8 d0, {d0, d1, d2}, d0 +_NEON2SSE_INLINE uint8x8_t vtbl3_u8_ptr(uint8x8x3_t* a, uint8x8_t b) +{ + //solution may be not optimal + uint8x8_t res64; + __m128i c15, c23, maskgt23, bmask, maskgt15, sh0, sh1, a01, b128; + c15 = _mm_set1_epi8 (15); + c23 = _mm_set1_epi8 (23); + b128 = _pM128i(b); + maskgt23 = _mm_cmpgt_epi8(b128,c23); + bmask = _mm_or_si128(b128, maskgt23); + maskgt15 = _mm_cmpgt_epi8(b128,c15); + a01 = _mm_unpacklo_epi64(_pM128i(a->val[0]),_pM128i(a->val[1])); + sh0 = _mm_shuffle_epi8(a01, bmask); + sh1 = _mm_shuffle_epi8(_pM128i(a->val[2]), bmask); //for bi>15 bi is wrapped (bi-=15) + sh0 = _MM_BLENDV_EPI8(sh0, sh1, maskgt15); //SSE4.1 + return64(sh0); +} +#define vtbl3_u8(a,b) vtbl3_u8_ptr(&a,b) + +//int8x8_t vtbl3_s8(int8x8x3_t a, int8x8_t b); // VTBL.8 d0, {d0, d1, d2}, d0 +int8x8_t vtbl3_s8_ptr(int8x8x3_t* a, int8x8_t b); // VTBL.8 d0, {d0, d1, d2}, d0 +#define vtbl3_s8 vtbl3_u8 + +//poly8x8_t vtbl3_p8(poly8x8x3_t a, uint8x8_t b); // VTBL.8 d0, {d0, d1, d2}, d0 +poly8x8_t vtbl3_p8_ptr(poly8x8x3_t* a, uint8x8_t b); // VTBL.8 d0, {d0, d1, d2}, d0 +#define vtbl3_p8 vtbl3_u8 + +//uint8x8_t vtbl4_u8(uint8x8x4_t a, uint8x8_t b); // VTBL.8 d0, {d0, d1, d2, d3}, d0 +_NEON2SSE_INLINE uint8x8_t vtbl4_u8_ptr(uint8x8x4_t* a, uint8x8_t b) +{ + //solution may be not optimal + uint8x8_t res64; + __m128i c15, c31, maskgt31, bmask, maskgt15, sh0, sh1, a01, a23, b128; + c15 = _mm_set1_epi8 (15); + c31 = _mm_set1_epi8 (31); + b128 = _pM128i(b); + maskgt31 = _mm_cmpgt_epi8(b128,c31); + bmask = _mm_or_si128(b128, maskgt31); + maskgt15 = _mm_cmpgt_epi8(b128,c15); + a01 = _mm_unpacklo_epi64(_pM128i(a->val[0]),_pM128i(a->val[1])); + a23 = _mm_unpacklo_epi64(_pM128i(a->val[2]),_pM128i(a->val[3])); + sh0 = _mm_shuffle_epi8(a01, bmask); + sh1 = _mm_shuffle_epi8(a23, bmask); //for bi>15 bi is wrapped (bi-=15) + sh0 = _MM_BLENDV_EPI8 (sh0, sh1, maskgt15); //SSE4.1 + return64(sh0); +} 
+#define vtbl4_u8(a,b) vtbl4_u8_ptr(&a,b) + +//int8x8_t vtbl4_s8(int8x8x4_t a, int8x8_t b); // VTBL.8 d0, {d0, d1, d2, d3}, d0 +int8x8_t vtbl4_s8_ptr(int8x8x4_t* a, int8x8_t b); // VTBL.8 d0, {d0, d1, d2, d3}, d0 +#define vtbl4_s8 vtbl4_u8 + +//poly8x8_t vtbl4_p8(poly8x8x4_t a, uint8x8_t b); // VTBL.8 d0, {d0, d1, d2, d3}, d0 +poly8x8_t vtbl4_p8_ptr(poly8x8x4_t* a, uint8x8_t b); // VTBL.8 d0, {d0, d1, d2, d3}, d0 +#define vtbl4_p8 vtbl4_u8 + +//****************** Extended table look up intrinsics *************************** +//********************************************************************************** +//VTBX (Vector Table Extension) works in the same way as VTBL do, +// except that indexes out of range leave the destination element unchanged. + +uint8x8_t vtbx1_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c); // VTBX.8 d0, {d0}, d0 +_NEON2SSE_INLINE uint8x8_t vtbx1_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c) +{ + uint8x8_t res64; + __m128i c7, maskgt, sh, c128; + c7 = _mm_set1_epi8 (7); + c128 = _pM128i(c); + maskgt = _mm_cmpgt_epi8(c128,c7); + c7 = _mm_and_si128(maskgt,_pM128i(a)); + sh = _mm_shuffle_epi8(_pM128i(b),c128); + sh = _mm_andnot_si128(maskgt,sh); + sh = _mm_or_si128(sh,c7); + return64(sh); +} + +int8x8_t vtbx1_s8(int8x8_t a, int8x8_t b, int8x8_t c); // VTBX.8 d0, {d0}, d0 +#define vtbx1_s8 vtbx1_u8 + +poly8x8_t vtbx1_p8(poly8x8_t a, poly8x8_t b, uint8x8_t c); // VTBX.8 d0, {d0}, d0 +#define vtbx1_p8 vtbx1_u8 + +//Special trick to avoid __declspec(align('8')) won't be aligned" error +//uint8x8_t vtbx2_u8(uint8x8_t a, uint8x8x2_t b, uint8x8_t c); // VTBX.8 d0, {d0, d1}, d0 +uint8x8_t vtbx2_u8_ptr(uint8x8_t a, uint8x8x2_t* b, uint8x8_t c); // VTBX.8 d0, {d0, d1}, d0 +_NEON2SSE_INLINE uint8x8_t vtbx2_u8_ptr(uint8x8_t a, uint8x8x2_t* b, uint8x8_t c) +{ + uint8x8_t res64; + __m128i c15, b01, maskgt15, sh, c128; + c15 = _mm_set1_epi8 (15); + c128 = _pM128i(c); + maskgt15 = _mm_cmpgt_epi8(c128, c15); + c15 = _mm_and_si128(maskgt15, _pM128i(a)); + b01 = _mm_unpacklo_epi64(_pM128i(b->val[0]), _pM128i(b->val[1])); + sh = _mm_shuffle_epi8(b01, c128); + sh = _mm_andnot_si128(maskgt15, sh); + sh = _mm_or_si128(sh,c15); + return64(sh); +} +#define vtbx2_u8(a, b, c) vtbx2_u8_ptr(a, &b, c) + +//int8x8_t vtbx2_s8(int8x8_t a, int8x8x2_t b, int8x8_t c); // VTBX.8 d0, {d0, d1}, d0 +#define vtbx2_s8 vtbx2_u8 + +//poly8x8_t vtbx2_p8(poly8x8_t a, poly8x8x2_t b, uint8x8_t c); // VTBX.8 d0, {d0, d1}, d0 +#define vtbx2_p8 vtbx2_u8 + +//uint8x8_t vtbx3_u8(uint8x8_t a, uint8x8x3_t b, uint8x8_t c) // VTBX.8 d0, {d0, d1, d2}, d0 +_NEON2SSE_INLINE uint8x8_t vtbx3_u8_ptr(uint8x8_t a, uint8x8x3_t* b, uint8x8_t c) +{ + //solution may be not optimal + uint8x8_t res64; + __m128i c15, c23, maskgt15, maskgt23, sh0, sh1, b01, c128; + c15 = _mm_set1_epi8 (15); + c23 = _mm_set1_epi8 (23); + c128 = _pM128i(c); + maskgt15 = _mm_cmpgt_epi8(c128,c15); + maskgt23 = _mm_cmpgt_epi8(c128,c23); + c23 = _mm_and_si128(maskgt23, _pM128i(a)); + b01 = _mm_unpacklo_epi64(_pM128i(b->val[0]),_pM128i(b->val[1])); + sh0 = _mm_shuffle_epi8(b01, c128); + sh1 = _mm_shuffle_epi8(_pM128i(b->val[2]), c128); //for bi>15 bi is wrapped (bi-=15) + sh0 = _MM_BLENDV_EPI8(sh0, sh1, maskgt15); + sh0 = _mm_andnot_si128(maskgt23,sh0); + sh0 = _mm_or_si128(sh0,c23); + return64(sh0); +} +#define vtbx3_u8(a, b, c) vtbx3_u8_ptr(a, &b, c) + +//int8x8_t vtbx3_s8(int8x8_t a, int8x8x3_t b, int8x8_t c); // VTBX.8 d0, {d0, d1, d2}, d0 +int8x8_t vtbx3_s8_ptr(int8x8_t a, int8x8x3_t* b, int8x8_t c); +#define vtbx3_s8 vtbx3_u8 + +//poly8x8_t vtbx3_p8(poly8x8_t a, 
poly8x8x3_t b, uint8x8_t c); // VTBX.8 d0, {d0, d1, d2}, d0 +poly8x8_t vtbx3_p8_ptr(poly8x8_t a, poly8x8x3_t* b, uint8x8_t c); +#define vtbx3_p8 vtbx3_u8 + +//uint8x8_t vtbx4_u8(uint8x8_t a, uint8x8x4_t b, uint8x8_t c) // VTBX.8 d0, {d0, d1, d2, d3}, d0 +_NEON2SSE_INLINE uint8x8_t vtbx4_u8_ptr(uint8x8_t a, uint8x8x4_t* b, uint8x8_t c) +{ + //solution may be not optimal + uint8x8_t res64; + __m128i c15, c31, maskgt15, maskgt31, sh0, sh1, b01, b23, c128; + c15 = _mm_set1_epi8 (15); + c31 = _mm_set1_epi8 (31); + c128 = _pM128i(c); + maskgt15 = _mm_cmpgt_epi8(c128,c15); + maskgt31 = _mm_cmpgt_epi8(c128,c31); + c31 = _mm_and_si128(maskgt31, _pM128i(a)); + + b01 = _mm_unpacklo_epi64(_pM128i(b->val[0]),_pM128i(b->val[1])); + b23 = _mm_unpacklo_epi64(_pM128i(b->val[2]),_pM128i(b->val[3])); + sh0 = _mm_shuffle_epi8(b01, c128); + sh1 = _mm_shuffle_epi8(b23, c128); //for bi>15 bi is wrapped (bi-=15) + sh0 = _MM_BLENDV_EPI8(sh0, sh1, maskgt15); + sh0 = _mm_andnot_si128(maskgt31,sh0); + sh0 = _mm_or_si128(sh0,c31); + return64(sh0); +} +#define vtbx4_u8(a, b, c) vtbx4_u8_ptr(a, &b, c) + +//int8x8_t vtbx4_s8(int8x8_t a, int8x8x4_t b, int8x8_t c); // VTBX.8 d0, {d0, d1, d2, d3}, d0 +int8x8_t vtbx4_s8_ptr(int8x8_t a, int8x8x4_t* b, int8x8_t c); +#define vtbx4_s8 vtbx4_u8 + +//poly8x8_t vtbx4_p8(poly8x8_t a, poly8x8x4_t b, uint8x8_t c); // VTBX.8 d0, {d0, d1, d2, d3}, d0 +poly8x8_t vtbx4_p8_ptr(poly8x8_t a, poly8x8x4_t* b, uint8x8_t c); +#define vtbx4_p8 vtbx4_u8 + +//************************************************************************************************* +// *************************** Operations with a scalar value ********************************* +//************************************************************************************************* + +//******* Vector multiply accumulate by scalar ************************************************* +//********************************************************************************************** +int16x4_t vmla_lane_s16(int16x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l); // VMLA.I16 d0, d0, d0[0] +_NEON2SSE_INLINE int16x4_t vmla_lane_s16(int16x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l) // VMLA.I16 d0, d0, d0[0] +{ + int16_t c; + int16x4_t scalar; + c = vget_lane_s16(v, l); + scalar = vdup_n_s16(c); + return vmla_s16(a, b, scalar); +} + +int32x2_t vmla_lane_s32(int32x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l); // VMLA.I32 d0, d0, d0[0] +_NEON2SSE_INLINE int32x2_t vmla_lane_s32(int32x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l) // VMLA.I32 d0, d0, d0[0] +{ + int32_t c; + int32x2_t scalar; + c = vget_lane_s32(v, l); + scalar = vdup_n_s32(c); + return vmla_s32(a, b, scalar); +} + +uint16x4_t vmla_lane_u16(uint16x4_t a, uint16x4_t b, uint16x4_t v, __constrange(0,3) int l); // VMLA.I16 d0, d0, d0[0] +#define vmla_lane_u16 vmla_lane_s16 + + +uint32x2_t vmla_lane_u32(uint32x2_t a, uint32x2_t b, uint32x2_t v, __constrange(0,1) int l); // VMLA.I32 d0, d0, d0[0] +#define vmla_lane_u32 vmla_lane_s32 + +float32x2_t vmla_lane_f32(float32x2_t a, float32x2_t b, float32x2_t v, __constrange(0,1) int l); // VMLA.F32 d0, d0, d0[0] +_NEON2SSE_INLINE float32x2_t vmla_lane_f32(float32x2_t a, float32x2_t b, float32x2_t v, __constrange(0,1) int l) +{ + float32_t vlane; + float32x2_t c; + vlane = vget_lane_f32(v, l); + c = vdup_n_f32(vlane); + return vmla_f32(a,b,c); +} + +int16x8_t vmlaq_lane_s16(int16x8_t a, int16x8_t b, int16x4_t v, __constrange(0,3) int l); // VMLA.I16 q0, q0, d0[0] +_NEON2SSE_INLINE int16x8_t 
vmlaq_lane_s16(int16x8_t a, int16x8_t b, int16x4_t v, __constrange(0,3) int l) // VMLA.I16 q0, q0, d0[0] +{ + int16_t vlane; + int16x8_t c; + vlane = vget_lane_s16(v, l); + c = vdupq_n_s16(vlane); + return vmlaq_s16(a,b,c); +} + +int32x4_t vmlaq_lane_s32(int32x4_t a, int32x4_t b, int32x2_t v, __constrange(0,1) int l); // VMLA.I32 q0, q0, d0[0] +_NEON2SSE_INLINE int32x4_t vmlaq_lane_s32(int32x4_t a, int32x4_t b, int32x2_t v, __constrange(0,1) int l) // VMLA.I32 q0, q0, d0[0] +{ + int32_t vlane; + int32x4_t c; + vlane = vget_lane_s32(v, l); + c = vdupq_n_s32(vlane); + return vmlaq_s32(a,b,c); +} + +uint16x8_t vmlaq_lane_u16(uint16x8_t a, uint16x8_t b, uint16x4_t v, __constrange(0,3) int l); // VMLA.I16 q0, q0, d0[0] +#define vmlaq_lane_u16 vmlaq_lane_s16 + +uint32x4_t vmlaq_lane_u32(uint32x4_t a, uint32x4_t b, uint32x2_t v, __constrange(0,1) int l); // VMLA.I32 q0, q0, d0[0] +#define vmlaq_lane_u32 vmlaq_lane_s32 + +float32x4_t vmlaq_lane_f32(float32x4_t a, float32x4_t b, float32x2_t v, __constrange(0,1) int l); // VMLA.F32 q0, q0, d0[0] +_NEON2SSE_INLINE float32x4_t vmlaq_lane_f32(float32x4_t a, float32x4_t b, float32x2_t v, __constrange(0,1) int l) // VMLA.F32 q0, q0, d0[0] +{ + float32_t vlane; + float32x4_t c; + vlane = vget_lane_f32(v, l); + c = vdupq_n_f32(vlane); + return vmlaq_f32(a,b,c); +} + +//***************** Vector widening multiply accumulate by scalar ********************** +//*************************************************************************************** +int32x4_t vmlal_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l); // VMLAL.S16 q0, d0, d0[0] +_NEON2SSE_INLINE int32x4_t vmlal_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l) // VMLAL.S16 q0, d0, d0[0] +{ + int16_t vlane; + int16x4_t c; + vlane = vget_lane_s16(v, l); + c = vdup_n_s16(vlane); + return vmlal_s16(a, b, c); +} + +int64x2_t vmlal_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l); // VMLAL.S32 q0, d0, d0[0] +_NEON2SSE_INLINE int64x2_t vmlal_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l) // VMLAL.S32 q0, d0, d0[0] +{ + int32_t vlane; + int32x2_t c; + vlane = vget_lane_s32(v, l); + c = vdup_n_s32(vlane); + return vmlal_s32(a, b, c); +} + +uint32x4_t vmlal_lane_u16(uint32x4_t a, uint16x4_t b, uint16x4_t v, __constrange(0,3) int l); // VMLAL.s16 q0, d0, d0[0] +_NEON2SSE_INLINE uint32x4_t vmlal_lane_u16(uint32x4_t a, uint16x4_t b, uint16x4_t v, __constrange(0,3) int l) // VMLAL.s16 q0, d0, d0[0] +{ + uint16_t vlane; + uint16x4_t c; + vlane = vget_lane_u16(v, l); + c = vdup_n_u16(vlane); + return vmlal_u16(a, b, c); +} + +uint64x2_t vmlal_lane_u32(uint64x2_t a, uint32x2_t b, uint32x2_t v, __constrange(0,1) int l); // VMLAL.U32 q0, d0, d0[0] +_NEON2SSE_INLINE uint64x2_t vmlal_lane_u32(uint64x2_t a, uint32x2_t b, uint32x2_t v, __constrange(0,1) int l) // VMLAL.U32 q0, d0, d0[0] +{ + uint32_t vlane; + uint32x2_t c; + vlane = vget_lane_u32(v, l); + c = vdup_n_u32(vlane); + return vmlal_u32(a, b, c); +} + +// ******** Vector widening saturating doubling multiply accumulate by scalar ******************************* +// ************************************************************************************************ +int32x4_t vqdmlal_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l); // VQDMLAL.S16 q0, d0, d0[0] +_NEON2SSE_INLINE int32x4_t vqdmlal_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l) +{ + int16_t vlane; + int16x4_t c; + vlane = vget_lane_s16(v, l); + c = 
vdup_n_s16(vlane); + return vqdmlal_s16(a, b, c); +} + +int64x2_t vqdmlal_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l); // VQDMLAL.S32 q0, d0, d0[0] +_NEON2SSE_INLINE int64x2_t vqdmlal_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l) +{ + int32_t vlane; + uint32x2_t c; + vlane = vget_lane_s32(v, l); + c = vdup_n_s32(vlane); + return vqdmlal_s32(a, b, c); +} + +// ****** Vector multiply subtract by scalar ***************** +// ************************************************************* +int16x4_t vmls_lane_s16(int16x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l); // VMLS.I16 d0, d0, d0[0] +_NEON2SSE_INLINE int16x4_t vmls_lane_s16(int16x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l) // VMLS.I16 d0, d0, d0[0] +{ + int16_t vlane; + int16x4_t c; + vlane = vget_lane_s16(v, l); + c = vdup_n_s16(vlane); + return vmls_s16(a, b, c); +} + +int32x2_t vmls_lane_s32(int32x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l); // VMLS.I32 d0, d0, d0[0] +_NEON2SSE_INLINE int32x2_t vmls_lane_s32(int32x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l) // VMLS.I32 d0, d0, d0[0] +{ + int32_t vlane; + int32x2_t c; + vlane = vget_lane_s32(v, l); + c = vdup_n_s32(vlane); + return vmls_s32(a, b, c); +} + +uint16x4_t vmls_lane_u16(uint16x4_t a, uint16x4_t b, uint16x4_t v, __constrange(0,3) int l); // VMLS.I16 d0, d0, d0[0] +_NEON2SSE_INLINE uint16x4_t vmls_lane_u16(uint16x4_t a, uint16x4_t b, uint16x4_t v, __constrange(0,3) int l) // VMLS.I16 d0, d0, d0[0] +{ + uint16_t vlane; + uint16x4_t c; + vlane = vget_lane_s16(v, l); + c = vdup_n_s16(vlane); + return vmls_s16(a, b, c); +} + +uint32x2_t vmls_lane_u32(uint32x2_t a, uint32x2_t b, uint32x2_t v, __constrange(0,1) int l); // VMLS.I32 d0, d0, d0[0] +_NEON2SSE_INLINE uint32x2_t vmls_lane_u32(uint32x2_t a, uint32x2_t b, uint32x2_t v, __constrange(0,1) int l) // VMLS.I32 d0, d0, d0[0] +{ + uint32_t vlane; + uint32x2_t c; + vlane = vget_lane_u32(v, l); + c = vdup_n_u32(vlane); + return vmls_u32(a, b, c); +} + +float32x2_t vmls_lane_f32(float32x2_t a, float32x2_t b, float32x2_t v, __constrange(0,1) int l); // VMLS.F32 d0, d0, d0[0] +_NEON2SSE_INLINE float32x2_t vmls_lane_f32(float32x2_t a, float32x2_t b, float32x2_t v, __constrange(0,1) int l) +{ + float32_t vlane; + float32x2_t c; + vlane = (float) vget_lane_f32(v, l); + c = vdup_n_f32(vlane); + return vmls_f32(a,b,c); +} + +int16x8_t vmlsq_lane_s16(int16x8_t a, int16x8_t b, int16x4_t v, __constrange(0,3) int l); // VMLS.I16 q0, q0, d0[0] +_NEON2SSE_INLINE int16x8_t vmlsq_lane_s16(int16x8_t a, int16x8_t b, int16x4_t v, __constrange(0,3) int l) // VMLS.I16 q0, q0, d0[0] +{ + int16_t vlane; + int16x8_t c; + vlane = vget_lane_s16(v, l); + c = vdupq_n_s16(vlane); + return vmlsq_s16(a, b,c); +} + +int32x4_t vmlsq_lane_s32(int32x4_t a, int32x4_t b, int32x2_t v, __constrange(0,1) int l); // VMLS.I32 q0, q0, d0[0] +_NEON2SSE_INLINE int32x4_t vmlsq_lane_s32(int32x4_t a, int32x4_t b, int32x2_t v, __constrange(0,1) int l) // VMLS.I32 q0, q0, d0[0] +{ + int32_t vlane; + int32x4_t c; + vlane = vget_lane_s32(v, l); + c = vdupq_n_s32(vlane); + return vmlsq_s32(a,b,c); +} + +uint16x8_t vmlsq_lane_u16(uint16x8_t a, uint16x8_t b, uint16x4_t v, __constrange(0,3) int l); // VMLA.I16 q0, q0, d0[0] +_NEON2SSE_INLINE uint16x8_t vmlsq_lane_u16(uint16x8_t a, uint16x8_t b, uint16x4_t v, __constrange(0,3) int l) // VMLA.I16 q0, q0, d0[0] +{ + uint16_t vlane; + uint16x8_t c; + vlane = vget_lane_u16(v, l); + c = vdupq_n_u16(vlane); + return vmlsq_u16(a,b,c); +} 
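+
+//Usage sketch for the scalar-lane multiply-accumulate/subtract intrinsics above
+//(illustration only, not part of the original header; the NEON2SSE_USAGE_EXAMPLES
+//guard and the helper name are hypothetical). They all follow one pattern:
+//broadcast lane l of v with vdup, then apply the ordinary vmla/vmls, so the result
+//is a[i] +/- b[i] * v[l].
+#ifdef NEON2SSE_USAGE_EXAMPLES
+static int16x4_t example_mla_lane(int16x4_t acc, int16x4_t b, int16x4_t v)
+{
+    acc = vmla_lane_s16(acc, b, v, 1); //acc[i] += b[i] * v[1]
+    acc = vmls_lane_s16(acc, b, v, 3); //acc[i] -= b[i] * v[3]
+    return acc;
+}
+#endif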
+ +uint32x4_t vmlsq_lane_u32(uint32x4_t a, uint32x4_t b, uint32x2_t v, __constrange(0,1) int l); // VMLA.I32 q0, q0, d0[0] +_NEON2SSE_INLINE uint32x4_t vmlsq_lane_u32(uint32x4_t a, uint32x4_t b, uint32x2_t v, __constrange(0,1) int l) // VMLA.I32 q0, q0, d0[0] +{ + uint32_t vlane; + uint32x4_t c; + vlane = vget_lane_u32(v, l); + c = vdupq_n_u32(vlane); + return vmlsq_u32(a,b,c); +} + +float32x4_t vmlsq_lane_f32(float32x4_t a, float32x4_t b, float32x2_t v, __constrange(0,1) int l); // VMLA.F32 q0, q0, d0[0] +_NEON2SSE_INLINE float32x4_t vmlsq_lane_f32(float32x4_t a, float32x4_t b, float32x2_t v, __constrange(0,1) int l) // VMLA.F32 q0, q0, d0[0] +{ + float32_t vlane; + float32x4_t c; + vlane = (float) vget_lane_f32(v, l); + c = vdupq_n_f32(vlane); + return vmlsq_f32(a,b,c); +} + +// **** Vector widening multiply subtract by scalar **** +// **************************************************** +int32x4_t vmlsl_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l); // VMLAL.S16 q0, d0, d0[0] +_NEON2SSE_INLINE int32x4_t vmlsl_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l) // VMLAL.S16 q0, d0, d0[0] +{ + int16_t vlane; + int16x4_t c; + vlane = vget_lane_s16(v, l); + c = vdup_n_s16(vlane); + return vmlsl_s16(a, b, c); +} + +int64x2_t vmlsl_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l); // VMLAL.S32 q0, d0, d0[0] +_NEON2SSE_INLINE int64x2_t vmlsl_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l) // VMLAL.S32 q0, d0, d0[0] +{ + int32_t vlane; + int32x2_t c; + vlane = vget_lane_s32(v, l); + c = vdup_n_s32(vlane); + return vmlsl_s32(a, b, c); +} + +uint32x4_t vmlsl_lane_u16(uint32x4_t a, uint16x4_t b, uint16x4_t v, __constrange(0,3) int l); // VMLAL.s16 q0, d0, d0[0] +_NEON2SSE_INLINE uint32x4_t vmlsl_lane_u16(uint32x4_t a, uint16x4_t b, uint16x4_t v, __constrange(0,3) int l) // VMLAL.s16 q0, d0, d0[0] +{ + uint16_t vlane; + uint16x4_t c; + vlane = vget_lane_s16(v, l); + c = vdup_n_s16(vlane); + return vmlsl_s16(a, b, c); +} + +uint64x2_t vmlsl_lane_u32(uint64x2_t a, uint32x2_t b, uint32x2_t v, __constrange(0,1) int l); // VMLAL.U32 q0, d0, d0[0] +_NEON2SSE_INLINE uint64x2_t vmlsl_lane_u32(uint64x2_t a, uint32x2_t b, uint32x2_t v, __constrange(0,1) int l) // VMLAL.U32 q0, d0, d0[0] +{ + uint32_t vlane; + uint32x2_t c; + vlane = vget_lane_u32(v, l); + c = vdup_n_u32(vlane); + return vmlsl_u32(a, b, c); +} + +//********* Vector widening saturating doubling multiply subtract by scalar ************************** +//****************************************************************************************************** +int32x4_t vqdmlsl_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l); // VQDMLSL.S16 q0, d0, d0[0] +_NEON2SSE_INLINE int32x4_t vqdmlsl_lane_s16(int32x4_t a, int16x4_t b, int16x4_t v, __constrange(0,3) int l) +{ + int16_t vlane; + int16x4_t c; + vlane = vget_lane_s16(v, l); + c = vdup_n_s16(vlane); + return vqdmlsl_s16(a, b, c); +} + +int64x2_t vqdmlsl_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l); // VQDMLSL.S32 q0, d0, d0[0] +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqdmlsl_lane_s32(int64x2_t a, int32x2_t b, int32x2_t v, __constrange(0,1) int l), _NEON2SSE_REASON_SLOW_SERIAL) +{ + int32_t vlane; + int32x2_t c; + vlane = vget_lane_s32(v, l); + c = vdup_n_s32(vlane); + return vqdmlsl_s32(a, b, c); +} +//********** Vector multiply with scalar ***************************** +int16x4_t vmul_n_s16(int16x4_t a, int16_t b); // VMUL.I16 
d0,d0,d0[0] +_NEON2SSE_INLINE int16x4_t vmul_n_s16(int16x4_t a, int16_t b) // VMUL.I16 d0,d0,d0[0] +{ + int16x4_t b16x4; + b16x4 = vdup_n_s16(b); + return vmul_s16(a, b16x4); +} + +int32x2_t vmul_n_s32(int32x2_t a, int32_t b); // VMUL.I32 d0,d0,d0[0] +_NEON2SSE_INLINE int32x2_t vmul_n_s32(int32x2_t a, int32_t b) // VMUL.I32 d0,d0,d0[0] +{ + //serial solution looks faster + int32x2_t b32x2; + b32x2 = vdup_n_s32(b); + return vmul_s32(a, b32x2); +} + +float32x2_t vmul_n_f32(float32x2_t a, float32_t b); // VMUL.F32 d0,d0,d0[0] +_NEON2SSE_INLINE float32x2_t vmul_n_f32(float32x2_t a, float32_t b) // VMUL.F32 d0,d0,d0[0] +{ + float32x2_t b32x2; + b32x2 = vdup_n_f32(b); + return vmul_f32(a, b32x2); +} + +uint16x4_t vmul_n_u16(uint16x4_t a, uint16_t b); // VMUL.I16 d0,d0,d0[0] +_NEON2SSE_INLINE uint16x4_t vmul_n_u16(uint16x4_t a, uint16_t b) // VMUL.I16 d0,d0,d0[0] +{ + uint16x4_t b16x4; + b16x4 = vdup_n_s16(b); + return vmul_s16(a, b16x4); +} + +uint32x2_t vmul_n_u32(uint32x2_t a, uint32_t b); // VMUL.I32 d0,d0,d0[0] +_NEON2SSE_INLINE uint32x2_t vmul_n_u32(uint32x2_t a, uint32_t b) // VMUL.I32 d0,d0,d0[0] +{ + //serial solution looks faster + uint32x2_t b32x2; + b32x2 = vdup_n_u32(b); + return vmul_u32(a, b32x2); +} + +int16x8_t vmulq_n_s16(int16x8_t a, int16_t b); // VMUL.I16 q0,q0,d0[0] +_NEON2SSE_INLINE int16x8_t vmulq_n_s16(int16x8_t a, int16_t b) // VMUL.I16 q0,q0,d0[0] +{ + int16x8_t b16x8; + b16x8 = vdupq_n_s16(b); + return vmulq_s16(a, b16x8); +} + +int32x4_t vmulq_n_s32(int32x4_t a, int32_t b); // VMUL.I32 q0,q0,d0[0] +_NEON2SSE_INLINE int32x4_t vmulq_n_s32(int32x4_t a, int32_t b) // VMUL.I32 q0,q0,d0[0] +{ + int32x4_t b32x4; + b32x4 = vdupq_n_s32(b); + return vmulq_s32(a, b32x4); +} + +float32x4_t vmulq_n_f32(float32x4_t a, float32_t b); // VMUL.F32 q0,q0,d0[0] +_NEON2SSE_INLINE float32x4_t vmulq_n_f32(float32x4_t a, float32_t b) // VMUL.F32 q0,q0,d0[0] +{ + float32x4_t b32x4; + b32x4 = vdupq_n_f32(b); + return vmulq_f32(a, b32x4); +} + +uint16x8_t vmulq_n_u16(uint16x8_t a, uint16_t b); // VMUL.I16 q0,q0,d0[0] +_NEON2SSE_INLINE uint16x8_t vmulq_n_u16(uint16x8_t a, uint16_t b) // VMUL.I16 q0,q0,d0[0] +{ + uint16x8_t b16x8; + b16x8 = vdupq_n_s16(b); + return vmulq_s16(a, b16x8); +} + +uint32x4_t vmulq_n_u32(uint32x4_t a, uint32_t b); // VMUL.I32 q0,q0,d0[0] +_NEON2SSE_INLINE uint32x4_t vmulq_n_u32(uint32x4_t a, uint32_t b) // VMUL.I32 q0,q0,d0[0] +{ + uint32x4_t b32x4; + b32x4 = vdupq_n_u32(b); + return vmulq_u32(a, b32x4); +} + +//********** Vector multiply lane ***************************** +int16x4_t vmul_lane_s16 (int16x4_t a, int16x4_t b, __constrange(0,3) int c); +_NEON2SSE_INLINE int16x4_t vmul_lane_s16 (int16x4_t a, int16x4_t b, __constrange(0,3) int c) +{ + int16x4_t b16x4; + int16_t vlane; + vlane = vget_lane_s16(b, c); + b16x4 = vdup_n_s16(vlane); + return vmul_s16(a, b16x4); +} + +int32x2_t vmul_lane_s32 (int32x2_t a, int32x2_t b, __constrange(0,1) int c); +_NEON2SSE_INLINE int32x2_t vmul_lane_s32 (int32x2_t a, int32x2_t b, __constrange(0,1) int c) +{ + int32x2_t b32x2; + int32_t vlane; + vlane = vget_lane_s32(b, c); + b32x2 = vdup_n_s32(vlane); + return vmul_s32(a, b32x2); +} + +float32x2_t vmul_lane_f32 (float32x2_t a, float32x2_t b, __constrange(0,1) int c); +_NEON2SSE_INLINE float32x2_t vmul_lane_f32 (float32x2_t a, float32x2_t b, __constrange(0,1) int c) +{ + float32x2_t b32x2; + float32_t vlane; + vlane = vget_lane_f32(b, c); + b32x2 = vdup_n_f32(vlane); + return vmul_f32(a, b32x2); +} + +uint16x4_t vmul_lane_u16 (uint16x4_t a, uint16x4_t b, __constrange(0,3) int c); 
+#define vmul_lane_u16 vmul_lane_s16 + +uint32x2_t vmul_lane_u32 (uint32x2_t a, uint32x2_t b, __constrange(0,1) int c); +#define vmul_lane_u32 vmul_lane_s32 + +int16x8_t vmulq_lane_s16(int16x8_t a, int16x4_t b, __constrange(0,3) int c); +_NEON2SSE_INLINE int16x8_t vmulq_lane_s16 (int16x8_t a, int16x4_t b, __constrange(0,3) int c) +{ + int16x8_t b16x8; + int16_t vlane; + vlane = vget_lane_s16(b, c); + b16x8 = vdupq_n_s16(vlane); + return vmulq_s16(a, b16x8); +} + +int32x4_t vmulq_lane_s32 (int32x4_t a, int32x2_t b, __constrange(0,1) int c); +_NEON2SSE_INLINE int32x4_t vmulq_lane_s32 (int32x4_t a, int32x2_t b, __constrange(0,1) int c) +{ + int32x4_t b32x4; + int32_t vlane; + vlane = vget_lane_s32(b, c); + b32x4 = vdupq_n_s32(vlane); + return vmulq_s32(a, b32x4); +} + +float32x4_t vmulq_lane_f32 (float32x4_t a, float32x2_t b, __constrange(0,1) int c); +_NEON2SSE_INLINE float32x4_t vmulq_lane_f32 (float32x4_t a, float32x2_t b, __constrange(0,1) int c) +{ + float32x4_t b32x4; + float32_t vlane; + vlane = vget_lane_f32(b, c); + b32x4 = vdupq_n_f32(vlane); + return vmulq_f32(a, b32x4); +} + +uint16x8_t vmulq_lane_u16 (uint16x8_t a, uint16x4_t b, __constrange(0,3) int c); +#define vmulq_lane_u16 vmulq_lane_s16 + +uint32x4_t vmulq_lane_u32 (uint32x4_t a, uint32x2_t b, __constrange(0,1) int c); +#define vmulq_lane_u32 vmulq_lane_s32 + +//**** Vector long multiply with scalar ************ +int32x4_t vmull_n_s16(int16x4_t vec1, int16_t val2); // VMULL.S16 q0,d0,d0[0] +_NEON2SSE_INLINE int32x4_t vmull_n_s16(int16x4_t vec1, int16_t val2) // VMULL.S16 q0,d0,d0[0] +{ + int16x4_t b16x4; + b16x4 = vdup_n_s16(val2); + return vmull_s16(vec1, b16x4); +} + +int64x2_t vmull_n_s32(int32x2_t vec1, int32_t val2); // VMULL.S32 q0,d0,d0[0] +_NEON2SSE_INLINE int64x2_t vmull_n_s32(int32x2_t vec1, int32_t val2) // VMULL.S32 q0,d0,d0[0] +{ + int32x2_t b32x2; + b32x2 = vdup_n_s32(val2); + return vmull_s32(vec1, b32x2); +} + +uint32x4_t vmull_n_u16(uint16x4_t vec1, uint16_t val2); // VMULL.s16 q0,d0,d0[0] +_NEON2SSE_INLINE uint32x4_t vmull_n_u16(uint16x4_t vec1, uint16_t val2) // VMULL.s16 q0,d0,d0[0] +{ + uint16x4_t b16x4; + b16x4 = vdup_n_s16(val2); + return vmull_s16(vec1, b16x4); +} + +uint64x2_t vmull_n_u32(uint32x2_t vec1, uint32_t val2); // VMULL.U32 q0,d0,d0[0] +_NEON2SSE_INLINE uint64x2_t vmull_n_u32(uint32x2_t vec1, uint32_t val2) // VMULL.U32 q0,d0,d0[0] +{ + uint32x2_t b32x2; + b32x2 = vdup_n_u32(val2); + return vmull_u32(vec1, b32x2); +} + +//**** Vector long multiply by scalar **** +int32x4_t vmull_lane_s16(int16x4_t vec1, int16x4_t val2, __constrange(0, 3) int val3); // VMULL.S16 q0,d0,d0[0] +_NEON2SSE_INLINE int32x4_t vmull_lane_s16(int16x4_t vec1, int16x4_t val2, __constrange(0, 3) int val3) // VMULL.S16 q0,d0,d0[0] +{ + int16_t vlane; + int16x4_t b; + vlane = vget_lane_s16(val2, val3); + b = vdup_n_s16(vlane); + return vmull_s16(vec1, b); +} + +int64x2_t vmull_lane_s32(int32x2_t vec1, int32x2_t val2, __constrange(0, 1) int val3); // VMULL.S32 q0,d0,d0[0] +_NEON2SSE_INLINE int64x2_t vmull_lane_s32(int32x2_t vec1, int32x2_t val2, __constrange(0, 1) int val3) // VMULL.S32 q0,d0,d0[0] +{ + int32_t vlane; + int32x2_t b; + vlane = vget_lane_s32(val2, val3); + b = vdup_n_s32(vlane); + return vmull_s32(vec1, b); +} + +uint32x4_t vmull_lane_u16(uint16x4_t vec1, uint16x4_t val2, __constrange(0, 3) int val3); // VMULL.s16 q0,d0,d0[0] +_NEON2SSE_INLINE uint32x4_t vmull_lane_u16(uint16x4_t vec1, uint16x4_t val2, __constrange(0, 3) int val3) // VMULL.s16 q0,d0,d0[0] +{ + uint16_t vlane; + uint16x4_t b; + vlane = 
vget_lane_s16(val2, val3); + b = vdup_n_s16(vlane); + return vmull_s16(vec1, b); +} + +uint64x2_t vmull_lane_u32(uint32x2_t vec1, uint32x2_t val2, __constrange(0, 1) int val3); // VMULL.U32 q0,d0,d0[0] +_NEON2SSE_INLINE uint64x2_t vmull_lane_u32(uint32x2_t vec1, uint32x2_t val2, __constrange(0, 1) int val3) // VMULL.U32 q0,d0,d0[0] +{ + uint32_t vlane; + uint32x2_t b; + vlane = vget_lane_u32(val2, val3); + b = vdup_n_u32(vlane); + return vmull_u32(vec1, b); +} + +//********* Vector saturating doubling long multiply with scalar ******************* +int32x4_t vqdmull_n_s16(int16x4_t vec1, int16_t val2); // VQDMULL.S16 q0,d0,d0[0] +_NEON2SSE_INLINE int32x4_t vqdmull_n_s16(int16x4_t vec1, int16_t val2) +{ + //the serial soulution may be faster due to saturation + int16x4_t b; + b = vdup_n_s16(val2); + return vqdmull_s16(vec1, b); +} + +int64x2_t vqdmull_n_s32(int32x2_t vec1, int32_t val2); // VQDMULL.S32 q0,d0,d0[0] +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqdmull_n_s32(int32x2_t vec1, int32_t val2), _NEON2SSE_REASON_SLOW_SERIAL) +{ + int32x2_t b; + b = vdup_n_s32(val2); + return vqdmull_s32(vec1,b); //slow serial function!!!! +} + +//************* Vector saturating doubling long multiply by scalar *********************************************** +int32x4_t vqdmull_lane_s16(int16x4_t vec1, int16x4_t val2, __constrange(0, 3) int val3); // VQDMULL.S16 q0,d0,d0[0] +_NEON2SSE_INLINE int32x4_t vqdmull_lane_s16(int16x4_t vec1, int16x4_t val2, __constrange(0, 3) int val3) +{ + int16_t c; + int16x4_t scalar; + c = vget_lane_s16(val2, val3); + scalar = vdup_n_s16(c); + return vqdmull_s16(vec1, scalar); +} + + +int64x2_t vqdmull_lane_s32(int32x2_t vec1, int32x2_t val2, __constrange(0, 1) int val3); // VQDMULL.S32 q0,d0,d0[0] +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqdmull_lane_s32(int32x2_t vec1, int32x2_t val2, __constrange(0, 1) int val3), _NEON2SSE_REASON_SLOW_SERIAL) +{ + int32_t c; + int32x2_t scalar; + c = vget_lane_s32(val2, val3); + scalar = vdup_n_s32(c); + return vqdmull_s32(vec1,scalar); //slow serial function!!!! 
+} + +// *****Vector saturating doubling multiply high with scalar ***** +int16x4_t vqdmulh_n_s16(int16x4_t vec1, int16_t val2); // VQDMULH.S16 d0,d0,d0[0] +_NEON2SSE_INLINE int16x4_t vqdmulh_n_s16(int16x4_t vec1, int16_t val2) +{ + int16x4_t res64; + return64(vqdmulhq_n_s16(_pM128i(vec1), val2)); +} + +int32x2_t vqdmulh_n_s32(int32x2_t vec1, int32_t val2); // VQDMULH.S32 d0,d0,d0[0] +_NEON2SSE_INLINE int32x2_t vqdmulh_n_s32(int32x2_t vec1, int32_t val2) +{ + int32x2_t res64; + return64(vqdmulhq_n_s32(_pM128i(vec1), val2)); +} + +int16x8_t vqdmulhq_n_s16(int16x8_t vec1, int16_t val2); // VQDMULH.S16 q0,q0,d0[0] +_NEON2SSE_INLINE int16x8_t vqdmulhq_n_s16(int16x8_t vec1, int16_t val2) // VQDMULH.S16 q0,q0,d0[0] +{ + //solution may be not optimal + int16x8_t scalar; + scalar = vdupq_n_s16(val2); + return vqdmulhq_s16(vec1, scalar); +} + +int32x4_t vqdmulhq_n_s32(int32x4_t vec1, int32_t val2); // VQDMULH.S32 q0,q0,d0[0] +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x4_t vqdmulhq_n_s32(int32x4_t vec1, int32_t val2), _NEON2SSE_REASON_SLOW_UNEFFECTIVE) +{ + int32x4_t scalar; + scalar = vdupq_n_s32(val2); + return vqdmulhq_s32(vec1, scalar); +} + +//***** Vector saturating doubling multiply high by scalar **************** +int16x4_t vqdmulh_lane_s16(int16x4_t vec1, int16x4_t val2, __constrange(0, 3) int val3); // VQDMULH.S16 d0,d0,d0[0] +_NEON2SSE_INLINE int16x4_t vqdmulh_lane_s16(int16x4_t vec1, int16x4_t val2, __constrange(0, 3) int val3) // VQDMULH.S16 d0,d0,d0[0] +{ + //solution may be not optimal + int16_t vlane; + int16x4_t scalar; + vlane = vget_lane_s16(val2, val3); + scalar = vdup_n_s16(vlane); + return vqdmulh_s16(vec1, scalar); +} + +int32x2_t vqdmulh_lane_s32(int32x2_t vec1, int32x2_t val2, __constrange(0, 1) int val3); // VQDMULH.S32 d0,d0,d0[0] +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x2_t vqdmulh_lane_s32(int32x2_t vec1, int32x2_t val2, __constrange(0, 1) int val3), _NEON2SSE_REASON_SLOW_UNEFFECTIVE) +{ + int32_t vlane; + int32x2_t scalar; + vlane = vget_lane_s32(val2, val3); + scalar = vdup_n_s32(vlane); + return vqdmulh_s32(vec1, scalar); +} + +int16x8_t vqdmulhq_lane_s16(int16x8_t vec1, int16x4_t val2, __constrange(0, 3) int val3); // VQDMULH.S16 q0,q0,d0[0] +_NEON2SSE_INLINE int16x8_t vqdmulhq_lane_s16(int16x8_t vec1, int16x4_t val2, __constrange(0, 3) int val3) // VQDMULH.S16 q0,q0,d0[0] +{ + //solution may be not optimal + int16_t vlane; + int16x8_t scalar; + vlane = vget_lane_s16(val2, val3); + scalar = vdupq_n_s16(vlane ); + return vqdmulhq_s16(vec1, scalar); +} + +int32x4_t vqdmulhq_lane_s32(int32x4_t vec1, int32x2_t val2, __constrange(0, 1) int val3); // VQDMULH.S32 q0,q0,d0[0] +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x4_t vqdmulhq_lane_s32(int32x4_t vec1, int32x2_t val2, __constrange(0, 1) int val3), _NEON2SSE_REASON_SLOW_UNEFFECTIVE) +{ + //solution may be not optimal + int32_t vlane; + int32x4_t scalar; + vlane = vgetq_lane_s32(_pM128i(val2), val3); + scalar = vdupq_n_s32(vlane ); + return vqdmulhq_s32(vec1, scalar); +} + +//******** Vector saturating rounding doubling multiply high with scalar *** +int16x4_t vqrdmulh_n_s16(int16x4_t vec1, int16_t val2); // VQRDMULH.S16 d0,d0,d0[0] +_NEON2SSE_INLINE int16x4_t vqrdmulh_n_s16(int16x4_t vec1, int16_t val2) // VQRDMULH.S16 d0,d0,d0[0] +{ + //solution may be not optimal + int16x4_t scalar; + scalar = vdup_n_s16(val2); + return vqrdmulh_s16(vec1, scalar); +} + +int32x2_t vqrdmulh_n_s32(int32x2_t vec1, int32_t val2); // VQRDMULH.S32 d0,d0,d0[0] +_NEON2SSE_INLINE 
_NEON2SSE_PERFORMANCE_WARNING(int32x2_t vqrdmulh_n_s32(int32x2_t vec1, int32_t val2), _NEON2SSE_REASON_SLOW_UNEFFECTIVE) +{ + int32x2_t scalar; + scalar = vdup_n_s32(val2); + return vqrdmulh_s32(vec1, scalar); +} + +int16x8_t vqrdmulhq_n_s16(int16x8_t vec1, int16_t val2); // VQRDMULH.S16 q0,q0,d0[0] +_NEON2SSE_INLINE int16x8_t vqrdmulhq_n_s16(int16x8_t vec1, int16_t val2) // VQRDMULH.S16 q0,q0,d0[0] +{ + //solution may be not optimal + int16x8_t scalar; + scalar = vdupq_n_s16(val2); + return vqrdmulhq_s16(vec1, scalar); +} + +int32x4_t vqrdmulhq_n_s32(int32x4_t vec1, int32_t val2); // VQRDMULH.S32 q0,q0,d0[0] +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x4_t vqrdmulhq_n_s32(int32x4_t vec1, int32_t val2), _NEON2SSE_REASON_SLOW_UNEFFECTIVE) +{ + int32x4_t scalar; + scalar = vdupq_n_s32(val2); + return vqrdmulhq_s32(vec1, scalar); +} + +//********* Vector rounding saturating doubling multiply high by scalar **** +int16x4_t vqrdmulh_lane_s16(int16x4_t vec1, int16x4_t val2, __constrange(0, 3) int val3); // VQRDMULH.S16 d0,d0,d0[0] +_NEON2SSE_INLINE int16x4_t vqrdmulh_lane_s16(int16x4_t vec1, int16x4_t val2, __constrange(0, 3) int val3) // VQRDMULH.S16 d0,d0,d0[0] +{ + //solution may be not optimal + int16_t vlane; + int16x4_t scalar; + vlane = vget_lane_s16(val2, val3); + scalar = vdup_n_s16(vlane); + return vqrdmulh_s16(vec1, scalar); +} + +int32x2_t vqrdmulh_lane_s32(int32x2_t vec1, int32x2_t val2, __constrange(0, 1) int val3); // VQRDMULH.S32 d0,d0,d0[0] +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x2_t vqrdmulh_lane_s32(int32x2_t vec1, int32x2_t val2, __constrange(0, 1) int val3), _NEON2SSE_REASON_SLOW_UNEFFECTIVE) +{ + int32_t vlane; + int32x2_t scalar; + vlane = vget_lane_s32(val2, val3); + scalar = vdup_n_s32(vlane); + return vqrdmulh_s32(vec1, scalar); +} + +int16x8_t vqrdmulhq_lane_s16(int16x8_t vec1, int16x4_t val2, __constrange(0, 3) int val3); // VQRDMULH.S16 q0,q0,d0[0] +_NEON2SSE_INLINE int16x8_t vqrdmulhq_lane_s16(int16x8_t vec1, int16x4_t val2, __constrange(0, 3) int val3) // VQRDMULH.S16 q0,q0,d0[0] +{ + //solution may be not optimal + int16_t vlane; + int16x8_t scalar; + vlane = vget_lane_s16(val2, val3); + scalar = vdupq_n_s16(vlane); + return vqrdmulhq_s16(vec1, scalar); +} + +int32x4_t vqrdmulhq_lane_s32(int32x4_t vec1, int32x2_t val2, __constrange(0, 1) int val3); // VQRDMULH.S32 q0,q0,d0[0] +_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int32x4_t vqrdmulhq_lane_s32(int32x4_t vec1, int32x2_t val2, __constrange(0, 1) int val3), _NEON2SSE_REASON_SLOW_UNEFFECTIVE) +{ + //solution may be not optimal + int32_t vlane; + int32x4_t scalar; + vlane = vgetq_lane_s32(_pM128i(val2), val3); + scalar = vdupq_n_s32(vlane ); + return vqrdmulhq_s32(vec1, scalar); +} + +//**************Vector multiply accumulate with scalar ******************* +int16x4_t vmla_n_s16(int16x4_t a, int16x4_t b, int16_t c); // VMLA.I16 d0, d0, d0[0] +_NEON2SSE_INLINE int16x4_t vmla_n_s16(int16x4_t a, int16x4_t b, int16_t c) // VMLA.I16 d0, d0, d0[0] +{ + int16x4_t scalar; + scalar = vdup_n_s16(c); + return vmla_s16(a, b, scalar); +} + +int32x2_t vmla_n_s32(int32x2_t a, int32x2_t b, int32_t c); // VMLA.I32 d0, d0, d0[0] +_NEON2SSE_INLINE int32x2_t vmla_n_s32(int32x2_t a, int32x2_t b, int32_t c) // VMLA.I32 d0, d0, d0[0] +{ + int32x2_t scalar; + scalar = vdup_n_s32(c); + return vmla_s32(a, b, scalar); +} + +uint16x4_t vmla_n_u16(uint16x4_t a, uint16x4_t b, uint16_t c); // VMLA.I16 d0, d0, d0[0] +#define vmla_n_u16 vmla_n_s16 + + +uint32x2_t vmla_n_u32(uint32x2_t a, uint32x2_t b, uint32_t c); // 
VMLA.I32 d0, d0, d0[0]
+#define vmla_n_u32 vmla_n_s32
+
+
+float32x2_t vmla_n_f32(float32x2_t a, float32x2_t b, float32_t c); // VMLA.F32 d0, d0, d0[0]
+_NEON2SSE_INLINE float32x2_t vmla_n_f32(float32x2_t a, float32x2_t b, float32_t c) // VMLA.F32 d0, d0, d0[0]
+{
+    float32x2_t scalar;
+    scalar = vdup_n_f32(c);
+    return vmla_f32(a, b, scalar);
+}
+
+int16x8_t vmlaq_n_s16(int16x8_t a, int16x8_t b, int16_t c); // VMLA.I16 q0, q0, d0[0]
+_NEON2SSE_INLINE int16x8_t vmlaq_n_s16(int16x8_t a, int16x8_t b, int16_t c) // VMLA.I16 q0, q0, d0[0]
+{
+    int16x8_t scalar;
+    scalar = vdupq_n_s16(c);
+    return vmlaq_s16(a,b,scalar);
+}
+
+int32x4_t vmlaq_n_s32(int32x4_t a, int32x4_t b, int32_t c); // VMLA.I32 q0, q0, d0[0]
+_NEON2SSE_INLINE int32x4_t vmlaq_n_s32(int32x4_t a, int32x4_t b, int32_t c) // VMLA.I32 q0, q0, d0[0]
+{
+    int32x4_t scalar;
+    scalar = vdupq_n_s32(c);
+    return vmlaq_s32(a,b,scalar);
+}
+
+uint16x8_t vmlaq_n_u16(uint16x8_t a, uint16x8_t b, uint16_t c); // VMLA.I16 q0, q0, d0[0]
+#define vmlaq_n_u16 vmlaq_n_s16
+
+uint32x4_t vmlaq_n_u32(uint32x4_t a, uint32x4_t b, uint32_t c); // VMLA.I32 q0, q0, d0[0]
+#define vmlaq_n_u32 vmlaq_n_s32
+
+float32x4_t vmlaq_n_f32(float32x4_t a, float32x4_t b, float32_t c); // VMLA.F32 q0, q0, d0[0]
+_NEON2SSE_INLINE float32x4_t vmlaq_n_f32(float32x4_t a, float32x4_t b, float32_t c) // VMLA.F32 q0, q0, d0[0]
+{
+    float32x4_t scalar;
+    scalar = vdupq_n_f32(c);
+    return vmlaq_f32(a,b,scalar);
+}
+
+//************Vector widening multiply accumulate with scalar****************************
+int32x4_t vmlal_n_s16(int32x4_t a, int16x4_t b, int16_t c); // VMLAL.S16 q0, d0, d0[0]
+_NEON2SSE_INLINE int32x4_t vmlal_n_s16(int32x4_t a, int16x4_t b, int16_t c) // VMLAL.S16 q0, d0, d0[0]
+{
+    int16x4_t vc;
+    vc = vdup_n_s16(c);
+    return vmlal_s16(a, b, vc);
+}
+
+int64x2_t vmlal_n_s32(int64x2_t a, int32x2_t b, int32_t c); // VMLAL.S32 q0, d0, d0[0]
+_NEON2SSE_INLINE int64x2_t vmlal_n_s32(int64x2_t a, int32x2_t b, int32_t c) // VMLAL.S32 q0, d0, d0[0]
+{
+    int32x2_t vc;
+    vc = vdup_n_s32(c);
+    return vmlal_s32(a, b, vc);
+}
+
+uint32x4_t vmlal_n_u16(uint32x4_t a, uint16x4_t b, uint16_t c); // VMLAL.U16 q0, d0, d0[0]
+_NEON2SSE_INLINE uint32x4_t vmlal_n_u16(uint32x4_t a, uint16x4_t b, uint16_t c) // VMLAL.U16 q0, d0, d0[0]
+{
+    uint16x4_t vc;
+    vc = vdup_n_u16(c);
+    return vmlal_u16(a, b, vc);
+}
+
+uint64x2_t vmlal_n_u32(uint64x2_t a, uint32x2_t b, uint32_t c); // VMLAL.U32 q0, d0, d0[0]
+_NEON2SSE_INLINE uint64x2_t vmlal_n_u32(uint64x2_t a, uint32x2_t b, uint32_t c) // VMLAL.U32 q0, d0, d0[0]
+{
+    uint32x2_t vc;
+    vc = vdup_n_u32(c);
+    return vmlal_u32(a, b, vc);
+}
+
+//************ Vector widening saturating doubling multiply accumulate with scalar **************
+int32x4_t vqdmlal_n_s16(int32x4_t a, int16x4_t b, int16_t c); // VQDMLAL.S16 q0, d0, d0[0]
+_NEON2SSE_INLINE int32x4_t vqdmlal_n_s16(int32x4_t a, int16x4_t b, int16_t c)
+{
+    //not optimal SIMD solution, serial may be faster
+    int16x4_t vc;
+    vc = vdup_n_s16(c);
+    return vqdmlal_s16(a, b, vc);
+}
+
+int64x2_t vqdmlal_n_s32(int64x2_t a, int32x2_t b, int32_t c); // VQDMLAL.S32 q0, d0, d0[0]
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqdmlal_n_s32(int64x2_t a, int32x2_t b, int32_t c), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+    int32x2_t vc;
+    vc = vdup_n_s32(c);
+    return vqdmlal_s32(a, b, vc);
+}
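+
+//An illustrative sketch (added for clarity; not part of the original intrinsic set): per 16-bit lane
+//vqdmlal_n_s16 computes a[i] + saturate_int32(2 * b[i] * c), i.e. the product is doubled and
+//saturated to the int32 range before the saturating accumulation. For example:
+//
+//    int32x4_t acc = vdupq_n_s32(0);
+//    int16x4_t b4  = vdup_n_s16(-32768);
+//    int32x4_t r   = vqdmlal_n_s16(acc, b4, (int16_t)-32768); //2*(-32768)*(-32768) saturates, every lane of r is 0x7FFFFFFF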
+
+//******** Vector multiply subtract with scalar **************
+int16x4_t vmls_n_s16(int16x4_t a, int16x4_t b, int16_t c); // VMLS.I16 d0, d0, d0[0]
+_NEON2SSE_INLINE int16x4_t vmls_n_s16(int16x4_t a, int16x4_t b, int16_t c) // VMLS.I16 d0, d0, d0[0]
+{
+    int16x4_t vc;
+    vc = vdup_n_s16(c);
+    return vmls_s16(a, b, vc);
+}
+
+int32x2_t vmls_n_s32(int32x2_t a, int32x2_t b, int32_t c); // VMLS.I32 d0, d0, d0[0]
+_NEON2SSE_INLINE int32x2_t vmls_n_s32(int32x2_t a, int32x2_t b, int32_t c) // VMLS.I32 d0, d0, d0[0]
+{
+    int32x2_t vc;
+    vc = vdup_n_s32(c);
+    return vmls_s32(a, b, vc);
+}
+
+uint16x4_t vmls_n_u16(uint16x4_t a, uint16x4_t b, uint16_t c); // VMLS.I16 d0, d0, d0[0]
+_NEON2SSE_INLINE uint16x4_t vmls_n_u16(uint16x4_t a, uint16x4_t b, uint16_t c) // VMLS.I16 d0, d0, d0[0]
+{
+    uint16x4_t vc;
+    vc = vdup_n_s16(c);
+    return vmls_s16(a, b, vc);
+}
+
+uint32x2_t vmls_n_u32(uint32x2_t a, uint32x2_t b, uint32_t c); // VMLS.I32 d0, d0, d0[0]
+_NEON2SSE_INLINE uint32x2_t vmls_n_u32(uint32x2_t a, uint32x2_t b, uint32_t c) // VMLS.I32 d0, d0, d0[0]
+{
+    uint32x2_t vc;
+    vc = vdup_n_u32(c);
+    return vmls_u32(a, b, vc);
+}
+
+float32x2_t vmls_n_f32(float32x2_t a, float32x2_t b, float32_t c); // VMLS.F32 d0, d0, d0[0]
+_NEON2SSE_INLINE float32x2_t vmls_n_f32(float32x2_t a, float32x2_t b, float32_t c)
+{
+    float32x2_t res;
+    res.m64_f32[0] = a.m64_f32[0] - b.m64_f32[0] * c;
+    res.m64_f32[1] = a.m64_f32[1] - b.m64_f32[1] * c;
+    return res;
+}
+
+int16x8_t vmlsq_n_s16(int16x8_t a, int16x8_t b, int16_t c); // VMLS.I16 q0, q0, d0[0]
+_NEON2SSE_INLINE int16x8_t vmlsq_n_s16(int16x8_t a, int16x8_t b, int16_t c) // VMLS.I16 q0, q0, d0[0]
+{
+    int16x8_t vc;
+    vc = vdupq_n_s16(c);
+    return vmlsq_s16(a, b,vc);
+}
+
+int32x4_t vmlsq_n_s32(int32x4_t a, int32x4_t b, int32_t c); // VMLS.I32 q0, q0, d0[0]
+_NEON2SSE_INLINE int32x4_t vmlsq_n_s32(int32x4_t a, int32x4_t b, int32_t c) // VMLS.I32 q0, q0, d0[0]
+{
+    int32x4_t vc;
+    vc = vdupq_n_s32(c);
+    return vmlsq_s32(a,b,vc);
+}
+
+uint16x8_t vmlsq_n_u16(uint16x8_t a, uint16x8_t b, uint16_t c); // VMLS.I16 q0, q0, d0[0]
+_NEON2SSE_INLINE uint16x8_t vmlsq_n_u16(uint16x8_t a, uint16x8_t b, uint16_t c) // VMLS.I16 q0, q0, d0[0]
+{
+    uint16x8_t vc;
+    vc = vdupq_n_u16(c);
+    return vmlsq_u16(a,b,vc);
+}
+
+uint32x4_t vmlsq_n_u32(uint32x4_t a, uint32x4_t b, uint32_t c); // VMLS.I32 q0, q0, d0[0]
+_NEON2SSE_INLINE uint32x4_t vmlsq_n_u32(uint32x4_t a, uint32x4_t b, uint32_t c) // VMLS.I32 q0, q0, d0[0]
+{
+    uint32x4_t vc;
+    vc = vdupq_n_u32(c);
+    return vmlsq_u32(a,b,vc);
+}
+
+float32x4_t vmlsq_n_f32(float32x4_t a, float32x4_t b, float32_t c); // VMLS.F32 q0, q0, d0[0]
+_NEON2SSE_INLINE float32x4_t vmlsq_n_f32(float32x4_t a, float32x4_t b, float32_t c)
+{
+    float32x4_t vc;
+    vc = vdupq_n_f32(c);
+    return vmlsq_f32(a,b,vc);
+}
+
+//**** Vector widening multiply subtract with scalar ******
+int32x4_t vmlsl_n_s16(int32x4_t a, int16x4_t b, int16_t c); // VMLSL.S16 q0, d0, d0[0]
+_NEON2SSE_INLINE int32x4_t vmlsl_n_s16(int32x4_t a, int16x4_t b, int16_t c) // VMLSL.S16 q0, d0, d0[0]
+{
+    int16x4_t vc;
+    vc = vdup_n_s16(c);
+    return vmlsl_s16(a, b, vc);
+}
+
+int64x2_t vmlsl_n_s32(int64x2_t a, int32x2_t b, int32_t c); // VMLSL.S32 q0, d0, d0[0]
+_NEON2SSE_INLINE int64x2_t vmlsl_n_s32(int64x2_t a, int32x2_t b, int32_t c) // VMLSL.S32 q0, d0, d0[0]
+{
+    int32x2_t vc;
+    vc = vdup_n_s32(c);
+    return vmlsl_s32(a, b, vc);
+}
+
+uint32x4_t vmlsl_n_u16(uint32x4_t a, uint16x4_t b, uint16_t c); // VMLSL.U16 q0, d0, d0[0]
+_NEON2SSE_INLINE uint32x4_t vmlsl_n_u16(uint32x4_t a, uint16x4_t b, uint16_t c) // VMLSL.U16 q0, d0, d0[0]
+{
+    uint16x4_t vc;
+    vc = vdup_n_u16(c);
+    return vmlsl_u16(a, b, vc);
+}
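+
+//Note (illustrative addition, not part of the original header): for the widening u16/u32 variants the
+//unsigned NEON intrinsics must be used, because signed and unsigned widening multiplies differ, e.g.
+//
+//    uint32x4_t acc = vdupq_n_u32(200000);
+//    uint16x4_t b4  = vdup_n_u16(0xFFFF);
+//    uint32x4_t r   = vmlsl_n_u16(acc, b4, 2); //subtracts 65535*2 = 131070, every lane of r is 68930;
+//                                              //a signed widening multiply would treat 0xFFFF as -1 and subtract -2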
+
+uint64x2_t vmlsl_n_u32(uint64x2_t a, uint32x2_t b, uint32_t c); // VMLSL.U32 q0, d0, d0[0]
+_NEON2SSE_INLINE uint64x2_t vmlsl_n_u32(uint64x2_t a, uint32x2_t b, uint32_t c) // VMLSL.U32 q0, d0, d0[0]
+{
+    uint32x2_t vc;
+    vc = vdup_n_u32(c);
+    return vmlsl_u32(a, b, vc);
+}
+
+//***** Vector widening saturating doubling multiply subtract with scalar *********
+//**********************************************************************************
+int32x4_t vqdmlsl_n_s16(int32x4_t a, int16x4_t b, int16_t c); // VQDMLSL.S16 q0, d0, d0[0]
+_NEON2SSE_INLINE int32x4_t vqdmlsl_n_s16(int32x4_t a, int16x4_t b, int16_t c)
+{
+    int16x4_t vc;
+    vc = vdup_n_s16(c);
+    return vqdmlsl_s16(a, b, vc);
+}
+
+int64x2_t vqdmlsl_n_s32(int64x2_t a, int32x2_t b, int32_t c); // VQDMLSL.S32 q0, d0, d0[0]
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int64x2_t vqdmlsl_n_s32(int64x2_t a, int32x2_t b, int32_t c), _NEON2SSE_REASON_SLOW_SERIAL)
+{
+    int32x2_t vc;
+    vc = vdup_n_s32(c);
+    return vqdmlsl_s32(a, b, vc);
+}
+
+//******************* Vector extract ***********************************************
+//*************************************************************************************
+//VEXT (Vector Extract) extracts elements from the bottom end of the second operand
+//vector and the top end of the first, concatenates them, and places the result in the destination vector:
+//c elements from the bottom end of the second operand and (8-c) from the top end of the first
+int8x8_t vext_s8(int8x8_t a, int8x8_t b, __constrange(0,7) int c); // VEXT.8 d0,d0,d0,#0
+_NEON2SSE_INLINE _NEON2SSE_PERFORMANCE_WARNING(int8x8_t vext_s8(int8x8_t a, int8x8_t b, __constrange(0,7) int c),_NEON2SSE_REASON_SLOW_SERIAL)
+{
+    int8x8_t res;
+    int i;
+    for (i = 0; i < 8 - c; i++) {
+        res.m64_i8[i] = a.m64_i8[i + c];
+    }
+    for (i = 0; i < c; i++) {
+        res.m64_i8[8 - c + i] = b.m64_i8[i];
+    }
+    return res;
+}
+
+//**************** Count leading zeros ***********************************
+//*************************************************************************
+int32x4_t vclzq_s32(int32x4_t a); // VCLZ.I32 q0,q0
+_NEON2SSE_INLINE int32x4_t vclzq_s32(int32x4_t a)
+{
+    __m128i c55555555, c33333333, c0f0f0f0f, c3f, c32, tmp, tmp1, res;
+    c55555555 = _mm_set1_epi32(0x55555555);
+    c33333333 = _mm_set1_epi32(0x33333333);
+    c0f0f0f0f = _mm_set1_epi32(0x0f0f0f0f);
+    c3f = _mm_set1_epi32(0x3f);
+    c32 = _mm_set1_epi32(32);
+    tmp = _mm_srli_epi32(a, 1);
+    res = _mm_or_si128(tmp, a); //atmp[i] |= (atmp[i] >> 1);
+    tmp = _mm_srli_epi32(res, 2);
+    res = _mm_or_si128(tmp, res); //atmp[i] |= (atmp[i] >> 2);
+    tmp = _mm_srli_epi32(res, 4);
+    res = _mm_or_si128(tmp, res); //atmp[i] |= (atmp[i] >> 4);
+    tmp = _mm_srli_epi32(res, 8);
+    res = _mm_or_si128(tmp, res); //atmp[i] |= (atmp[i] >> 8);
+    tmp = _mm_srli_epi32(res, 16);
+    res = _mm_or_si128(tmp, res); //atmp[i] |= (atmp[i] >> 16);
+
+    tmp = _mm_srli_epi32(res, 1);
+    tmp = _mm_and_si128(tmp, c55555555);
+    res = _mm_sub_epi32(res, tmp); //atmp[i] -= ((atmp[i] >> 1) & 0x55555555);
+
+    tmp = _mm_srli_epi32(res, 2);
+    tmp = _mm_and_si128(tmp, c33333333);
+    tmp1 = _mm_and_si128(res, c33333333);
+    res = _mm_add_epi32(tmp, tmp1); //atmp[i] = (((atmp[i] >> 2) & 0x33333333) + (atmp[i] & 0x33333333));
+
+    tmp = _mm_srli_epi32(res, 4);
+    tmp = _mm_add_epi32(tmp, res);
+    res = _mm_and_si128(tmp, c0f0f0f0f); //atmp[i] = (((atmp[i] >> 4) + atmp[i]) & 0x0f0f0f0f);
+
+    tmp = _mm_srli_epi32(res, 8);
+    res = _mm_add_epi32(tmp, res); //atmp[i] += (atmp[i] >> 8);
+
+    tmp = _mm_srli_epi32(res, 16);
+    res = _mm_add_epi32(tmp, res); //atmp[i] += (atmp[i] >> 16);
+
+    res = _mm_and_si128(res, c3f); //atmp[i] = atmp[i] & 0x0000003f;
+
+    return _mm_sub_epi32(c32, res); //res[i] = 32 - atmp[i];
+}
+
+uint8x16_t vclzq_u8(uint8x16_t a); // VCLZ.I8 q0,q0
+#define vclzq_u8 vclzq_s8
+
+uint16x8_t vclzq_u16(uint16x8_t a); // VCLZ.I16 q0,q0
+#define vclzq_u16 vclzq_s16
+
+uint32x4_t vclzq_u32(uint32x4_t a); // VCLZ.I32 q0,q0
+#define vclzq_u32 vclzq_s32
+
+//************** Count leading sign bits **************************
+//********************************************************************
+//VCLS (Vector Count Leading Sign bits) counts the number of consecutive bits following
+// the topmost bit, that are the same as the topmost bit, in each element in a vector
+//No 
corresponding vector intrinsics in IA32, need to implement it. +//While the implementation is effective for 8 bits, it may be not for 16 and 32 bits +int8x8_t vcls_s8(int8x8_t a); // VCLS.S8 d0,d0 +_NEON2SSE_INLINE int8x8_t vcls_s8(int8x8_t a) +{ + int8x8_t res64; + __m128i res; + res = vclsq_s8(_pM128i(a)); + return64(res); +} + +int16x4_t vcls_s16(int16x4_t a); // VCLS.S16 d0,d0 +_NEON2SSE_INLINE int16x4_t vcls_s16(int16x4_t a) +{ + int16x4_t res64; + __m128i res; + res = vclsq_s16(_pM128i(a)); + return64(res); +} + +int32x2_t vcls_s32(int32x2_t a); // VCLS.S32 d0,d0 +_NEON2SSE_INLINE int32x2_t vcls_s32(int32x2_t a) +{ + int32x2_t res64; + __m128i res; + res = vclsq_s32(_pM128i(a)); + return64(res); +} + +int8x16_t vclsq_s8(int8x16_t a); // VCLS.S8 q0,q0 +_NEON2SSE_INLINE int8x16_t vclsq_s8(int8x16_t a) +{ + __m128i cff, c80, c1, a_mask, a_neg, a_pos, a_comb; + cff = _mm_cmpeq_epi8 (a,a); //0xff + c80 = _mm_set1_epi8(0x80); + c1 = _mm_set1_epi8(1); + a_mask = _mm_and_si128(a, c80); + a_mask = _mm_cmpeq_epi8(a_mask, c80); //0xff if negative input and 0 if positive + a_neg = _mm_xor_si128(a, cff); + a_neg = _mm_and_si128(a_mask, a_neg); + a_pos = _mm_andnot_si128(a_mask, a); + a_comb = _mm_or_si128(a_pos, a_neg); + a_comb = vclzq_s8(a_comb); + return _mm_sub_epi8(a_comb, c1); +} + +int16x8_t vclsq_s16(int16x8_t a); // VCLS.S16 q0,q0 +_NEON2SSE_INLINE int16x8_t vclsq_s16(int16x8_t a) +{ + __m128i cffff, c8000, c1, a_mask, a_neg, a_pos, a_comb; + cffff = _mm_cmpeq_epi16(a,a); + c8000 = _mm_slli_epi16(cffff, 15); //0x8000 + c1 = _mm_srli_epi16(cffff,15); //0x1 + a_mask = _mm_and_si128(a, c8000); + a_mask = _mm_cmpeq_epi16(a_mask, c8000); //0xffff if negative input and 0 if positive + a_neg = _mm_xor_si128(a, cffff); + a_neg = _mm_and_si128(a_mask, a_neg); + a_pos = _mm_andnot_si128(a_mask, a); + a_comb = _mm_or_si128(a_pos, a_neg); + a_comb = vclzq_s16(a_comb); + return _mm_sub_epi16(a_comb, c1); +} + +int32x4_t vclsq_s32(int32x4_t a); // VCLS.S32 q0,q0 +_NEON2SSE_INLINE int32x4_t vclsq_s32(int32x4_t a) +{ + __m128i cffffffff, c80000000, c1, a_mask, a_neg, a_pos, a_comb; + cffffffff = _mm_cmpeq_epi32(a,a); + c80000000 = _mm_slli_epi32(cffffffff, 31); //0x80000000 + c1 = _mm_srli_epi32(cffffffff,31); //0x1 + a_mask = _mm_and_si128(a, c80000000); + a_mask = _mm_cmpeq_epi32(a_mask, c80000000); //0xffffffff if negative input and 0 if positive + a_neg = _mm_xor_si128(a, cffffffff); + a_neg = _mm_and_si128(a_mask, a_neg); + a_pos = _mm_andnot_si128(a_mask, a); + a_comb = _mm_or_si128(a_pos, a_neg); + a_comb = vclzq_s32(a_comb); + return _mm_sub_epi32(a_comb, c1); +} + +//************************* Count number of set bits ******************************** +//************************************************************************************* +//No corresponding SIMD solution. 
One option is to get a elements, convert it to 32 bits and then use SSE4.2 _mm_popcnt__u32 (unsigned int v) for each element +//another option is to do the following algorithm: + +uint8x8_t vcnt_u8(uint8x8_t a); // VCNT.8 d0,d0 +_NEON2SSE_INLINE uint8x8_t vcnt_u8(uint8x8_t a) +{ + uint8x8_t res64; + __m128i res; + res = vcntq_u8(_pM128i(a)); + return64(res); +} + +int8x8_t vcnt_s8(int8x8_t a); // VCNT.8 d0,d0 +#define vcnt_s8 vcnt_u8 + +poly8x8_t vcnt_p8(poly8x8_t a); // VCNT.8 d0,d0 +#define vcnt_p8 vcnt_u8 + +uint8x16_t vcntq_u8(uint8x16_t a); // VCNT.8 q0,q0 +_NEON2SSE_INLINE uint8x16_t vcntq_u8(uint8x16_t a) +{ + _NEON2SSE_ALIGN_16 int8_t mask_POPCOUNT[16] = { /* 0 */ 0,/* 1 */ 1,/* 2 */ 1,/* 3 */ 2, + /* 4 */ 1,/* 5 */ 2,/* 6 */ 2,/* 7 */ 3, + /* 8 */ 1,/* 9 */ 2,/* a */ 2,/* b */ 3, + /* c */ 2,/* d */ 3,/* e */ 3,/* f */ 4 }; + __m128i maskLOW, mask, lowpopcnt, hipopcnt; + maskLOW = _mm_set1_epi8(0x0f); //low 4 bits, need masking to avoid zero if MSB is set + mask = _mm_and_si128(a, maskLOW); + lowpopcnt = _mm_shuffle_epi8( *(__m128i*)mask_POPCOUNT, mask); //uses low 4 bits anyway + mask = _mm_srli_epi16(a, 4); //get high 4 bits as low bits + mask = _mm_and_si128(mask, maskLOW); //low 4 bits, need masking to avoid zero if MSB is set + hipopcnt = _mm_shuffle_epi8( *(__m128i*) mask_POPCOUNT, mask); //uses low 4 bits anyway + return _mm_add_epi8(lowpopcnt, hipopcnt); +} + +int8x16_t vcntq_s8(int8x16_t a); // VCNT.8 q0,q0 +#define vcntq_s8 vcntq_u8 + +poly8x16_t vcntq_p8(poly8x16_t a); // VCNT.8 q0,q0 +#define vcntq_p8 vcntq_u8 + +//************************************************************************************** +//*********************** Logical operations **************************************** +//************************************************************************************** +//************************** Bitwise not *********************************** +//several Bitwise not implementations possible for SIMD. 
Eg "xor" with all ones, but the following one gives good performance +int8x8_t vmvn_s8(int8x8_t a); // VMVN d0,d0 +_NEON2SSE_INLINE int8x8_t vmvn_s8(int8x8_t a) +{ + int8x8_t res64; + __m128i res; + res = vmvnq_s8(_pM128i(a)); + return64(res); +} + +int16x4_t vmvn_s16(int16x4_t a); // VMVN d0,d0 +_NEON2SSE_INLINE int16x4_t vmvn_s16(int16x4_t a) +{ + int16x4_t res64; + __m128i res; + res = vmvnq_s16(_pM128i(a)); + return64(res); +} + +int32x2_t vmvn_s32(int32x2_t a); // VMVN d0,d0 +_NEON2SSE_INLINE int32x2_t vmvn_s32(int32x2_t a) +{ + int32x2_t res64; + __m128i res; + res = vmvnq_s32(_pM128i(a)); + return64(res); +} + +uint8x8_t vmvn_u8(uint8x8_t a); // VMVN d0,d0 +#define vmvn_u8 vmvn_s8 + +uint16x4_t vmvn_u16(uint16x4_t a); // VMVN d0,d0 +#define vmvn_u16 vmvn_s16 + +uint32x2_t vmvn_u32(uint32x2_t a); // VMVN d0,d0 +#define vmvn_u32 vmvn_s32 + +poly8x8_t vmvn_p8(poly8x8_t a); // VMVN d0,d0 +#define vmvn_p8 vmvn_u8 + +int8x16_t vmvnq_s8(int8x16_t a); // VMVN q0,q0 +_NEON2SSE_INLINE int8x16_t vmvnq_s8(int8x16_t a) // VMVN q0,q0 +{ + __m128i c1; + c1 = _mm_cmpeq_epi8 (a,a); //0xff + return _mm_andnot_si128 (a, c1); +} + +int16x8_t vmvnq_s16(int16x8_t a); // VMVN q0,q0 +_NEON2SSE_INLINE int16x8_t vmvnq_s16(int16x8_t a) // VMVN q0,q0 +{ + __m128i c1; + c1 = _mm_cmpeq_epi16 (a,a); //0xffff + return _mm_andnot_si128 (a, c1); +} + +int32x4_t vmvnq_s32(int32x4_t a); // VMVN q0,q0 +_NEON2SSE_INLINE int32x4_t vmvnq_s32(int32x4_t a) // VMVN q0,q0 +{ + __m128i c1; + c1 = _mm_cmpeq_epi32 (a,a); //0xffffffff + return _mm_andnot_si128 (a, c1); +} + +uint8x16_t vmvnq_u8(uint8x16_t a); // VMVN q0,q0 +#define vmvnq_u8 vmvnq_s8 + +uint16x8_t vmvnq_u16(uint16x8_t a); // VMVN q0,q0 +#define vmvnq_u16 vmvnq_s16 + +uint32x4_t vmvnq_u32(uint32x4_t a); // VMVN q0,q0 +#define vmvnq_u32 vmvnq_s32 + +poly8x16_t vmvnq_p8(poly8x16_t a); // VMVN q0,q0 +#define vmvnq_p8 vmvnq_u8 + +//****************** Bitwise and *********************** +//****************************************************** +int8x8_t vand_s8(int8x8_t a, int8x8_t b); // VAND d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vand_s8(int8x8_t a, int8x8_t b) +{ + int8x8_t res64; + return64(_mm_and_si128(_pM128i(a),_pM128i(b))); +} + +int16x4_t vand_s16(int16x4_t a, int16x4_t b); // VAND d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vand_s16(int16x4_t a, int16x4_t b) +{ + int16x4_t res64; + return64(_mm_and_si128(_pM128i(a),_pM128i(b))); +} + +int32x2_t vand_s32(int32x2_t a, int32x2_t b); // VAND d0,d0,d0 +_NEON2SSE_INLINE int32x2_t vand_s32(int32x2_t a, int32x2_t b) +{ + int32x2_t res64; + return64(_mm_and_si128(_pM128i(a),_pM128i(b))); +} + + +int64x1_t vand_s64(int64x1_t a, int64x1_t b); // VAND d0,d0,d0 +_NEON2SSE_INLINE int64x1_t vand_s64(int64x1_t a, int64x1_t b) +{ + int64x1_t res; + res.m64_i64[0] = a.m64_i64[0] & b.m64_i64[0]; + return res; +} + +uint8x8_t vand_u8(uint8x8_t a, uint8x8_t b); // VAND d0,d0,d0 +#define vand_u8 vand_s8 + +uint16x4_t vand_u16(uint16x4_t a, uint16x4_t b); // VAND d0,d0,d0 +#define vand_u16 vand_s16 + +uint32x2_t vand_u32(uint32x2_t a, uint32x2_t b); // VAND d0,d0,d0 +#define vand_u32 vand_s32 + +uint64x1_t vand_u64(uint64x1_t a, uint64x1_t b); // VAND d0,d0,d0 +#define vand_u64 vand_s64 + + +int8x16_t vandq_s8(int8x16_t a, int8x16_t b); // VAND q0,q0,q0 +#define vandq_s8 _mm_and_si128 + +int16x8_t vandq_s16(int16x8_t a, int16x8_t b); // VAND q0,q0,q0 +#define vandq_s16 _mm_and_si128 + +int32x4_t vandq_s32(int32x4_t a, int32x4_t b); // VAND q0,q0,q0 +#define vandq_s32 _mm_and_si128 + +int64x2_t vandq_s64(int64x2_t a, int64x2_t b); // VAND 
q0,q0,q0 +#define vandq_s64 _mm_and_si128 + +uint8x16_t vandq_u8(uint8x16_t a, uint8x16_t b); // VAND q0,q0,q0 +#define vandq_u8 _mm_and_si128 + +uint16x8_t vandq_u16(uint16x8_t a, uint16x8_t b); // VAND q0,q0,q0 +#define vandq_u16 _mm_and_si128 + +uint32x4_t vandq_u32(uint32x4_t a, uint32x4_t b); // VAND q0,q0,q0 +#define vandq_u32 _mm_and_si128 + +uint64x2_t vandq_u64(uint64x2_t a, uint64x2_t b); // VAND q0,q0,q0 +#define vandq_u64 _mm_and_si128 + +//******************** Bitwise or ********************************* +//****************************************************************** +int8x8_t vorr_s8(int8x8_t a, int8x8_t b); // VORR d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vorr_s8(int8x8_t a, int8x8_t b) +{ + int8x8_t res64; + return64(_mm_or_si128(_pM128i(a),_pM128i(b))); +} + + +int16x4_t vorr_s16(int16x4_t a, int16x4_t b); // VORR d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vorr_s16(int16x4_t a, int16x4_t b) +{ + int16x4_t res64; + return64(_mm_or_si128(_pM128i(a),_pM128i(b))); +} + + +int32x2_t vorr_s32(int32x2_t a, int32x2_t b); // VORR d0,d0,d0 +_NEON2SSE_INLINE int32x2_t vorr_s32(int32x2_t a, int32x2_t b) +{ + int32x2_t res64; + return64(_mm_or_si128(_pM128i(a),_pM128i(b))); +} + + +int64x1_t vorr_s64(int64x1_t a, int64x1_t b); // VORR d0,d0,d0 +_NEON2SSE_INLINE int64x1_t vorr_s64(int64x1_t a, int64x1_t b) +{ + int64x1_t res; + res.m64_i64[0] = a.m64_i64[0] | b.m64_i64[0]; + return res; +} + +uint8x8_t vorr_u8(uint8x8_t a, uint8x8_t b); // VORR d0,d0,d0 +#define vorr_u8 vorr_s8 + +uint16x4_t vorr_u16(uint16x4_t a, uint16x4_t b); // VORR d0,d0,d0 +#define vorr_u16 vorr_s16 + +uint32x2_t vorr_u32(uint32x2_t a, uint32x2_t b); // VORR d0,d0,d0 +#define vorr_u32 vorr_s32 + +uint64x1_t vorr_u64(uint64x1_t a, uint64x1_t b); // VORR d0,d0,d0 +#define vorr_u64 vorr_s64 + +int8x16_t vorrq_s8(int8x16_t a, int8x16_t b); // VORR q0,q0,q0 +#define vorrq_s8 _mm_or_si128 + +int16x8_t vorrq_s16(int16x8_t a, int16x8_t b); // VORR q0,q0,q0 +#define vorrq_s16 _mm_or_si128 + +int32x4_t vorrq_s32(int32x4_t a, int32x4_t b); // VORR q0,q0,q0 +#define vorrq_s32 _mm_or_si128 + +int64x2_t vorrq_s64(int64x2_t a, int64x2_t b); // VORR q0,q0,q0 +#define vorrq_s64 _mm_or_si128 + +uint8x16_t vorrq_u8(uint8x16_t a, uint8x16_t b); // VORR q0,q0,q0 +#define vorrq_u8 _mm_or_si128 + +uint16x8_t vorrq_u16(uint16x8_t a, uint16x8_t b); // VORR q0,q0,q0 +#define vorrq_u16 _mm_or_si128 + +uint32x4_t vorrq_u32(uint32x4_t a, uint32x4_t b); // VORR q0,q0,q0 +#define vorrq_u32 _mm_or_si128 + +uint64x2_t vorrq_u64(uint64x2_t a, uint64x2_t b); // VORR q0,q0,q0 +#define vorrq_u64 _mm_or_si128 + +//************* Bitwise exclusive or (EOR or XOR) ****************** +//******************************************************************* +int8x8_t veor_s8(int8x8_t a, int8x8_t b); // VEOR d0,d0,d0 +_NEON2SSE_INLINE int8x8_t veor_s8(int8x8_t a, int8x8_t b) +{ + int8x8_t res64; + return64(_mm_xor_si128(_pM128i(a),_pM128i(b))); +} + +int16x4_t veor_s16(int16x4_t a, int16x4_t b); // VEOR d0,d0,d0 +#define veor_s16 veor_s8 + +int32x2_t veor_s32(int32x2_t a, int32x2_t b); // VEOR d0,d0,d0 +#define veor_s32 veor_s8 + +int64x1_t veor_s64(int64x1_t a, int64x1_t b); // VEOR d0,d0,d0 +_NEON2SSE_INLINE int64x1_t veor_s64(int64x1_t a, int64x1_t b) +{ + int64x1_t res; + res.m64_i64[0] = a.m64_i64[0] ^ b.m64_i64[0]; + return res; +} + +uint8x8_t veor_u8(uint8x8_t a, uint8x8_t b); // VEOR d0,d0,d0 +#define veor_u8 veor_s8 + +uint16x4_t veor_u16(uint16x4_t a, uint16x4_t b); // VEOR d0,d0,d0 +#define veor_u16 veor_s16 + +uint32x2_t veor_u32(uint32x2_t a, 
uint32x2_t b); // VEOR d0,d0,d0 +#define veor_u32 veor_s32 + +uint64x1_t veor_u64(uint64x1_t a, uint64x1_t b); // VEOR d0,d0,d0 +#define veor_u64 veor_s64 + +int8x16_t veorq_s8(int8x16_t a, int8x16_t b); // VEOR q0,q0,q0 +#define veorq_s8 _mm_xor_si128 + +int16x8_t veorq_s16(int16x8_t a, int16x8_t b); // VEOR q0,q0,q0 +#define veorq_s16 _mm_xor_si128 + +int32x4_t veorq_s32(int32x4_t a, int32x4_t b); // VEOR q0,q0,q0 +#define veorq_s32 _mm_xor_si128 + +int64x2_t veorq_s64(int64x2_t a, int64x2_t b); // VEOR q0,q0,q0 +#define veorq_s64 _mm_xor_si128 + +uint8x16_t veorq_u8(uint8x16_t a, uint8x16_t b); // VEOR q0,q0,q0 +#define veorq_u8 _mm_xor_si128 + +uint16x8_t veorq_u16(uint16x8_t a, uint16x8_t b); // VEOR q0,q0,q0 +#define veorq_u16 _mm_xor_si128 + +uint32x4_t veorq_u32(uint32x4_t a, uint32x4_t b); // VEOR q0,q0,q0 +#define veorq_u32 _mm_xor_si128 + +uint64x2_t veorq_u64(uint64x2_t a, uint64x2_t b); // VEOR q0,q0,q0 +#define veorq_u64 _mm_xor_si128 + +//********************** Bit Clear ********************************** +//******************************************************************* +//Logical AND complement (AND negation or AND NOT) +int8x8_t vbic_s8(int8x8_t a, int8x8_t b); // VBIC d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vbic_s8(int8x8_t a, int8x8_t b) +{ + int8x8_t res64; + return64(_mm_andnot_si128(_pM128i(b),_pM128i(a))); //notice the arguments "swap" +} + +int16x4_t vbic_s16(int16x4_t a, int16x4_t b); // VBIC d0,d0,d0 +#define vbic_s16 vbic_s8 + +int32x2_t vbic_s32(int32x2_t a, int32x2_t b); // VBIC d0,d0,d0 +#define vbic_s32 vbic_s8 + +int64x1_t vbic_s64(int64x1_t a, int64x1_t b); // VBIC d0,d0,d0 +_NEON2SSE_INLINE int64x1_t vbic_s64(int64x1_t a, int64x1_t b) +{ + int64x1_t res; + res.m64_i64[0] = a.m64_i64[0] & (~b.m64_i64[0]); + return res; +} + +uint8x8_t vbic_u8(uint8x8_t a, uint8x8_t b); // VBIC d0,d0,d0 +#define vbic_u8 vbic_s8 + +uint16x4_t vbic_u16(uint16x4_t a, uint16x4_t b); // VBIC d0,d0,d0 +#define vbic_u16 vbic_s16 + +uint32x2_t vbic_u32(uint32x2_t a, uint32x2_t b); // VBIC d0,d0,d0 +#define vbic_u32 vbic_s32 + +uint64x1_t vbic_u64(uint64x1_t a, uint64x1_t b); // VBIC d0,d0,d0 +#define vbic_u64 vbic_s64 + +int8x16_t vbicq_s8(int8x16_t a, int8x16_t b); // VBIC q0,q0,q0 +#define vbicq_s8(a,b) _mm_andnot_si128 (b,a) //notice arguments "swap" + +int16x8_t vbicq_s16(int16x8_t a, int16x8_t b); // VBIC q0,q0,q0 +#define vbicq_s16(a,b) _mm_andnot_si128 (b,a) //notice arguments "swap" + +int32x4_t vbicq_s32(int32x4_t a, int32x4_t b); // VBIC q0,q0,q0 +#define vbicq_s32(a,b) _mm_andnot_si128 (b,a) //notice arguments "swap" + +int64x2_t vbicq_s64(int64x2_t a, int64x2_t b); // VBIC q0,q0,q0 +#define vbicq_s64(a,b) _mm_andnot_si128 (b,a) //notice arguments "swap" + +uint8x16_t vbicq_u8(uint8x16_t a, uint8x16_t b); // VBIC q0,q0,q0 +#define vbicq_u8(a,b) _mm_andnot_si128 (b,a) //notice arguments "swap" + +uint16x8_t vbicq_u16(uint16x8_t a, uint16x8_t b); // VBIC q0,q0,q0 +#define vbicq_u16(a,b) _mm_andnot_si128 (b,a) //notice arguments "swap" + +uint32x4_t vbicq_u32(uint32x4_t a, uint32x4_t b); // VBIC q0,q0,q0 +#define vbicq_u32(a,b) _mm_andnot_si128 (b,a) //notice arguments "swap" + +uint64x2_t vbicq_u64(uint64x2_t a, uint64x2_t b); // VBIC q0,q0,q0 +#define vbicq_u64(a,b) _mm_andnot_si128 (b,a) //notice arguments "swap" + +//**************** Bitwise OR complement ******************************** +//**************************************** ******************************** +//no exact IA 32 match, need to implement it as following +int8x8_t vorn_s8(int8x8_t a, int8x8_t b); // 
VORN d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vorn_s8(int8x8_t a, int8x8_t b) +{ + int8x8_t res64; + return64(vornq_s8(_pM128i(a), _pM128i(b))); +} + + +int16x4_t vorn_s16(int16x4_t a, int16x4_t b); // VORN d0,d0,d0 +_NEON2SSE_INLINE int16x4_t vorn_s16(int16x4_t a, int16x4_t b) +{ + int16x4_t res64; + return64(vornq_s16(_pM128i(a), _pM128i(b))); +} + + +int32x2_t vorn_s32(int32x2_t a, int32x2_t b); // VORN d0,d0,d0 +_NEON2SSE_INLINE int32x2_t vorn_s32(int32x2_t a, int32x2_t b) +{ + int32x2_t res64; + return64(vornq_s32(_pM128i(a), _pM128i(b))); +} + + +int64x1_t vorn_s64(int64x1_t a, int64x1_t b); // VORN d0,d0,d0 +_NEON2SSE_INLINE int64x1_t vorn_s64(int64x1_t a, int64x1_t b) +{ + int64x1_t res; + res.m64_i64[0] = a.m64_i64[0] | (~b.m64_i64[0]); + return res; +} + +uint8x8_t vorn_u8(uint8x8_t a, uint8x8_t b); // VORN d0,d0,d0 +#define vorn_u8 vorn_s8 + + +uint16x4_t vorn_u16(uint16x4_t a, uint16x4_t b); // VORN d0,d0,d0 +#define vorn_u16 vorn_s16 + +uint32x2_t vorn_u32(uint32x2_t a, uint32x2_t b); // VORN d0,d0,d0 +#define vorn_u32 vorn_s32 + +uint64x1_t vorn_u64(uint64x1_t a, uint64x1_t b); // VORN d0,d0,d0 +#define vorn_u64 vorn_s64 + + +int8x16_t vornq_s8(int8x16_t a, int8x16_t b); // VORN q0,q0,q0 +_NEON2SSE_INLINE int8x16_t vornq_s8(int8x16_t a, int8x16_t b) // VORN q0,q0,q0 +{ + __m128i b1; + b1 = vmvnq_s8( b); //bitwise not for b + return _mm_or_si128 (a, b1); +} + +int16x8_t vornq_s16(int16x8_t a, int16x8_t b); // VORN q0,q0,q0 +_NEON2SSE_INLINE int16x8_t vornq_s16(int16x8_t a, int16x8_t b) // VORN q0,q0,q0 +{ + __m128i b1; + b1 = vmvnq_s16( b); //bitwise not for b + return _mm_or_si128 (a, b1); +} + +int32x4_t vornq_s32(int32x4_t a, int32x4_t b); // VORN q0,q0,q0 +_NEON2SSE_INLINE int32x4_t vornq_s32(int32x4_t a, int32x4_t b) // VORN q0,q0,q0 +{ + __m128i b1; + b1 = vmvnq_s32( b); //bitwise not for b + return _mm_or_si128 (a, b1); +} + +int64x2_t vornq_s64(int64x2_t a, int64x2_t b); // VORN q0,q0,q0 +_NEON2SSE_INLINE int64x2_t vornq_s64(int64x2_t a, int64x2_t b) +{ + __m128i c1, b1; + c1 = _mm_cmpeq_epi8 (a, a); //all ones 0xfffffff...fffff + b1 = _mm_andnot_si128 (b, c1); + return _mm_or_si128 (a, b1); +} + +uint8x16_t vornq_u8(uint8x16_t a, uint8x16_t b); // VORN q0,q0,q0 +_NEON2SSE_INLINE uint8x16_t vornq_u8(uint8x16_t a, uint8x16_t b) // VORN q0,q0,q0 +{ + __m128i b1; + b1 = vmvnq_u8( b); //bitwise not for b + return _mm_or_si128 (a, b1); +} + +uint16x8_t vornq_u16(uint16x8_t a, uint16x8_t b); // VORN q0,q0,q0 +_NEON2SSE_INLINE uint16x8_t vornq_u16(uint16x8_t a, uint16x8_t b) // VORN q0,q0,q0 +{ + __m128i b1; + b1 = vmvnq_s16( b); //bitwise not for b + return _mm_or_si128 (a, b1); +} + +uint32x4_t vornq_u32(uint32x4_t a, uint32x4_t b); // VORN q0,q0,q0 +_NEON2SSE_INLINE uint32x4_t vornq_u32(uint32x4_t a, uint32x4_t b) // VORN q0,q0,q0 +{ + __m128i b1; + b1 = vmvnq_u32( b); //bitwise not for b + return _mm_or_si128 (a, b1); +} +uint64x2_t vornq_u64(uint64x2_t a, uint64x2_t b); // VORN q0,q0,q0 +#define vornq_u64 vornq_s64 + +//********************* Bitwise Select ***************************** +//****************************************************************** +//Note This intrinsic can compile to any of VBSL/VBIF/VBIT depending on register allocation.(?????????) + +//VBSL (Bitwise Select) selects each bit for the destination from the first operand if the +//corresponding bit of the destination is 1, or from the second operand if the corresponding bit of the destination is 0. 
+ +//VBIF (Bitwise Insert if False) inserts each bit from the first operand into the destination +//if the corresponding bit of the second operand is 0, otherwise leaves the destination bit unchanged + +//VBIT (Bitwise Insert if True) inserts each bit from the first operand into the destination +//if the corresponding bit of the second operand is 1, otherwise leaves the destination bit unchanged. + +//VBSL only is implemented for SIMD +int8x8_t vbsl_s8(uint8x8_t a, int8x8_t b, int8x8_t c); // VBSL d0,d0,d0 +_NEON2SSE_INLINE int8x8_t vbsl_s8(uint8x8_t a, int8x8_t b, int8x8_t c) +{ + int8x8_t res64; + __m128i res; + res = vbslq_s8(_pM128i(a), _pM128i(b), _pM128i(c)); + return64(res); +} + +int16x4_t vbsl_s16(uint16x4_t a, int16x4_t b, int16x4_t c); // VBSL d0,d0,d0 +#define vbsl_s16 vbsl_s8 + +int32x2_t vbsl_s32(uint32x2_t a, int32x2_t b, int32x2_t c); // VBSL d0,d0,d0 +#define vbsl_s32 vbsl_s8 + +int64x1_t vbsl_s64(uint64x1_t a, int64x1_t b, int64x1_t c); // VBSL d0,d0,d0 +_NEON2SSE_INLINE int64x1_t vbsl_s64(uint64x1_t a, int64x1_t b, int64x1_t c) +{ + int64x1_t res; + res.m64_i64[0] = (a.m64_i64[0] & b.m64_i64[0]) | ( (~a.m64_i64[0]) & c.m64_i64[0]); + return res; +} + +uint8x8_t vbsl_u8(uint8x8_t a, uint8x8_t b, uint8x8_t c); // VBSL d0,d0,d0 +#define vbsl_u8 vbsl_s8 + +uint16x4_t vbsl_u16(uint16x4_t a, uint16x4_t b, uint16x4_t c); // VBSL d0,d0,d0 +#define vbsl_u16 vbsl_s8 + +uint32x2_t vbsl_u32(uint32x2_t a, uint32x2_t b, uint32x2_t c); // VBSL d0,d0,d0 +#define vbsl_u32 vbsl_s8 + +uint64x1_t vbsl_u64(uint64x1_t a, uint64x1_t b, uint64x1_t c); // VBSL d0,d0,d0 +#define vbsl_u64 vbsl_s64 + +float32x2_t vbsl_f32(uint32x2_t a, float32x2_t b, float32x2_t c); // VBSL d0,d0,d0 +_NEON2SSE_INLINE float32x2_t vbsl_f32(uint32x2_t a, float32x2_t b, float32x2_t c) +{ + __m128 sel1, sel2; + __m64_128 res64; + sel1 = _mm_and_ps (_pM128(a), _pM128(b)); + sel2 = _mm_andnot_ps (_pM128(a), _pM128(c)); + sel1 = _mm_or_ps (sel1, sel2); + _M64f(res64, sel1); + return res64; +} + +poly8x8_t vbsl_p8(uint8x8_t a, poly8x8_t b, poly8x8_t c); // VBSL d0,d0,d0 +#define vbsl_p8 vbsl_s8 + +poly16x4_t vbsl_p16(uint16x4_t a, poly16x4_t b, poly16x4_t c); // VBSL d0,d0,d0 +#define vbsl_p16 vbsl_s8 + +int8x16_t vbslq_s8(uint8x16_t a, int8x16_t b, int8x16_t c); // VBSL q0,q0,q0 +_NEON2SSE_INLINE int8x16_t vbslq_s8(uint8x16_t a, int8x16_t b, int8x16_t c) // VBSL q0,q0,q0 +{ + __m128i sel1, sel2; + sel1 = _mm_and_si128 (a, b); + sel2 = _mm_andnot_si128 (a, c); + return _mm_or_si128 (sel1, sel2); +} + +int16x8_t vbslq_s16(uint16x8_t a, int16x8_t b, int16x8_t c); // VBSL q0,q0,q0 +#define vbslq_s16 vbslq_s8 + +int32x4_t vbslq_s32(uint32x4_t a, int32x4_t b, int32x4_t c); // VBSL q0,q0,q0 +#define vbslq_s32 vbslq_s8 + +int64x2_t vbslq_s64(uint64x2_t a, int64x2_t b, int64x2_t c); // VBSL q0,q0,q0 +#define vbslq_s64 vbslq_s8 + +uint8x16_t vbslq_u8(uint8x16_t a, uint8x16_t b, uint8x16_t c); // VBSL q0,q0,q0 +#define vbslq_u8 vbslq_s8 + +uint16x8_t vbslq_u16(uint16x8_t a, uint16x8_t b, uint16x8_t c); // VBSL q0,q0,q0 +#define vbslq_u16 vbslq_s8 + +uint32x4_t vbslq_u32(uint32x4_t a, uint32x4_t b, uint32x4_t c); // VBSL q0,q0,q0 +#define vbslq_u32 vbslq_s8 + +uint64x2_t vbslq_u64(uint64x2_t a, uint64x2_t b, uint64x2_t c); // VBSL q0,q0,q0 +#define vbslq_u64 vbslq_s8 + +float32x4_t vbslq_f32(uint32x4_t a, float32x4_t b, float32x4_t c); // VBSL q0,q0,q0 +_NEON2SSE_INLINE float32x4_t vbslq_f32(uint32x4_t a, float32x4_t b, float32x4_t c) // VBSL q0,q0,q0 +{ + __m128 sel1, sel2; + sel1 = _mm_and_ps (*(__m128*)&a, b); + sel2 = 
_mm_andnot_ps (*(__m128*)&a, c); + return _mm_or_ps (sel1, sel2); +} + +poly8x16_t vbslq_p8(uint8x16_t a, poly8x16_t b, poly8x16_t c); // VBSL q0,q0,q0 +#define vbslq_p8 vbslq_u8 + +poly16x8_t vbslq_p16(uint16x8_t a, poly16x8_t b, poly16x8_t c); // VBSL q0,q0,q0 +#define vbslq_p16 vbslq_s8 + +//************************************************************************************ +//**************** Transposition operations **************************************** +//************************************************************************************ +//***************** Vector Transpose ************************************************ +//************************************************************************************ +//VTRN (Vector Transpose) treats the elements of its operand vectors as elements of 2 x 2 matrices, and transposes the matrices. +// making the result look as (a0, b0, a2, b2, a4, b4,....) (a1, b1, a3, b3, a5, b5,.....) +int8x8x2_t vtrn_s8(int8x8_t a, int8x8_t b); // VTRN.8 d0,d0 +_NEON2SSE_INLINE int8x8x2_t vtrn_s8(int8x8_t a, int8x8_t b) // VTRN.8 d0,d0 +{ + int8x8x2_t val; + __m128i tmp, val0; + _NEON2SSE_ALIGN_16 int8_t mask16_even_odd[16] = { 0,1, 4,5, 8,9, 12,13, 2,3, 6,7, 10,11, 14,15}; //mask8_trnsp + tmp = _mm_unpacklo_epi8(_pM128i(a), _pM128i(b)); //a0,b0,a1,b1,a2,b2,a3,b3,...,a7,b7 + val0 = _mm_shuffle_epi8 (tmp, *(__m128i*)mask16_even_odd); //(a0, b0, a2, b2, a4, b4, a6, b6), (a1,b1, a3,b3, a5,b5, a7,b7) + vst1q_s8 (val.val, val0); // _mm_shuffle_epi32 (val.val[0], _SWAP_HI_LOW32); //(a1,b1, a3,b3, a5,b5, a7,b7),(a0, b0, a2, b2, a4, b4, a6, b6), + return val; +} + +int16x4x2_t vtrn_s16(int16x4_t a, int16x4_t b); // VTRN.16 d0,d0 +_NEON2SSE_INLINE int16x4x2_t vtrn_s16(int16x4_t a, int16x4_t b) // VTRN.16 d0,d0 +{ + int16x4x2_t val; + __m128i tmp, val0; + _NEON2SSE_ALIGN_16 int8_t maskdlv16[16] = {0,1, 2,3, 8,9, 10,11, 4,5, 6,7, 12,13, 14, 15}; + tmp = _mm_unpacklo_epi16(_pM128i(a), _pM128i(b)); //a0,b0,a1,b1,a2,b2,a3,b3 + val0 = _mm_shuffle_epi8 (tmp, *(__m128i*)maskdlv16); //a0, b0, a2, b2, a1,b1, a3, b3 + vst1q_s16(val.val, val0); // _mm_shuffle_epi32 (val.val[0], _SWAP_HI_LOW32); //(a1,b1, a3,b3),(a0, b0, a2, b2), + return val; +} + +int32x2x2_t vtrn_s32(int32x2_t a, int32x2_t b); // VTRN.32 d0,d0 +_NEON2SSE_INLINE int32x2x2_t vtrn_s32(int32x2_t a, int32x2_t b) +{ + int32x2x2_t val; + __m128i val0; + val0 = _mm_unpacklo_epi32(_pM128i(a), _pM128i(b)); //a0,b0,a1,b1 + vst1q_s32(val.val, val0); // _mm_shuffle_epi32(val.val[0], _SWAP_HI_LOW32); //a1,b1, a0,b0, + return val; +} + +uint8x8x2_t vtrn_u8(uint8x8_t a, uint8x8_t b); // VTRN.8 d0,d0 +#define vtrn_u8 vtrn_s8 + +uint16x4x2_t vtrn_u16(uint16x4_t a, uint16x4_t b); // VTRN.16 d0,d0 +#define vtrn_u16 vtrn_s16 + +uint32x2x2_t vtrn_u32(uint32x2_t a, uint32x2_t b); // VTRN.32 d0,d0 +#define vtrn_u32 vtrn_s32 + +float32x2x2_t vtrn_f32(float32x2_t a, float32x2_t b); // VTRN.32 d0,d0 +_NEON2SSE_INLINE float32x2x2_t vtrn_f32(float32x2_t a, float32x2_t b) +{ + float32x2x2_t val; + val.val[0].m64_f32[0] = a.m64_f32[0]; + val.val[0].m64_f32[1] = b.m64_f32[0]; + val.val[1].m64_f32[0] = a.m64_f32[1]; + val.val[1].m64_f32[1] = b.m64_f32[1]; + return val; //a0,b0,a1,b1 +} + +poly8x8x2_t vtrn_p8(poly8x8_t a, poly8x8_t b); // VTRN.8 d0,d0 +#define vtrn_p8 vtrn_u8 + +poly16x4x2_t vtrn_p16(poly16x4_t a, poly16x4_t b); // VTRN.16 d0,d0 +#define vtrn_p16 vtrn_s16 + +//int8x16x2_t vtrnq_s8(int8x16_t a, int8x16_t b); // VTRN.8 q0,q0 +_NEON2SSE_INLINE int8x16x2_t vtrnq_s8(int8x16_t a, int8x16_t b) // VTRN.8 q0,q0 +{ + int8x16x2_t 
r8x16; + __m128i a_sh, b_sh; + _NEON2SSE_ALIGN_16 int8_t mask8_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3,5, 7, 9, 11, 13, 15}; + a_sh = _mm_shuffle_epi8 (a, *(__m128i*)mask8_even_odd); //a0, a2, a4, a6, a8, a10, a12, a14, a1, a3, a5, a7, a9, a11, a13, a15 + b_sh = _mm_shuffle_epi8 (b, *(__m128i*)mask8_even_odd); //b0, b2, b4, b6, b8, b10, b12, b14, b1, b3, b5, b7, b9, b11, b13, b15 + + r8x16.val[0] = _mm_unpacklo_epi8(a_sh, b_sh); //(a0, b0, a2, b2, a4, b4, a6, b6, a8,b8, a10,b10, a12,b12, a14,b14) + r8x16.val[1] = _mm_unpackhi_epi8(a_sh, b_sh); // (a1, b1, a3, b3, a5, b5, a7, b7, a9,b9, a11,b11, a13,b13, a15,b15) + return r8x16; +} + +int16x8x2_t vtrnq_s16(int16x8_t a, int16x8_t b); // VTRN.16 q0,q0 +_NEON2SSE_INLINE int16x8x2_t vtrnq_s16(int16x8_t a, int16x8_t b) // VTRN.16 q0,q0 +{ + int16x8x2_t v16x8; + __m128i a_sh, b_sh; + _NEON2SSE_ALIGN_16 int8_t mask16_even_odd[16] = { 0,1, 4,5, 8,9, 12,13, 2,3, 6,7, 10,11, 14,15}; + a_sh = _mm_shuffle_epi8 (a, *(__m128i*)mask16_even_odd); //a0, a2, a4, a6, a1, a3, a5, a7 + b_sh = _mm_shuffle_epi8 (b, *(__m128i*)mask16_even_odd); //b0, b2, b4, b6, b1, b3, b5, b7 + v16x8.val[0] = _mm_unpacklo_epi16(a_sh, b_sh); //a0, b0, a2, b2, a4, b4, a6, b6 + v16x8.val[1] = _mm_unpackhi_epi16(a_sh, b_sh); //a1, b1, a3, b3, a5, b5, a7, b7 + return v16x8; +} + +int32x4x2_t vtrnq_s32(int32x4_t a, int32x4_t b); // VTRN.32 q0,q0 +_NEON2SSE_INLINE int32x4x2_t vtrnq_s32(int32x4_t a, int32x4_t b) // VTRN.32 q0,q0 +{ + //may be not optimal solution compared with serial + int32x4x2_t v32x4; + __m128i a_sh, b_sh; + a_sh = _mm_shuffle_epi32 (a, 216); //a0, a2, a1, a3 + b_sh = _mm_shuffle_epi32 (b, 216); //b0, b2, b1, b3 + + v32x4.val[0] = _mm_unpacklo_epi32(a_sh, b_sh); //a0, b0, a2, b2 + v32x4.val[1] = _mm_unpackhi_epi32(a_sh, b_sh); //a1, b1, a3, b3 + return v32x4; +} + +uint8x16x2_t vtrnq_u8(uint8x16_t a, uint8x16_t b); // VTRN.8 q0,q0 +#define vtrnq_u8 vtrnq_s8 + +uint16x8x2_t vtrnq_u16(uint16x8_t a, uint16x8_t b); // VTRN.16 q0,q0 +#define vtrnq_u16 vtrnq_s16 + +uint32x4x2_t vtrnq_u32(uint32x4_t a, uint32x4_t b); // VTRN.32 q0,q0 +#define vtrnq_u32 vtrnq_s32 + +float32x4x2_t vtrnq_f32(float32x4_t a, float32x4_t b); // VTRN.32 q0,q0 +_NEON2SSE_INLINE float32x4x2_t vtrnq_f32(float32x4_t a, float32x4_t b) // VTRN.32 q0,q0 +{ + //may be not optimal solution compared with serial + float32x4x2_t f32x4; + __m128 a_sh, b_sh; + a_sh = _mm_shuffle_ps (a, a, _MM_SHUFFLE(3,1, 2, 0)); //a0, a2, a1, a3, need to check endiness + b_sh = _mm_shuffle_ps (b, b, _MM_SHUFFLE(3,1, 2, 0)); //b0, b2, b1, b3, need to check endiness + + f32x4.val[0] = _mm_unpacklo_ps(a_sh, b_sh); //a0, b0, a2, b2 + f32x4.val[1] = _mm_unpackhi_ps(a_sh, b_sh); //a1, b1, a3, b3 + return f32x4; +} + +poly8x16x2_t vtrnq_p8(poly8x16_t a, poly8x16_t b); // VTRN.8 q0,q0 +#define vtrnq_p8 vtrnq_s8 + +poly16x8x2_t vtrnq_p16(poly16x8_t a, poly16x8_t b); // VTRN.16 q0,q0 +#define vtrnq_p16 vtrnq_s16 + +//***************** Interleave elements *************************** +//***************************************************************** +//output has (a0,b0,a1,b1, a2,b2,.....) 
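+
+//An illustrative sketch (added for clarity; not part of the original intrinsic set): for the 64-bit
+//variants below a single SSE unpack already produces the fully interleaved 128-bit value, which is
+//then stored into the two 64-bit halves of the result. E.g. for int16x4_t vectors a = {a0,a1,a2,a3}
+//and b = {b0,b1,b2,b3}:
+//
+//    int16x4x2_t r = vzip_s16(a, b); //r.val[0] = {a0,b0,a1,b1}, r.val[1] = {a2,b2,a3,b3}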
+int8x8x2_t vzip_s8(int8x8_t a, int8x8_t b); // VZIP.8 d0,d0 +_NEON2SSE_INLINE int8x8x2_t vzip_s8(int8x8_t a, int8x8_t b) // VZIP.8 d0,d0 +{ + int8x8x2_t val; + __m128i val0; + val0 = _mm_unpacklo_epi8(_pM128i(a), _pM128i(b)); + vst1q_s8(val.val, val0); //_mm_shuffle_epi32(val.val[0], _SWAP_HI_LOW32); + return val; +} + +int16x4x2_t vzip_s16(int16x4_t a, int16x4_t b); // VZIP.16 d0,d0 +_NEON2SSE_INLINE int16x4x2_t vzip_s16(int16x4_t a, int16x4_t b) // VZIP.16 d0,d0 +{ + int16x4x2_t val; + __m128i val0; + val0 = _mm_unpacklo_epi16(_pM128i(a), _pM128i(b)); + vst1q_s16(val.val, val0); // _mm_shuffle_epi32(val.val[0], _SWAP_HI_LOW32); + return val; +} + +int32x2x2_t vzip_s32(int32x2_t a, int32x2_t b); // VZIP.32 d0,d0 +#define vzip_s32 vtrn_s32 + +uint8x8x2_t vzip_u8(uint8x8_t a, uint8x8_t b); // VZIP.8 d0,d0 +#define vzip_u8 vzip_s8 + +uint16x4x2_t vzip_u16(uint16x4_t a, uint16x4_t b); // VZIP.16 d0,d0 +#define vzip_u16 vzip_s16 + +uint32x2x2_t vzip_u32(uint32x2_t a, uint32x2_t b); // VZIP.32 d0,d0 +#define vzip_u32 vzip_s32 + +float32x2x2_t vzip_f32(float32x2_t a, float32x2_t b); // VZIP.32 d0,d0 +#define vzip_f32 vtrn_f32 + +poly8x8x2_t vzip_p8(poly8x8_t a, poly8x8_t b); // VZIP.8 d0,d0 +#define vzip_p8 vzip_u8 + +poly16x4x2_t vzip_p16(poly16x4_t a, poly16x4_t b); // VZIP.16 d0,d0 +#define vzip_p16 vzip_u16 + +int8x16x2_t vzipq_s8(int8x16_t a, int8x16_t b); // VZIP.8 q0,q0 +_NEON2SSE_INLINE int8x16x2_t vzipq_s8(int8x16_t a, int8x16_t b) // VZIP.8 q0,q0 +{ + int8x16x2_t r8x16; + r8x16.val[0] = _mm_unpacklo_epi8(a, b); + r8x16.val[1] = _mm_unpackhi_epi8(a, b); + return r8x16; +} + +int16x8x2_t vzipq_s16(int16x8_t a, int16x8_t b); // VZIP.16 q0,q0 +_NEON2SSE_INLINE int16x8x2_t vzipq_s16(int16x8_t a, int16x8_t b) // VZIP.16 q0,q0 +{ + int16x8x2_t r16x8; + r16x8.val[0] = _mm_unpacklo_epi16(a, b); + r16x8.val[1] = _mm_unpackhi_epi16(a, b); + return r16x8; +} + +int32x4x2_t vzipq_s32(int32x4_t a, int32x4_t b); // VZIP.32 q0,q0 +_NEON2SSE_INLINE int32x4x2_t vzipq_s32(int32x4_t a, int32x4_t b) // VZIP.32 q0,q0 +{ + int32x4x2_t r32x4; + r32x4.val[0] = _mm_unpacklo_epi32(a, b); + r32x4.val[1] = _mm_unpackhi_epi32(a, b); + return r32x4; +} + +uint8x16x2_t vzipq_u8(uint8x16_t a, uint8x16_t b); // VZIP.8 q0,q0 +#define vzipq_u8 vzipq_s8 + +uint16x8x2_t vzipq_u16(uint16x8_t a, uint16x8_t b); // VZIP.16 q0,q0 +#define vzipq_u16 vzipq_s16 + +uint32x4x2_t vzipq_u32(uint32x4_t a, uint32x4_t b); // VZIP.32 q0,q0 +#define vzipq_u32 vzipq_s32 + +float32x4x2_t vzipq_f32(float32x4_t a, float32x4_t b); // VZIP.32 q0,q0 +_NEON2SSE_INLINE float32x4x2_t vzipq_f32(float32x4_t a, float32x4_t b) // VZIP.32 q0,q0 +{ + float32x4x2_t f32x4; + f32x4.val[0] = _mm_unpacklo_ps ( a, b); + f32x4.val[1] = _mm_unpackhi_ps ( a, b); + return f32x4; +} + +poly8x16x2_t vzipq_p8(poly8x16_t a, poly8x16_t b); // VZIP.8 q0,q0 +#define vzipq_p8 vzipq_u8 + +poly16x8x2_t vzipq_p16(poly16x8_t a, poly16x8_t b); // VZIP.16 q0,q0 +#define vzipq_p16 vzipq_u16 + +//*********************** De-Interleave elements ************************* +//************************************************************************* +//As the result of these functions first val contains (a0,a2,a4,....,b0,b2, b4,...) and the second val (a1,a3,a5,....b1,b3,b5...) 
+//no such functions in IA32 SIMD, shuffle is required +int8x8x2_t vuzp_s8(int8x8_t a, int8x8_t b); // VUZP.8 d0,d0 +_NEON2SSE_INLINE int8x8x2_t vuzp_s8(int8x8_t a, int8x8_t b) // VUZP.8 d0,d0 +{ + int8x8x2_t val; + __m128i tmp, val0; + _NEON2SSE_ALIGN_16 int8_t maskdlv8[16] = { 0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11,15}; + tmp = _mm_unpacklo_epi8(_pM128i(a), _pM128i(b)); //a0,b0,a1,b1,a2,b2,a3,b3,...,a7,b7 + val0 = _mm_shuffle_epi8 (tmp, *(__m128i*)maskdlv8); //(a0, a2, a4, a6, b0, b2, b4, b6), (a1, a3, a5, a7, b1,b3, b5, b7) + vst1q_s8(val.val, val0); // _mm_shuffle_epi32(val.val[0], _SWAP_HI_LOW32); + return val; +} + +int16x4x2_t vuzp_s16(int16x4_t a, int16x4_t b); // VUZP.16 d0,d0 +_NEON2SSE_INLINE int16x4x2_t vuzp_s16(int16x4_t a, int16x4_t b) // VUZP.16 d0,d0 +{ + int16x4x2_t val; + __m128i tmp, val0; + _NEON2SSE_ALIGN_16 int8_t maskdlv16[16] = {0,1, 8,9, 2,3, 10,11, 4,5, 12,13, 6,7, 14,15}; + tmp = _mm_unpacklo_epi16(_pM128i(a), _pM128i(b)); //a0,b0,a1,b1,a2,b2,a3,b3 + val0 = _mm_shuffle_epi8 (tmp, *(__m128i*)maskdlv16); //a0,a2, b0, b2, a1,a3, b1,b3 + vst1q_s16(val.val, val0); // _mm_shuffle_epi32(val.val[0], _SWAP_HI_LOW32); + return val; +} + +int32x2x2_t vuzp_s32(int32x2_t a, int32x2_t b); // VUZP.32 d0,d0 +_NEON2SSE_INLINE int32x2x2_t vuzp_s32(int32x2_t a, int32x2_t b) // VUZP.32 d0,d0 +{ + int32x2x2_t val; + __m128i val0; + val0 = _mm_unpacklo_epi32(_pM128i(a), _pM128i(b)); //a0,b0, a1,b1 + vst1q_s32(val.val, val0); // _mm_shuffle_epi32(val.val[0], _SWAP_HI_LOW32); + return val; +} + +uint8x8x2_t vuzp_u8(uint8x8_t a, uint8x8_t b); // VUZP.8 d0,d0 +#define vuzp_u8 vuzp_s8 + +uint16x4x2_t vuzp_u16(uint16x4_t a, uint16x4_t b); // VUZP.16 d0,d0 +#define vuzp_u16 vuzp_s16 + +uint32x2x2_t vuzp_u32(uint32x2_t a, uint32x2_t b); // VUZP.32 d0,d0 +#define vuzp_u32 vuzp_s32 + +float32x2x2_t vuzp_f32(float32x2_t a, float32x2_t b); // VUZP.32 d0,d0 +#define vuzp_f32 vzip_f32 + +poly8x8x2_t vuzp_p8(poly8x8_t a, poly8x8_t b); // VUZP.8 d0,d0 +#define vuzp_p8 vuzp_u8 + +poly16x4x2_t vuzp_p16(poly16x4_t a, poly16x4_t b); // VUZP.16 d0,d0 +#define vuzp_p16 vuzp_u16 + +int8x16x2_t vuzpq_s8(int8x16_t a, int8x16_t b); // VUZP.8 q0,q0 +_NEON2SSE_INLINE int8x16x2_t vuzpq_s8(int8x16_t a, int8x16_t b) // VUZP.8 q0,q0 +{ + int8x16x2_t v8x16; + __m128i a_sh, b_sh; + _NEON2SSE_ALIGN_16 int8_t mask8_even_odd[16] = { 0, 2, 4, 6, 8, 10, 12, 14, 1, 3,5, 7, 9, 11, 13, 15}; + a_sh = _mm_shuffle_epi8 (a, *(__m128i*)mask8_even_odd); //a0, a2, a4, a6, a8, a10, a12, a14, a1, a3, a5, a7, a9, a11, a13, a15 + b_sh = _mm_shuffle_epi8 (b, *(__m128i*)mask8_even_odd); //b0, b2, b4, b6, b8, b10, b12, b14, b1, b3, b5, b7, b9, b11, b13, b15 + //we need unpack64 to combine lower (upper) 64 bits from a with lower (upper) 64 bits from b + v8x16.val[0] = _mm_unpacklo_epi64(a_sh, b_sh); ///a0, a2, a4, a6, a8, a10, a12, a14, b0, b2, b4, b6, b8, b10, b12, b14, + v8x16.val[1] = _mm_unpackhi_epi64(a_sh, b_sh); //a1, a3, a5, a7, a9, a11, a13, a15, b1, b3, b5, b7, b9, b11, b13, b15 + return v8x16; +} + +int16x8x2_t vuzpq_s16(int16x8_t a, int16x8_t b); // VUZP.16 q0,q0 +_NEON2SSE_INLINE int16x8x2_t vuzpq_s16(int16x8_t a, int16x8_t b) // VUZP.16 q0,q0 +{ + int16x8x2_t v16x8; + __m128i a_sh, b_sh; + _NEON2SSE_ALIGN_16 int8_t mask16_even_odd[16] = { 0,1, 4,5, 8,9, 12,13, 2,3, 6,7, 10,11, 14,15}; + a_sh = _mm_shuffle_epi8 (a, *(__m128i*)mask16_even_odd); //a0, a2, a4, a6, a1, a3, a5, a7 + b_sh = _mm_shuffle_epi8 (b, *(__m128i*)mask16_even_odd); //b0, b2, b4, b6, b1, b3, b5, b7 + v16x8.val[0] = _mm_unpacklo_epi64(a_sh, b_sh); 
//a0, a2, a4, a6, b0, b2, b4, b6 + v16x8.val[1] = _mm_unpackhi_epi64(a_sh, b_sh); //a1, a3, a5, a7, b1, b3, b5, b7 + return v16x8; +} + +int32x4x2_t vuzpq_s32(int32x4_t a, int32x4_t b); // VUZP.32 q0,q0 +_NEON2SSE_INLINE int32x4x2_t vuzpq_s32(int32x4_t a, int32x4_t b) // VUZP.32 q0,q0 +{ + //may be not optimal solution compared with serial + int32x4x2_t v32x4; + __m128i a_sh, b_sh; + a_sh = _mm_shuffle_epi32 (a, 216); //a0, a2, a1, a3 + b_sh = _mm_shuffle_epi32 (b, 216); //b0, b2, b1, b3 + + v32x4.val[0] = _mm_unpacklo_epi64(a_sh, b_sh); //a0, a2, b0, b2 + v32x4.val[1] = _mm_unpackhi_epi64(a_sh, b_sh); //a1, a3, b1, b3 + return v32x4; +} + +uint8x16x2_t vuzpq_u8(uint8x16_t a, uint8x16_t b); // VUZP.8 q0,q0 +#define vuzpq_u8 vuzpq_s8 + +uint16x8x2_t vuzpq_u16(uint16x8_t a, uint16x8_t b); // VUZP.16 q0,q0 +#define vuzpq_u16 vuzpq_s16 + +uint32x4x2_t vuzpq_u32(uint32x4_t a, uint32x4_t b); // VUZP.32 q0,q0 +#define vuzpq_u32 vuzpq_s32 + +float32x4x2_t vuzpq_f32(float32x4_t a, float32x4_t b); // VUZP.32 q0,q0 +_NEON2SSE_INLINE float32x4x2_t vuzpq_f32(float32x4_t a, float32x4_t b) // VUZP.32 q0,q0 +{ + float32x4x2_t v32x4; + v32x4.val[0] = _mm_shuffle_ps(a, b, _MM_SHUFFLE(2,0, 2, 0)); //a0, a2, b0, b2 , need to check endianess however + v32x4.val[1] = _mm_shuffle_ps(a, b, _MM_SHUFFLE(3,1, 3, 1)); //a1, a3, b1, b3, need to check endianess however + return v32x4; +} + +poly8x16x2_t vuzpq_p8(poly8x16_t a, poly8x16_t b); // VUZP.8 q0,q0 +#define vuzpq_p8 vuzpq_u8 + +poly16x8x2_t vuzpq_p16(poly16x8_t a, poly16x8_t b); // VUZP.16 q0,q0 +#define vuzpq_p16 vuzpq_u16 + +//############################################################################################## +//*********************** Reinterpret cast intrinsics.****************************************** +//############################################################################################## +// Not a part of oficial NEON instruction set but available in gcc compiler ********************* +poly8x8_t vreinterpret_p8_u32 (uint32x2_t t); +#define vreinterpret_p8_u32 + +poly8x8_t vreinterpret_p8_u16 (uint16x4_t t); +#define vreinterpret_p8_u16 + +poly8x8_t vreinterpret_p8_u8 (uint8x8_t t); +#define vreinterpret_p8_u8 + +poly8x8_t vreinterpret_p8_s32 (int32x2_t t); +#define vreinterpret_p8_s32 + +poly8x8_t vreinterpret_p8_s16 (int16x4_t t); +#define vreinterpret_p8_s16 + +poly8x8_t vreinterpret_p8_s8 (int8x8_t t); +#define vreinterpret_p8_s8 + +poly8x8_t vreinterpret_p8_u64 (uint64x1_t t); +#define vreinterpret_p8_u64 + +poly8x8_t vreinterpret_p8_s64 (int64x1_t t); +#define vreinterpret_p8_s64 + +poly8x8_t vreinterpret_p8_f32 (float32x2_t t); +#define vreinterpret_p8_f32 + +poly8x8_t vreinterpret_p8_p16 (poly16x4_t t); +#define vreinterpret_p8_p16 + +poly8x16_t vreinterpretq_p8_u32 (uint32x4_t t); +#define vreinterpretq_p8_u32 + +poly8x16_t vreinterpretq_p8_u16 (uint16x8_t t); +#define vreinterpretq_p8_u16 + +poly8x16_t vreinterpretq_p8_u8 (uint8x16_t t); +#define vreinterpretq_p8_u8 + +poly8x16_t vreinterpretq_p8_s32 (int32x4_t t); +#define vreinterpretq_p8_s32 + +poly8x16_t vreinterpretq_p8_s16 (int16x8_t t); +#define vreinterpretq_p8_s16 + +poly8x16_t vreinterpretq_p8_s8 (int8x16_t t); +#define vreinterpretq_p8_s8 + +poly8x16_t vreinterpretq_p8_u64 (uint64x2_t t); +#define vreinterpretq_p8_u64 + +poly8x16_t vreinterpretq_p8_s64 (int64x2_t t); +#define vreinterpretq_p8_s64 + +poly8x16_t vreinterpretq_p8_f32 (float32x4_t t); +#define vreinterpretq_p8_f32(t) _M128i(t) + +poly8x16_t vreinterpretq_p8_p16 (poly16x8_t t); +#define 
vreinterpretq_p8_p16 + +poly16x4_t vreinterpret_p16_u32 (uint32x2_t t); +#define vreinterpret_p16_u32 + +poly16x4_t vreinterpret_p16_u16 (uint16x4_t t); +#define vreinterpret_p16_u16 + +poly16x4_t vreinterpret_p16_u8 (uint8x8_t t); +#define vreinterpret_p16_u8 + +poly16x4_t vreinterpret_p16_s32 (int32x2_t t); +#define vreinterpret_p16_s32 + +poly16x4_t vreinterpret_p16_s16 (int16x4_t t); +#define vreinterpret_p16_s16 + +poly16x4_t vreinterpret_p16_s8 (int8x8_t t); +#define vreinterpret_p16_s8 + +poly16x4_t vreinterpret_p16_u64 (uint64x1_t t); +#define vreinterpret_p16_u64 + +poly16x4_t vreinterpret_p16_s64 (int64x1_t t); +#define vreinterpret_p16_s64 + +poly16x4_t vreinterpret_p16_f32 (float32x2_t t); +#define vreinterpret_p16_f32 + +poly16x4_t vreinterpret_p16_p8 (poly8x8_t t); +#define vreinterpret_p16_p8 + +poly16x8_t vreinterpretq_p16_u32 (uint32x4_t t); +#define vreinterpretq_p16_u32 + +poly16x8_t vreinterpretq_p16_u16 (uint16x8_t t); +#define vreinterpretq_p16_u16 + +poly16x8_t vreinterpretq_p16_s32 (int32x4_t t); +#define vreinterpretq_p16_s32 + +poly16x8_t vreinterpretq_p16_s16 (int16x8_t t); +#define vreinterpretq_p16_s16 + +poly16x8_t vreinterpretq_p16_s8 (int8x16_t t); +#define vreinterpretq_p16_s8 + +poly16x8_t vreinterpretq_p16_u64 (uint64x2_t t); +#define vreinterpretq_p16_u64 + +poly16x8_t vreinterpretq_p16_s64 (int64x2_t t); +#define vreinterpretq_p16_s64 + +poly16x8_t vreinterpretq_p16_f32 (float32x4_t t); +#define vreinterpretq_p16_f32(t) _M128i(t) + +poly16x8_t vreinterpretq_p16_p8 (poly8x16_t t); +#define vreinterpretq_p16_p8 vreinterpretq_s16_p8 + +//**** Integer to float ****** +float32x2_t vreinterpret_f32_u32 (uint32x2_t t); +#define vreinterpret_f32_u32(t) (*(__m64_128*)&(t)) + + +float32x2_t vreinterpret_f32_u16 (uint16x4_t t); +#define vreinterpret_f32_u16 vreinterpret_f32_u32 + + +float32x2_t vreinterpret_f32_u8 (uint8x8_t t); +#define vreinterpret_f32_u8 vreinterpret_f32_u32 + + +float32x2_t vreinterpret_f32_s32 (int32x2_t t); +#define vreinterpret_f32_s32 vreinterpret_f32_u32 + + +float32x2_t vreinterpret_f32_s16 (int16x4_t t); +#define vreinterpret_f32_s16 vreinterpret_f32_u32 + +float32x2_t vreinterpret_f32_s8 (int8x8_t t); +#define vreinterpret_f32_s8 vreinterpret_f32_u32 + + +float32x2_t vreinterpret_f32_u64(uint64x1_t t); +#define vreinterpret_f32_u64 vreinterpret_f32_u32 + + +float32x2_t vreinterpret_f32_s64 (int64x1_t t); +#define vreinterpret_f32_s64 vreinterpret_f32_u32 + + +float32x2_t vreinterpret_f32_p16 (poly16x4_t t); +#define vreinterpret_f32_p16 vreinterpret_f32_u32 + +float32x2_t vreinterpret_f32_p8 (poly8x8_t t); +#define vreinterpret_f32_p8 vreinterpret_f32_u32 + +float32x4_t vreinterpretq_f32_u32 (uint32x4_t t); +#define vreinterpretq_f32_u32(t) *(__m128*)&(t) + +float32x4_t vreinterpretq_f32_u16 (uint16x8_t t); +#define vreinterpretq_f32_u16 vreinterpretq_f32_u32 + +float32x4_t vreinterpretq_f32_u8 (uint8x16_t t); +#define vreinterpretq_f32_u8 vreinterpretq_f32_u32 + +float32x4_t vreinterpretq_f32_s32 (int32x4_t t); +#define vreinterpretq_f32_s32 vreinterpretq_f32_u32 + +float32x4_t vreinterpretq_f32_s16 (int16x8_t t); +#define vreinterpretq_f32_s16 vreinterpretq_f32_u32 + +float32x4_t vreinterpretq_f32_s8 (int8x16_t t); +#define vreinterpretq_f32_s8 vreinterpretq_f32_u32 + +float32x4_t vreinterpretq_f32_u64 (uint64x2_t t); +#define vreinterpretq_f32_u64 vreinterpretq_f32_u32 + +float32x4_t vreinterpretq_f32_s64 (int64x2_t t); +#define vreinterpretq_f32_s64 vreinterpretq_f32_u32 + +float32x4_t vreinterpretq_f32_p16 (poly16x8_t t); +#define 
vreinterpretq_f32_p16 vreinterpretq_f32_u32 + +float32x4_t vreinterpretq_f32_p8 (poly8x16_t t); +#define vreinterpretq_f32_p8 vreinterpretq_f32_u32 + +//*** Integer type conversions ****************** +//no conversion necessary for the following functions because it is same data type +int64x1_t vreinterpret_s64_u32 (uint32x2_t t); +#define vreinterpret_s64_u32 + +int64x1_t vreinterpret_s64_u16 (uint16x4_t t); +#define vreinterpret_s64_u16 + +int64x1_t vreinterpret_s64_u8 (uint8x8_t t); +#define vreinterpret_s64_u8 + +int64x1_t vreinterpret_s64_s32 (int32x2_t t); +#define vreinterpret_s64_s32 + +int64x1_t vreinterpret_s64_s16 (int16x4_t t); +#define vreinterpret_s64_s16 + +int64x1_t vreinterpret_s64_s8 (int8x8_t t); +#define vreinterpret_s64_s8 + +int64x1_t vreinterpret_s64_u64 (uint64x1_t t); +#define vreinterpret_s64_u64 + +int64x1_t vreinterpret_s64_f32 (float32x2_t t); +#define vreinterpret_s64_f32 + +int64x1_t vreinterpret_s64_p16 (poly16x4_t t); +#define vreinterpret_s64_p16 + +int64x1_t vreinterpret_s64_p8 (poly8x8_t t); +#define vreinterpret_s64_p8 + +int64x2_t vreinterpretq_s64_u32 (uint32x4_t t); +#define vreinterpretq_s64_u32 + +int64x2_t vreinterpretq_s64_s16 (uint16x8_t t); +#define vreinterpretq_s64_s16 + +int64x2_t vreinterpretq_s64_u8 (uint8x16_t t); +#define vreinterpretq_s64_u8 + +int64x2_t vreinterpretq_s64_s32 (int32x4_t t); +#define vreinterpretq_s64_s32 + +int64x2_t vreinterpretq_s64_u16 (int16x8_t t); +#define vreinterpretq_s64_u16 + +int64x2_t vreinterpretq_s64_s8 (int8x16_t t); +#define vreinterpretq_s64_s8 + +int64x2_t vreinterpretq_s64_u64 (uint64x2_t t); +#define vreinterpretq_s64_u64 + +int64x2_t vreinterpretq_s64_f32 (float32x4_t t); +#define vreinterpretq_s64_f32(t) _M128i(t) + +int64x2_t vreinterpretq_s64_p16 (poly16x8_t t); +#define vreinterpretq_s64_p16 + +int64x2_t vreinterpretq_s64_p8 (poly8x16_t t); +#define vreinterpretq_s64_p8 + +uint64x1_t vreinterpret_u64_u32 (uint32x2_t t); +#define vreinterpret_u64_u32 + +uint64x1_t vreinterpret_u64_u16 (uint16x4_t t); +#define vreinterpret_u64_u16 + +uint64x1_t vreinterpret_u64_u8 (uint8x8_t t); +#define vreinterpret_u64_u8 + +uint64x1_t vreinterpret_u64_s32 (int32x2_t t); +#define vreinterpret_u64_s32 + +uint64x1_t vreinterpret_u64_s16 (int16x4_t t); +#define vreinterpret_u64_s16 + +uint64x1_t vreinterpret_u64_s8 (int8x8_t t); +#define vreinterpret_u64_s8 + +uint64x1_t vreinterpret_u64_s64 (int64x1_t t); +#define vreinterpret_u64_s64 + +uint64x1_t vreinterpret_u64_f32 (float32x2_t t); +#define vreinterpret_u64_f32 + +uint64x1_t vreinterpret_u64_p16 (poly16x4_t t); +#define vreinterpret_u64_p16 + +uint64x1_t vreinterpret_u64_p8 (poly8x8_t t); +#define vreinterpret_u64_p8 + +uint64x2_t vreinterpretq_u64_u32 (uint32x4_t t); +#define vreinterpretq_u64_u32 + +uint64x2_t vreinterpretq_u64_u16 (uint16x8_t t); +#define vreinterpretq_u64_u16 + +uint64x2_t vreinterpretq_u64_u8 (uint8x16_t t); +#define vreinterpretq_u64_u8 + +uint64x2_t vreinterpretq_u64_s32 (int32x4_t t); +#define vreinterpretq_u64_s32 + +uint64x2_t vreinterpretq_u64_s16 (int16x8_t t); +#define vreinterpretq_u64_s16 + +uint64x2_t vreinterpretq_u64_s8 (int8x16_t t); +#define vreinterpretq_u64_s8 + +uint64x2_t vreinterpretq_u64_s64 (int64x2_t t); +#define vreinterpretq_u64_s64 + +uint64x2_t vreinterpretq_u64_f32 (float32x4_t t); +#define vreinterpretq_u64_f32(t) _M128i(t) + +uint64x2_t vreinterpretq_u64_p16 (poly16x8_t t); +#define vreinterpretq_u64_p16 + +uint64x2_t vreinterpretq_u64_p8 (poly8x16_t t); +#define vreinterpretq_u64_p8 + +int8x8_t 
vreinterpret_s8_u32 (uint32x2_t t); +#define vreinterpret_s8_u32 + +int8x8_t vreinterpret_s8_u16 (uint16x4_t t); +#define vreinterpret_s8_u16 + +int8x8_t vreinterpret_s8_u8 (uint8x8_t t); +#define vreinterpret_s8_u8 + +int8x8_t vreinterpret_s8_s32 (int32x2_t t); +#define vreinterpret_s8_s32 + +int8x8_t vreinterpret_s8_s16 (int16x4_t t); +#define vreinterpret_s8_s16 + +int8x8_t vreinterpret_s8_u64 (uint64x1_t t); +#define vreinterpret_s8_u64 + +int8x8_t vreinterpret_s8_s64 (int64x1_t t); +#define vreinterpret_s8_s64 + +int8x8_t vreinterpret_s8_f32 (float32x2_t t); +#define vreinterpret_s8_f32 + +int8x8_t vreinterpret_s8_p16 (poly16x4_t t); +#define vreinterpret_s8_p16 + +int8x8_t vreinterpret_s8_p8 (poly8x8_t t); +#define vreinterpret_s8_p8 + +int8x16_t vreinterpretq_s8_u32 (uint32x4_t t); +#define vreinterpretq_s8_u32 + +int8x16_t vreinterpretq_s8_u16 (uint16x8_t t); +#define vreinterpretq_s8_u16 + +int8x16_t vreinterpretq_s8_u8 (uint8x16_t t); +#define vreinterpretq_s8_u8 + +int8x16_t vreinterpretq_s8_s32 (int32x4_t t); +#define vreinterpretq_s8_s32 + +int8x16_t vreinterpretq_s8_s16 (int16x8_t t); +#define vreinterpretq_s8_s16 + +int8x16_t vreinterpretq_s8_u64 (uint64x2_t t); +#define vreinterpretq_s8_u64 + +int8x16_t vreinterpretq_s8_s64 (int64x2_t t); +#define vreinterpretq_s8_s64 + +int8x16_t vreinterpretq_s8_f32 (float32x4_t t); +#define vreinterpretq_s8_f32(t) _M128i(t) + +int8x16_t vreinterpretq_s8_p16 (poly16x8_t t); +#define vreinterpretq_s8_p16 + +int8x16_t vreinterpretq_s8_p8 (poly8x16_t t); +#define vreinterpretq_s8_p8 + +int16x4_t vreinterpret_s16_u32 (uint32x2_t t); +#define vreinterpret_s16_u32 + +int16x4_t vreinterpret_s16_u16 (uint16x4_t t); +#define vreinterpret_s16_u16 + +int16x4_t vreinterpret_s16_u8 (uint8x8_t t); +#define vreinterpret_s16_u8 + +int16x4_t vreinterpret_s16_s32 (int32x2_t t); +#define vreinterpret_s16_s32 + +int16x4_t vreinterpret_s16_s8 (int8x8_t t); +#define vreinterpret_s16_s8 + +int16x4_t vreinterpret_s16_u64 (uint64x1_t t); +#define vreinterpret_s16_u64 + +int16x4_t vreinterpret_s16_s64 (int64x1_t t); +#define vreinterpret_s16_s64 + +int16x4_t vreinterpret_s16_f32 (float32x2_t t); +#define vreinterpret_s16_f32 + + +int16x4_t vreinterpret_s16_p16 (poly16x4_t t); +#define vreinterpret_s16_p16 + +int16x4_t vreinterpret_s16_p8 (poly8x8_t t); +#define vreinterpret_s16_p8 + +int16x8_t vreinterpretq_s16_u32 (uint32x4_t t); +#define vreinterpretq_s16_u32 + +int16x8_t vreinterpretq_s16_u16 (uint16x8_t t); +#define vreinterpretq_s16_u16 + +int16x8_t vreinterpretq_s16_u8 (uint8x16_t t); +#define vreinterpretq_s16_u8 + +int16x8_t vreinterpretq_s16_s32 (int32x4_t t); +#define vreinterpretq_s16_s32 + +int16x8_t vreinterpretq_s16_s8 (int8x16_t t); +#define vreinterpretq_s16_s8 + +int16x8_t vreinterpretq_s16_u64 (uint64x2_t t); +#define vreinterpretq_s16_u64 + +int16x8_t vreinterpretq_s16_s64 (int64x2_t t); +#define vreinterpretq_s16_s64 + +int16x8_t vreinterpretq_s16_f32 (float32x4_t t); +#define vreinterpretq_s16_f32(t) _M128i(t) + +int16x8_t vreinterpretq_s16_p16 (poly16x8_t t); +#define vreinterpretq_s16_p16 + +int16x8_t vreinterpretq_s16_p8 (poly8x16_t t); +#define vreinterpretq_s16_p8 + +int32x2_t vreinterpret_s32_u32 (uint32x2_t t); +#define vreinterpret_s32_u32 + +int32x2_t vreinterpret_s32_u16 (uint16x4_t t); +#define vreinterpret_s32_u16 + +int32x2_t vreinterpret_s32_u8 (uint8x8_t t); +#define vreinterpret_s32_u8 + +int32x2_t vreinterpret_s32_s16 (int16x4_t t); +#define vreinterpret_s32_s16 + +int32x2_t vreinterpret_s32_s8 (int8x8_t t); +#define 
vreinterpret_s32_s8 + +int32x2_t vreinterpret_s32_u64 (uint64x1_t t); +#define vreinterpret_s32_u64 + +int32x2_t vreinterpret_s32_s64 (int64x1_t t); +#define vreinterpret_s32_s64 + +int32x2_t vreinterpret_s32_f32 (float32x2_t t); +#define vreinterpret_s32_f32 + +int32x2_t vreinterpret_s32_p16 (poly16x4_t t); +#define vreinterpret_s32_p16 + +int32x2_t vreinterpret_s32_p8 (poly8x8_t t); +#define vreinterpret_s32_p8 + +int32x4_t vreinterpretq_s32_u32 (uint32x4_t t); +#define vreinterpretq_s32_u32 + +int32x4_t vreinterpretq_s32_u16 (uint16x8_t t); +#define vreinterpretq_s32_u16 + +int32x4_t vreinterpretq_s32_u8 (uint8x16_t t); +#define vreinterpretq_s32_u8 + +int32x4_t vreinterpretq_s32_s16 (int16x8_t t); +#define vreinterpretq_s32_s16 + +int32x4_t vreinterpretq_s32_s8 (int8x16_t t); +#define vreinterpretq_s32_s8 + +int32x4_t vreinterpretq_s32_u64 (uint64x2_t t); +#define vreinterpretq_s32_u64 + +int32x4_t vreinterpretq_s32_s64 (int64x2_t t); +#define vreinterpretq_s32_s64 + +int32x4_t vreinterpretq_s32_f32 (float32x4_t t); +#define vreinterpretq_s32_f32(t) _mm_castps_si128(t) //(*(__m128i*)&(t)) + +int32x4_t vreinterpretq_s32_p16 (poly16x8_t t); +#define vreinterpretq_s32_p16 + +int32x4_t vreinterpretq_s32_p8 (poly8x16_t t); +#define vreinterpretq_s32_p8 + +uint8x8_t vreinterpret_u8_u32 (uint32x2_t t); +#define vreinterpret_u8_u32 + +uint8x8_t vreinterpret_u8_u16 (uint16x4_t t); +#define vreinterpret_u8_u16 + +uint8x8_t vreinterpret_u8_s32 (int32x2_t t); +#define vreinterpret_u8_s32 + +uint8x8_t vreinterpret_u8_s16 (int16x4_t t); +#define vreinterpret_u8_s16 + +uint8x8_t vreinterpret_u8_s8 (int8x8_t t); +#define vreinterpret_u8_s8 + +uint8x8_t vreinterpret_u8_u64 (uint64x1_t t); +#define vreinterpret_u8_u64 + +uint8x8_t vreinterpret_u8_s64 (int64x1_t t); +#define vreinterpret_u8_s64 + +uint8x8_t vreinterpret_u8_f32 (float32x2_t t); +#define vreinterpret_u8_f32 + +uint8x8_t vreinterpret_u8_p16 (poly16x4_t t); +#define vreinterpret_u8_p16 + +uint8x8_t vreinterpret_u8_p8 (poly8x8_t t); +#define vreinterpret_u8_p8 + +uint8x16_t vreinterpretq_u8_u32 (uint32x4_t t); +#define vreinterpretq_u8_u32 + +uint8x16_t vreinterpretq_u8_u16 (uint16x8_t t); +#define vreinterpretq_u8_u16 + +uint8x16_t vreinterpretq_u8_s32 (int32x4_t t); +#define vreinterpretq_u8_s32 + +uint8x16_t vreinterpretq_u8_s16 (int16x8_t t); +#define vreinterpretq_u8_s16 + +uint8x16_t vreinterpretq_u8_s8 (int8x16_t t); +#define vreinterpretq_u8_s8 + +uint8x16_t vreinterpretq_u8_u64 (uint64x2_t t); +#define vreinterpretq_u8_u64 + +uint8x16_t vreinterpretq_u8_s64 (int64x2_t t); +#define vreinterpretq_u8_s64 + +uint8x16_t vreinterpretq_u8_f32 (float32x4_t t); +#define vreinterpretq_u8_f32(t) _M128i(t) + + +uint8x16_t vreinterpretq_u8_p16 (poly16x8_t t); +#define vreinterpretq_u8_p16 + +uint8x16_t vreinterpretq_u8_p8 (poly8x16_t t); +#define vreinterpretq_u8_p8 + +uint16x4_t vreinterpret_u16_u32 (uint32x2_t t); +#define vreinterpret_u16_u32 + +uint16x4_t vreinterpret_u16_u8 (uint8x8_t t); +#define vreinterpret_u16_u8 + +uint16x4_t vreinterpret_u16_s32 (int32x2_t t); +#define vreinterpret_u16_s32 + +uint16x4_t vreinterpret_u16_s16 (int16x4_t t); +#define vreinterpret_u16_s16 + +uint16x4_t vreinterpret_u16_s8 (int8x8_t t); +#define vreinterpret_u16_s8 + +uint16x4_t vreinterpret_u16_u64 (uint64x1_t t); +#define vreinterpret_u16_u64 + +uint16x4_t vreinterpret_u16_s64 (int64x1_t t); +#define vreinterpret_u16_s64 + +uint16x4_t vreinterpret_u16_f32 (float32x2_t t); +#define vreinterpret_u16_f32 + +uint16x4_t vreinterpret_u16_p16 (poly16x4_t t); 
+#define vreinterpret_u16_p16 + +uint16x4_t vreinterpret_u16_p8 (poly8x8_t t); +#define vreinterpret_u16_p8 + +uint16x8_t vreinterpretq_u16_u32 (uint32x4_t t); +#define vreinterpretq_u16_u32 + +uint16x8_t vreinterpretq_u16_u8 (uint8x16_t t); +#define vreinterpretq_u16_u8 + +uint16x8_t vreinterpretq_u16_s32 (int32x4_t t); +#define vreinterpretq_u16_s32 + +uint16x8_t vreinterpretq_u16_s16 (int16x8_t t); +#define vreinterpretq_u16_s16 + +uint16x8_t vreinterpretq_u16_s8 (int8x16_t t); +#define vreinterpretq_u16_s8 + +uint16x8_t vreinterpretq_u16_u64 (uint64x2_t t); +#define vreinterpretq_u16_u64 + +uint16x8_t vreinterpretq_u16_s64 (int64x2_t t); +#define vreinterpretq_u16_s64 + +uint16x8_t vreinterpretq_u16_f32 (float32x4_t t); +#define vreinterpretq_u16_f32(t) _M128i(t) + +uint16x8_t vreinterpretq_u16_p16 (poly16x8_t t); +#define vreinterpretq_u16_p16 + +uint16x8_t vreinterpretq_u16_p8 (poly8x16_t t); +#define vreinterpretq_u16_p8 + +uint32x2_t vreinterpret_u32_u16 (uint16x4_t t); +#define vreinterpret_u32_u16 + +uint32x2_t vreinterpret_u32_u8 (uint8x8_t t); +#define vreinterpret_u32_u8 + +uint32x2_t vreinterpret_u32_s32 (int32x2_t t); +#define vreinterpret_u32_s32 + +uint32x2_t vreinterpret_u32_s16 (int16x4_t t); +#define vreinterpret_u32_s16 + +uint32x2_t vreinterpret_u32_s8 (int8x8_t t); +#define vreinterpret_u32_s8 + +uint32x2_t vreinterpret_u32_u64 (uint64x1_t t); +#define vreinterpret_u32_u64 + +uint32x2_t vreinterpret_u32_s64 (int64x1_t t); +#define vreinterpret_u32_s64 + +uint32x2_t vreinterpret_u32_f32 (float32x2_t t); +#define vreinterpret_u32_f32 + +uint32x2_t vreinterpret_u32_p16 (poly16x4_t t); +#define vreinterpret_u32_p16 + +uint32x2_t vreinterpret_u32_p8 (poly8x8_t t); +#define vreinterpret_u32_p8 + +uint32x4_t vreinterpretq_u32_u16 (uint16x8_t t); +#define vreinterpretq_u32_u16 + +uint32x4_t vreinterpretq_u32_u8 (uint8x16_t t); +#define vreinterpretq_u32_u8 + +uint32x4_t vreinterpretq_u32_s32 (int32x4_t t); +#define vreinterpretq_u32_s32 + +uint32x4_t vreinterpretq_u32_s16 (int16x8_t t); +#define vreinterpretq_u32_s16 + +uint32x4_t vreinterpretq_u32_s8 (int8x16_t t); +#define vreinterpretq_u32_s8 + +uint32x4_t vreinterpretq_u32_u64 (uint64x2_t t); +#define vreinterpretq_u32_u64 + +uint32x4_t vreinterpretq_u32_s64 (int64x2_t t); +#define vreinterpretq_u32_s64 + +uint32x4_t vreinterpretq_u32_f32 (float32x4_t t); +#define vreinterpretq_u32_f32(t) _M128i(t) + +uint32x4_t vreinterpretq_u32_p16 (poly16x8_t t); +#define vreinterpretq_u32_p16 + +uint32x4_t vreinterpretq_u32_p8 (poly8x16_t t); +#define vreinterpretq_u32_p8 + +#endif /* NEON2SSE_H */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/avx2intrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/avx2intrin.h new file mode 100644 index 0000000..d04c972 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/avx2intrin.h @@ -0,0 +1,1889 @@ +/* Copyright (C) 2011-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _IMMINTRIN_H_INCLUDED +# error "Never use directly; include instead." +#endif + +#ifndef _AVX2INTRIN_H_INCLUDED +#define _AVX2INTRIN_H_INCLUDED + +#ifndef __AVX2__ +#pragma GCC push_options +#pragma GCC target("avx2") +#define __DISABLE_AVX2__ +#endif /* __AVX2__ */ + +/* Sum absolute 8-bit integer difference of adjacent groups of 4 + byte integers in the first 2 operands. Starting offsets within + operands are determined by the 3rd mask operand. */ +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mpsadbw_epu8 (__m256i __X, __m256i __Y, const int __M) +{ + return (__m256i) __builtin_ia32_mpsadbw256 ((__v32qi)__X, + (__v32qi)__Y, __M); +} +#else +#define _mm256_mpsadbw_epu8(X, Y, M) \ + ((__m256i) __builtin_ia32_mpsadbw256 ((__v32qi)(__m256i)(X), \ + (__v32qi)(__m256i)(Y), (int)(M))) +#endif + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_abs_epi8 (__m256i __A) +{ + return (__m256i)__builtin_ia32_pabsb256 ((__v32qi)__A); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_abs_epi16 (__m256i __A) +{ + return (__m256i)__builtin_ia32_pabsw256 ((__v16hi)__A); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_abs_epi32 (__m256i __A) +{ + return (__m256i)__builtin_ia32_pabsd256 ((__v8si)__A); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_packs_epi32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_packssdw256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_packs_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_packsswb256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_packus_epi32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_packusdw256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_packus_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_packuswb256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_add_epi8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_paddb256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_add_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_paddw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_add_epi32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_paddd256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_add_epi64 (__m256i __A, 
__m256i __B) +{ + return (__m256i)__builtin_ia32_paddq256 ((__v4di)__A, (__v4di)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_adds_epi8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_paddsb256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_adds_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_paddsw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_adds_epu8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_paddusb256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_adds_epu16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_paddusw256 ((__v16hi)__A, (__v16hi)__B); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_alignr_epi8 (__m256i __A, __m256i __B, const int __N) +{ + return (__m256i) __builtin_ia32_palignr256 ((__v4di)__A, + (__v4di)__B, + __N * 8); +} +#else +/* In that case (__N*8) will be in vreg, and insn will not be matched. */ +/* Use define instead */ +#define _mm256_alignr_epi8(A, B, N) \ + ((__m256i) __builtin_ia32_palignr256 ((__v4di)(__m256i)(A), \ + (__v4di)(__m256i)(B), \ + (int)(N) * 8)) +#endif + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_and_si256 (__m256i __A, __m256i __B) +{ + return (__m256i) __builtin_ia32_andsi256 ((__v4di)__A, (__v4di)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_andnot_si256 (__m256i __A, __m256i __B) +{ + return (__m256i) __builtin_ia32_andnotsi256 ((__v4di)__A, (__v4di)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_avg_epu8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pavgb256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_avg_epu16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pavgw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_blendv_epi8 (__m256i __X, __m256i __Y, __m256i __M) +{ + return (__m256i) __builtin_ia32_pblendvb256 ((__v32qi)__X, + (__v32qi)__Y, + (__v32qi)__M); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_blend_epi16 (__m256i __X, __m256i __Y, const int __M) +{ + return (__m256i) __builtin_ia32_pblendw256 ((__v16hi)__X, + (__v16hi)__Y, + __M); +} +#else +#define _mm256_blend_epi16(X, Y, M) \ + ((__m256i) __builtin_ia32_pblendw256 ((__v16hi)(__m256i)(X), \ + (__v16hi)(__m256i)(Y), (int)(M))) +#endif + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cmpeq_epi8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pcmpeqb256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cmpeq_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pcmpeqw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cmpeq_epi32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pcmpeqd256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cmpeq_epi64 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pcmpeqq256 ((__v4di)__A, (__v4di)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cmpgt_epi8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pcmpgtb256 ((__v32qi)__A, + (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cmpgt_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pcmpgtw256 ((__v16hi)__A, + (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cmpgt_epi32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pcmpgtd256 ((__v8si)__A, + (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cmpgt_epi64 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pcmpgtq256 ((__v4di)__A, (__v4di)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_hadd_epi16 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_phaddw256 ((__v16hi)__X, + (__v16hi)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_hadd_epi32 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_phaddd256 ((__v8si)__X, (__v8si)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_hadds_epi16 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_phaddsw256 ((__v16hi)__X, + (__v16hi)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_hsub_epi16 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_phsubw256 ((__v16hi)__X, + (__v16hi)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_hsub_epi32 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_phsubd256 ((__v8si)__X, (__v8si)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_hsubs_epi16 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_phsubsw256 ((__v16hi)__X, + (__v16hi)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maddubs_epi16 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_pmaddubsw256 ((__v32qi)__X, + (__v32qi)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_madd_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pmaddwd256 ((__v16hi)__A, + (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_max_epi8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pmaxsb256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_max_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pmaxsw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_max_epi32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pmaxsd256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_max_epu8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pmaxub256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_max_epu16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pmaxuw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_max_epu32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pmaxud256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_min_epi8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pminsb256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_min_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pminsw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_min_epi32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pminsd256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_min_epu8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pminub256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_min_epu16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pminuw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_min_epu32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pminud256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_movemask_epi8 (__m256i __A) +{ + return __builtin_ia32_pmovmskb256 ((__v32qi)__A); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepi8_epi16 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pmovsxbw256 ((__v16qi)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepi8_epi32 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pmovsxbd256 ((__v16qi)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepi8_epi64 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pmovsxbq256 ((__v16qi)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepi16_epi32 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pmovsxwd256 ((__v8hi)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepi16_epi64 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pmovsxwq256 ((__v8hi)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepi32_epi64 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pmovsxdq256 ((__v4si)__X); +} + 
+extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepu8_epi16 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pmovzxbw256 ((__v16qi)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepu8_epi32 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pmovzxbd256 ((__v16qi)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepu8_epi64 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pmovzxbq256 ((__v16qi)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepu16_epi32 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pmovzxwd256 ((__v8hi)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepu16_epi64 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pmovzxwq256 ((__v8hi)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepu32_epi64 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pmovzxdq256 ((__v4si)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mul_epi32 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_pmuldq256 ((__v8si)__X, (__v8si)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mulhrs_epi16 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_pmulhrsw256 ((__v16hi)__X, + (__v16hi)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mulhi_epu16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pmulhuw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mulhi_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pmulhw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mullo_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pmullw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mullo_epi32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pmulld256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mul_epu32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pmuludq256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_or_si256 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_por256 ((__v4di)__A, (__v4di)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sad_epu8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_psadbw256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_shuffle_epi8 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_pshufb256 ((__v32qi)__X, + (__v32qi)__Y); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_shuffle_epi32 
(__m256i __A, const int __mask) +{ + return (__m256i)__builtin_ia32_pshufd256 ((__v8si)__A, __mask); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_shufflehi_epi16 (__m256i __A, const int __mask) +{ + return (__m256i)__builtin_ia32_pshufhw256 ((__v16hi)__A, __mask); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_shufflelo_epi16 (__m256i __A, const int __mask) +{ + return (__m256i)__builtin_ia32_pshuflw256 ((__v16hi)__A, __mask); +} +#else +#define _mm256_shuffle_epi32(A, N) \ + ((__m256i)__builtin_ia32_pshufd256 ((__v8si)(__m256i)(A), (int)(N))) +#define _mm256_shufflehi_epi16(A, N) \ + ((__m256i)__builtin_ia32_pshufhw256 ((__v16hi)(__m256i)(A), (int)(N))) +#define _mm256_shufflelo_epi16(A, N) \ + ((__m256i)__builtin_ia32_pshuflw256 ((__v16hi)(__m256i)(A), (int)(N))) +#endif + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sign_epi8 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_psignb256 ((__v32qi)__X, (__v32qi)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sign_epi16 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_psignw256 ((__v16hi)__X, (__v16hi)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sign_epi32 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_psignd256 ((__v8si)__X, (__v8si)__Y); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_slli_si256 (__m256i __A, const int __N) +{ + return (__m256i)__builtin_ia32_pslldqi256 (__A, __N * 8); +} +#else +#define _mm256_slli_si256(A, N) \ + ((__m256i)__builtin_ia32_pslldqi256 ((__m256i)(A), (int)(N) * 8)) +#endif + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_slli_epi16 (__m256i __A, int __B) +{ + return (__m256i)__builtin_ia32_psllwi256 ((__v16hi)__A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sll_epi16 (__m256i __A, __m128i __B) +{ + return (__m256i)__builtin_ia32_psllw256((__v16hi)__A, (__v8hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_slli_epi32 (__m256i __A, int __B) +{ + return (__m256i)__builtin_ia32_pslldi256 ((__v8si)__A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sll_epi32 (__m256i __A, __m128i __B) +{ + return (__m256i)__builtin_ia32_pslld256((__v8si)__A, (__v4si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_slli_epi64 (__m256i __A, int __B) +{ + return (__m256i)__builtin_ia32_psllqi256 ((__v4di)__A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sll_epi64 (__m256i __A, __m128i __B) +{ + return (__m256i)__builtin_ia32_psllq256((__v4di)__A, (__v2di)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_srai_epi16 (__m256i __A, int __B) +{ + return (__m256i)__builtin_ia32_psrawi256 ((__v16hi)__A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sra_epi16 (__m256i __A, __m128i __B) +{ + return 
(__m256i)__builtin_ia32_psraw256 ((__v16hi)__A, (__v8hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_srai_epi32 (__m256i __A, int __B) +{ + return (__m256i)__builtin_ia32_psradi256 ((__v8si)__A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sra_epi32 (__m256i __A, __m128i __B) +{ + return (__m256i)__builtin_ia32_psrad256 ((__v8si)__A, (__v4si)__B); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_srli_si256 (__m256i __A, const int __N) +{ + return (__m256i)__builtin_ia32_psrldqi256 (__A, __N * 8); +} +#else +#define _mm256_srli_si256(A, N) \ + ((__m256i)__builtin_ia32_psrldqi256 ((__m256i)(A), (int)(N) * 8)) +#endif + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_srli_epi16 (__m256i __A, int __B) +{ + return (__m256i)__builtin_ia32_psrlwi256 ((__v16hi)__A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_srl_epi16 (__m256i __A, __m128i __B) +{ + return (__m256i)__builtin_ia32_psrlw256((__v16hi)__A, (__v8hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_srli_epi32 (__m256i __A, int __B) +{ + return (__m256i)__builtin_ia32_psrldi256 ((__v8si)__A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_srl_epi32 (__m256i __A, __m128i __B) +{ + return (__m256i)__builtin_ia32_psrld256((__v8si)__A, (__v4si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_srli_epi64 (__m256i __A, int __B) +{ + return (__m256i)__builtin_ia32_psrlqi256 ((__v4di)__A, __B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_srl_epi64 (__m256i __A, __m128i __B) +{ + return (__m256i)__builtin_ia32_psrlq256((__v4di)__A, (__v2di)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sub_epi8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_psubb256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sub_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_psubw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sub_epi32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_psubd256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sub_epi64 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_psubq256 ((__v4di)__A, (__v4di)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_subs_epi8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_psubsb256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_subs_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_psubsw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_subs_epu8 (__m256i __A, 
__m256i __B) +{ + return (__m256i)__builtin_ia32_psubusb256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_subs_epu16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_psubusw256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_unpackhi_epi8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_punpckhbw256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_unpackhi_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_punpckhwd256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_unpackhi_epi32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_punpckhdq256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_unpackhi_epi64 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_punpckhqdq256 ((__v4di)__A, (__v4di)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_unpacklo_epi8 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_punpcklbw256 ((__v32qi)__A, (__v32qi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_unpacklo_epi16 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_punpcklwd256 ((__v16hi)__A, (__v16hi)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_unpacklo_epi32 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_punpckldq256 ((__v8si)__A, (__v8si)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_unpacklo_epi64 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_punpcklqdq256 ((__v4di)__A, (__v4di)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_xor_si256 (__m256i __A, __m256i __B) +{ + return (__m256i)__builtin_ia32_pxor256 ((__v4di)__A, (__v4di)__B); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_stream_load_si256 (__m256i const *__X) +{ + return (__m256i) __builtin_ia32_movntdqa256 ((__v4di *) __X); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_broadcastss_ps (__m128 __X) +{ + return (__m128) __builtin_ia32_vbroadcastss_ps ((__v4sf)__X); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_broadcastss_ps (__m128 __X) +{ + return (__m256) __builtin_ia32_vbroadcastss_ps256 ((__v4sf)__X); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_broadcastsd_pd (__m128d __X) +{ + return (__m256d) __builtin_ia32_vbroadcastsd_pd256 ((__v2df)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_broadcastsi128_si256 (__m128i __X) +{ + return (__m256i) __builtin_ia32_vbroadcastsi256 ((__v2di)__X); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blend_epi32 (__m128i __X, __m128i __Y, const int __M) 
+{ + return (__m128i) __builtin_ia32_pblendd128 ((__v4si)__X, + (__v4si)__Y, + __M); +} +#else +#define _mm_blend_epi32(X, Y, M) \ + ((__m128i) __builtin_ia32_pblendd128 ((__v4si)(__m128i)(X), \ + (__v4si)(__m128i)(Y), (int)(M))) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_blend_epi32 (__m256i __X, __m256i __Y, const int __M) +{ + return (__m256i) __builtin_ia32_pblendd256 ((__v8si)__X, + (__v8si)__Y, + __M); +} +#else +#define _mm256_blend_epi32(X, Y, M) \ + ((__m256i) __builtin_ia32_pblendd256 ((__v8si)(__m256i)(X), \ + (__v8si)(__m256i)(Y), (int)(M))) +#endif + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_broadcastb_epi8 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pbroadcastb256 ((__v16qi)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_broadcastw_epi16 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pbroadcastw256 ((__v8hi)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_broadcastd_epi32 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pbroadcastd256 ((__v4si)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_broadcastq_epi64 (__m128i __X) +{ + return (__m256i) __builtin_ia32_pbroadcastq256 ((__v2di)__X); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_broadcastb_epi8 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pbroadcastb128 ((__v16qi)__X); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_broadcastw_epi16 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pbroadcastw128 ((__v8hi)__X); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_broadcastd_epi32 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pbroadcastd128 ((__v4si)__X); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_broadcastq_epi64 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pbroadcastq128 ((__v2di)__X); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permutevar8x32_epi32 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_permvarsi256 ((__v8si)__X, (__v8si)__Y); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permute4x64_pd (__m256d __X, const int __M) +{ + return (__m256d) __builtin_ia32_permdf256 ((__v4df)__X, __M); +} +#else +#define _mm256_permute4x64_pd(X, M) \ + ((__m256d) __builtin_ia32_permdf256 ((__v4df)(__m256d)(X), (int)(M))) +#endif + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permutevar8x32_ps (__m256 __X, __m256i __Y) +{ + return (__m256) __builtin_ia32_permvarsf256 ((__v8sf)__X, (__v8si)__Y); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permute4x64_epi64 (__m256i __X, const int __M) +{ + return (__m256i) __builtin_ia32_permdi256 ((__v4di)__X, __M); +} +#else +#define _mm256_permute4x64_epi64(X, M) \ + ((__m256i) __builtin_ia32_permdi256 ((__v4di)(__m256i)(X), (int)(M))) +#endif + + +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permute2x128_si256 (__m256i __X, __m256i __Y, const int __M) +{ + return (__m256i) __builtin_ia32_permti256 ((__v4di)__X, (__v4di)__Y, __M); +} +#else +#define _mm256_permute2x128_si256(X, Y, M) \ + ((__m256i) __builtin_ia32_permti256 ((__v4di)(__m256i)(X), (__v4di)(__m256i)(Y), (int)(M))) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_extracti128_si256 (__m256i __X, const int __M) +{ + return (__m128i) __builtin_ia32_extract128i256 ((__v4di)__X, __M); +} +#else +#define _mm256_extracti128_si256(X, M) \ + ((__m128i) __builtin_ia32_extract128i256 ((__v4di)(__m256i)(X), (int)(M))) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_inserti128_si256 (__m256i __X, __m128i __Y, const int __M) +{ + return (__m256i) __builtin_ia32_insert128i256 ((__v4di)__X, (__v2di)__Y, __M); +} +#else +#define _mm256_inserti128_si256(X, Y, M) \ + ((__m256i) __builtin_ia32_insert128i256 ((__v4di)(__m256i)(X), \ + (__v2di)(__m128i)(Y), \ + (int)(M))) +#endif + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskload_epi32 (int const *__X, __m256i __M ) +{ + return (__m256i) __builtin_ia32_maskloadd256 ((const __v8si *)__X, + (__v8si)__M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskload_epi64 (long long const *__X, __m256i __M ) +{ + return (__m256i) __builtin_ia32_maskloadq256 ((const __v4di *)__X, + (__v4di)__M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskload_epi32 (int const *__X, __m128i __M ) +{ + return (__m128i) __builtin_ia32_maskloadd ((const __v4si *)__X, + (__v4si)__M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskload_epi64 (long long const *__X, __m128i __M ) +{ + return (__m128i) __builtin_ia32_maskloadq ((const __v2di *)__X, + (__v2di)__M); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskstore_epi32 (int *__X, __m256i __M, __m256i __Y ) +{ + __builtin_ia32_maskstored256 ((__v8si *)__X, (__v8si)__M, (__v8si)__Y); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskstore_epi64 (long long *__X, __m256i __M, __m256i __Y ) +{ + __builtin_ia32_maskstoreq256 ((__v4di *)__X, (__v4di)__M, (__v4di)__Y); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskstore_epi32 (int *__X, __m128i __M, __m128i __Y ) +{ + __builtin_ia32_maskstored ((__v4si *)__X, (__v4si)__M, (__v4si)__Y); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskstore_epi64 (long long *__X, __m128i __M, __m128i __Y ) +{ + __builtin_ia32_maskstoreq (( __v2di *)__X, (__v2di)__M, (__v2di)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sllv_epi32 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_psllv8si ((__v8si)__X, (__v8si)__Y); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sllv_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_psllv4si ((__v4si)__X, (__v4si)__Y); +} + +extern __inline __m256i 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sllv_epi64 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_psllv4di ((__v4di)__X, (__v4di)__Y); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sllv_epi64 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_psllv2di ((__v2di)__X, (__v2di)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_srav_epi32 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_psrav8si ((__v8si)__X, (__v8si)__Y); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srav_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_psrav4si ((__v4si)__X, (__v4si)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_srlv_epi32 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_psrlv8si ((__v8si)__X, (__v8si)__Y); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srlv_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_psrlv4si ((__v4si)__X, (__v4si)__Y); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_srlv_epi64 (__m256i __X, __m256i __Y) +{ + return (__m256i) __builtin_ia32_psrlv4di ((__v4di)__X, (__v4di)__Y); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srlv_epi64 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_psrlv2di ((__v2di)__X, (__v2di)__Y); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_i32gather_pd (double const *base, __m128i index, const int scale) +{ + __v2df zero = _mm_setzero_pd (); + __v2df mask = _mm_cmpeq_pd (zero, zero); + + return (__m128d) __builtin_ia32_gathersiv2df (_mm_undefined_pd (), + base, + (__v4si)index, + mask, + scale); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_i32gather_pd (__m128d src, double const *base, __m128i index, + __m128d mask, const int scale) +{ + return (__m128d) __builtin_ia32_gathersiv2df ((__v2df)src, + base, + (__v4si)index, + (__v2df)mask, + scale); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_i32gather_pd (double const *base, __m128i index, const int scale) +{ + __v4df zero = _mm256_setzero_pd (); + __v4df mask = _mm256_cmp_pd (zero, zero, _CMP_EQ_OQ); + + return (__m256d) __builtin_ia32_gathersiv4df (_mm256_undefined_pd (), + base, + (__v4si)index, + mask, + scale); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_i32gather_pd (__m256d src, double const *base, + __m128i index, __m256d mask, const int scale) +{ + return (__m256d) __builtin_ia32_gathersiv4df ((__v4df)src, + base, + (__v4si)index, + (__v4df)mask, + scale); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_i64gather_pd (double const *base, __m128i index, const int scale) +{ + __v2df src = _mm_setzero_pd (); + __v2df mask = _mm_cmpeq_pd (src, src); + + return (__m128d) __builtin_ia32_gatherdiv2df (src, + base, + (__v2di)index, + mask, + scale); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm_mask_i64gather_pd (__m128d src, double const *base, __m128i index, + __m128d mask, const int scale) +{ + return (__m128d) __builtin_ia32_gatherdiv2df ((__v2df)src, + base, + (__v2di)index, + (__v2df)mask, + scale); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_i64gather_pd (double const *base, __m256i index, const int scale) +{ + __v4df src = _mm256_setzero_pd (); + __v4df mask = _mm256_cmp_pd (src, src, _CMP_EQ_OQ); + + return (__m256d) __builtin_ia32_gatherdiv4df (src, + base, + (__v4di)index, + mask, + scale); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_i64gather_pd (__m256d src, double const *base, + __m256i index, __m256d mask, const int scale) +{ + return (__m256d) __builtin_ia32_gatherdiv4df ((__v4df)src, + base, + (__v4di)index, + (__v4df)mask, + scale); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_i32gather_ps (float const *base, __m128i index, const int scale) +{ + __v4sf src = _mm_setzero_ps (); + __v4sf mask = _mm_cmpeq_ps (src, src); + + return (__m128) __builtin_ia32_gathersiv4sf (src, + base, + (__v4si)index, + mask, + scale); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_i32gather_ps (__m128 src, float const *base, __m128i index, + __m128 mask, const int scale) +{ + return (__m128) __builtin_ia32_gathersiv4sf ((__v4sf)src, + base, + (__v4si)index, + (__v4sf)mask, + scale); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_i32gather_ps (float const *base, __m256i index, const int scale) +{ + __v8sf src = _mm256_setzero_ps (); + __v8sf mask = _mm256_cmp_ps (src, src, _CMP_EQ_OQ); + + return (__m256) __builtin_ia32_gathersiv8sf (src, + base, + (__v8si)index, + mask, + scale); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_i32gather_ps (__m256 src, float const *base, + __m256i index, __m256 mask, const int scale) +{ + return (__m256) __builtin_ia32_gathersiv8sf ((__v8sf)src, + base, + (__v8si)index, + (__v8sf)mask, + scale); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_i64gather_ps (float const *base, __m128i index, const int scale) +{ + __v4sf src = _mm_setzero_ps (); + __v4sf mask = _mm_cmpeq_ps (src, src); + + return (__m128) __builtin_ia32_gatherdiv4sf (src, + base, + (__v2di)index, + mask, + scale); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_i64gather_ps (__m128 src, float const *base, __m128i index, + __m128 mask, const int scale) +{ + return (__m128) __builtin_ia32_gatherdiv4sf ((__v4sf)src, + base, + (__v2di)index, + (__v4sf)mask, + scale); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_i64gather_ps (float const *base, __m256i index, const int scale) +{ + __v4sf src = _mm_setzero_ps (); + __v4sf mask = _mm_cmpeq_ps (src, src); + + return (__m128) __builtin_ia32_gatherdiv4sf256 (src, + base, + (__v4di)index, + mask, + scale); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_i64gather_ps (__m128 src, float const *base, + __m256i index, __m128 mask, const int scale) +{ + return (__m128) __builtin_ia32_gatherdiv4sf256 ((__v4sf)src, + base, + 
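+
+/* The unmasked gathers above build an all-ones mask by comparing a
+   zeroed register with itself (0.0 == 0.0 holds lane-wise, giving
+   all-1 bits), which selects every element.  In the masked forms, a
+   lane whose mask sign bit is clear keeps the corresponding lane of
+   `src`.  A minimal sketch (hypothetical name; scale 4 steps the
+   index in float-sized units):
+
+     #include <immintrin.h>
+
+     __m128
+     gather_or_zero (float const *base, __m128i idx, __m128 m)
+     {
+       return _mm_mask_i32gather_ps (_mm_setzero_ps (), base, idx, m, 4);
+     }
+*/
+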
(__v4di)index, + (__v4sf)mask, + scale); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_i32gather_epi64 (long long int const *base, + __m128i index, const int scale) +{ + __v2di src = __extension__ (__v2di){ 0, 0 }; + __v2di mask = __extension__ (__v2di){ ~0, ~0 }; + + return (__m128i) __builtin_ia32_gathersiv2di (src, + base, + (__v4si)index, + mask, + scale); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_i32gather_epi64 (__m128i src, long long int const *base, + __m128i index, __m128i mask, const int scale) +{ + return (__m128i) __builtin_ia32_gathersiv2di ((__v2di)src, + base, + (__v4si)index, + (__v2di)mask, + scale); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_i32gather_epi64 (long long int const *base, + __m128i index, const int scale) +{ + __v4di src = __extension__ (__v4di){ 0, 0, 0, 0 }; + __v4di mask = __extension__ (__v4di){ ~0, ~0, ~0, ~0 }; + + return (__m256i) __builtin_ia32_gathersiv4di (src, + base, + (__v4si)index, + mask, + scale); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_i32gather_epi64 (__m256i src, long long int const *base, + __m128i index, __m256i mask, const int scale) +{ + return (__m256i) __builtin_ia32_gathersiv4di ((__v4di)src, + base, + (__v4si)index, + (__v4di)mask, + scale); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_i64gather_epi64 (long long int const *base, + __m128i index, const int scale) +{ + __v2di src = __extension__ (__v2di){ 0, 0 }; + __v2di mask = __extension__ (__v2di){ ~0, ~0 }; + + return (__m128i) __builtin_ia32_gatherdiv2di (src, + base, + (__v2di)index, + mask, + scale); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_i64gather_epi64 (__m128i src, long long int const *base, __m128i index, + __m128i mask, const int scale) +{ + return (__m128i) __builtin_ia32_gatherdiv2di ((__v2di)src, + base, + (__v2di)index, + (__v2di)mask, + scale); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_i64gather_epi64 (long long int const *base, + __m256i index, const int scale) +{ + __v4di src = __extension__ (__v4di){ 0, 0, 0, 0 }; + __v4di mask = __extension__ (__v4di){ ~0, ~0, ~0, ~0 }; + + return (__m256i) __builtin_ia32_gatherdiv4di (src, + base, + (__v4di)index, + mask, + scale); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_i64gather_epi64 (__m256i src, long long int const *base, + __m256i index, __m256i mask, const int scale) +{ + return (__m256i) __builtin_ia32_gatherdiv4di ((__v4di)src, + base, + (__v4di)index, + (__v4di)mask, + scale); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_i32gather_epi32 (int const *base, __m128i index, const int scale) +{ + __v4si src = __extension__ (__v4si){ 0, 0, 0, 0 }; + __v4si mask = __extension__ (__v4si){ ~0, ~0, ~0, ~0 }; + + return (__m128i) __builtin_ia32_gathersiv4si (src, + base, + (__v4si)index, + mask, + scale); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_i32gather_epi32 (__m128i src, int const *base, __m128i index, + __m128i mask, const int scale) +{ + return (__m128i) __builtin_ia32_gathersiv4si 
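+
+/* Each gather reads lane i from (char const *) base + index[i] * scale,
+   so with scale 4 an int index acts as an element subscript.  A
+   minimal table-lookup sketch (hypothetical names):
+
+     #include <immintrin.h>
+
+     __m128i
+     lookup4 (int const *table, __m128i idx)
+     {
+       return _mm_i32gather_epi32 (table, idx, 4);
+     }
+*/
+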
((__v4si)src, + base, + (__v4si)index, + (__v4si)mask, + scale); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_i32gather_epi32 (int const *base, __m256i index, const int scale) +{ + __v8si src = __extension__ (__v8si){ 0, 0, 0, 0, 0, 0, 0, 0 }; + __v8si mask = __extension__ (__v8si){ ~0, ~0, ~0, ~0, ~0, ~0, ~0, ~0 }; + + return (__m256i) __builtin_ia32_gathersiv8si (src, + base, + (__v8si)index, + mask, + scale); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_i32gather_epi32 (__m256i src, int const *base, + __m256i index, __m256i mask, const int scale) +{ + return (__m256i) __builtin_ia32_gathersiv8si ((__v8si)src, + base, + (__v8si)index, + (__v8si)mask, + scale); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_i64gather_epi32 (int const *base, __m128i index, const int scale) +{ + __v4si src = __extension__ (__v4si){ 0, 0, 0, 0 }; + __v4si mask = __extension__ (__v4si){ ~0, ~0, ~0, ~0 }; + + return (__m128i) __builtin_ia32_gatherdiv4si (src, + base, + (__v2di)index, + mask, + scale); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_i64gather_epi32 (__m128i src, int const *base, __m128i index, + __m128i mask, const int scale) +{ + return (__m128i) __builtin_ia32_gatherdiv4si ((__v4si)src, + base, + (__v2di)index, + (__v4si)mask, + scale); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_i64gather_epi32 (int const *base, __m256i index, const int scale) +{ + __v4si src = __extension__ (__v4si){ 0, 0, 0, 0 }; + __v4si mask = __extension__ (__v4si){ ~0, ~0, ~0, ~0 }; + + return (__m128i) __builtin_ia32_gatherdiv4si256 (src, + base, + (__v4di)index, + mask, + scale); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_i64gather_epi32 (__m128i src, int const *base, + __m256i index, __m128i mask, const int scale) +{ + return (__m128i) __builtin_ia32_gatherdiv4si256 ((__v4si)src, + base, + (__v4di)index, + (__v4si)mask, + scale); +} +#else /* __OPTIMIZE__ */ +#define _mm_i32gather_pd(BASE, INDEX, SCALE) \ + (__m128d) __builtin_ia32_gathersiv2df ((__v2df) _mm_setzero_pd (), \ + (double const *)BASE, \ + (__v4si)(__m128i)INDEX, \ + (__v2df)_mm_set1_pd( \ + (double)(long long int) -1), \ + (int)SCALE) + +#define _mm_mask_i32gather_pd(SRC, BASE, INDEX, MASK, SCALE) \ + (__m128d) __builtin_ia32_gathersiv2df ((__v2df)(__m128d)SRC, \ + (double const *)BASE, \ + (__v4si)(__m128i)INDEX, \ + (__v2df)(__m128d)MASK, \ + (int)SCALE) + +#define _mm256_i32gather_pd(BASE, INDEX, SCALE) \ + (__m256d) __builtin_ia32_gathersiv4df ((__v4df) _mm256_setzero_pd (), \ + (double const *)BASE, \ + (__v4si)(__m128i)INDEX, \ + (__v4df)_mm256_set1_pd( \ + (double)(long long int) -1), \ + (int)SCALE) + +#define _mm256_mask_i32gather_pd(SRC, BASE, INDEX, MASK, SCALE) \ + (__m256d) __builtin_ia32_gathersiv4df ((__v4df)(__m256d)SRC, \ + (double const *)BASE, \ + (__v4si)(__m128i)INDEX, \ + (__v4df)(__m256d)MASK, \ + (int)SCALE) + +#define _mm_i64gather_pd(BASE, INDEX, SCALE) \ + (__m128d) __builtin_ia32_gatherdiv2df ((__v2df) _mm_setzero_pd (), \ + (double const *)BASE, \ + (__v2di)(__m128i)INDEX, \ + (__v2df)_mm_set1_pd( \ + (double)(long long int) -1), \ + (int)SCALE) + +#define _mm_mask_i64gather_pd(SRC, BASE, INDEX, MASK, SCALE) \ + (__m128d) __builtin_ia32_gatherdiv2df 
((__v2df)(__m128d)SRC, \ + (double const *)BASE, \ + (__v2di)(__m128i)INDEX, \ + (__v2df)(__m128d)MASK, \ + (int)SCALE) + +#define _mm256_i64gather_pd(BASE, INDEX, SCALE) \ + (__m256d) __builtin_ia32_gatherdiv4df ((__v4df) _mm256_setzero_pd (), \ + (double const *)BASE, \ + (__v4di)(__m256i)INDEX, \ + (__v4df)_mm256_set1_pd( \ + (double)(long long int) -1), \ + (int)SCALE) + +#define _mm256_mask_i64gather_pd(SRC, BASE, INDEX, MASK, SCALE) \ + (__m256d) __builtin_ia32_gatherdiv4df ((__v4df)(__m256d)SRC, \ + (double const *)BASE, \ + (__v4di)(__m256i)INDEX, \ + (__v4df)(__m256d)MASK, \ + (int)SCALE) + +#define _mm_i32gather_ps(BASE, INDEX, SCALE) \ + (__m128) __builtin_ia32_gathersiv4sf ((__v4sf) _mm_setzero_ps (), \ + (float const *)BASE, \ + (__v4si)(__m128i)INDEX, \ + _mm_set1_ps ((float)(int) -1), \ + (int)SCALE) + +#define _mm_mask_i32gather_ps(SRC, BASE, INDEX, MASK, SCALE) \ + (__m128) __builtin_ia32_gathersiv4sf ((__v4sf)(__m128d)SRC, \ + (float const *)BASE, \ + (__v4si)(__m128i)INDEX, \ + (__v4sf)(__m128d)MASK, \ + (int)SCALE) + +#define _mm256_i32gather_ps(BASE, INDEX, SCALE) \ + (__m256) __builtin_ia32_gathersiv8sf ((__v8sf) _mm256_setzero_ps (), \ + (float const *)BASE, \ + (__v8si)(__m256i)INDEX, \ + (__v8sf)_mm256_set1_ps ( \ + (float)(int) -1), \ + (int)SCALE) + +#define _mm256_mask_i32gather_ps(SRC, BASE, INDEX, MASK, SCALE) \ + (__m256) __builtin_ia32_gathersiv8sf ((__v8sf)(__m256)SRC, \ + (float const *)BASE, \ + (__v8si)(__m256i)INDEX, \ + (__v8sf)(__m256d)MASK, \ + (int)SCALE) + +#define _mm_i64gather_ps(BASE, INDEX, SCALE) \ + (__m128) __builtin_ia32_gatherdiv4sf ((__v4sf) _mm_setzero_pd (), \ + (float const *)BASE, \ + (__v2di)(__m128i)INDEX, \ + (__v4sf)_mm_set1_ps ( \ + (float)(int) -1), \ + (int)SCALE) + +#define _mm_mask_i64gather_ps(SRC, BASE, INDEX, MASK, SCALE) \ + (__m128) __builtin_ia32_gatherdiv4sf ((__v4sf)(__m128)SRC, \ + (float const *)BASE, \ + (__v2di)(__m128i)INDEX, \ + (__v4sf)(__m128d)MASK, \ + (int)SCALE) + +#define _mm256_i64gather_ps(BASE, INDEX, SCALE) \ + (__m128) __builtin_ia32_gatherdiv4sf256 ((__v4sf) _mm_setzero_ps (), \ + (float const *)BASE, \ + (__v4di)(__m256i)INDEX, \ + (__v4sf)_mm_set1_ps( \ + (float)(int) -1), \ + (int)SCALE) + +#define _mm256_mask_i64gather_ps(SRC, BASE, INDEX, MASK, SCALE) \ + (__m128) __builtin_ia32_gatherdiv4sf256 ((__v4sf)(__m128)SRC, \ + (float const *)BASE, \ + (__v4di)(__m256i)INDEX, \ + (__v4sf)(__m128)MASK, \ + (int)SCALE) + +#define _mm_i32gather_epi64(BASE, INDEX, SCALE) \ + (__m128i) __builtin_ia32_gathersiv2di ((__v2di) _mm_setzero_si128 (), \ + (long long const *)BASE, \ + (__v4si)(__m128i)INDEX, \ + (__v2di)_mm_set1_epi64x (-1), \ + (int)SCALE) + +#define _mm_mask_i32gather_epi64(SRC, BASE, INDEX, MASK, SCALE) \ + (__m128i) __builtin_ia32_gathersiv2di ((__v2di)(__m128i)SRC, \ + (long long const *)BASE, \ + (__v4si)(__m128i)INDEX, \ + (__v2di)(__m128i)MASK, \ + (int)SCALE) + +#define _mm256_i32gather_epi64(BASE, INDEX, SCALE) \ + (__m256i) __builtin_ia32_gathersiv4di ((__v4di) _mm256_setzero_si256 (), \ + (long long const *)BASE, \ + (__v4si)(__m128i)INDEX, \ + (__v4di)_mm256_set1_epi64x (-1), \ + (int)SCALE) + +#define _mm256_mask_i32gather_epi64(SRC, BASE, INDEX, MASK, SCALE) \ + (__m256i) __builtin_ia32_gathersiv4di ((__v4di)(__m256i)SRC, \ + (long long const *)BASE, \ + (__v4si)(__m128i)INDEX, \ + (__v4di)(__m256i)MASK, \ + (int)SCALE) + +#define _mm_i64gather_epi64(BASE, INDEX, SCALE) \ + (__m128i) __builtin_ia32_gatherdiv2di ((__v2di) _mm_setzero_si128 (), \ + (long long const *)BASE, \ + 
(__v2di)(__m128i)INDEX, \ + (__v2di)_mm_set1_epi64x (-1), \ + (int)SCALE) + +#define _mm_mask_i64gather_epi64(SRC, BASE, INDEX, MASK, SCALE) \ + (__m128i) __builtin_ia32_gatherdiv2di ((__v2di)(__m128i)SRC, \ + (long long const *)BASE, \ + (__v2di)(__m128i)INDEX, \ + (__v2di)(__m128i)MASK, \ + (int)SCALE) + +#define _mm256_i64gather_epi64(BASE, INDEX, SCALE) \ + (__m256i) __builtin_ia32_gatherdiv4di ((__v4di) _mm256_setzero_si256 (), \ + (long long const *)BASE, \ + (__v4di)(__m256i)INDEX, \ + (__v4di)_mm256_set1_epi64x (-1), \ + (int)SCALE) + +#define _mm256_mask_i64gather_epi64(SRC, BASE, INDEX, MASK, SCALE) \ + (__m256i) __builtin_ia32_gatherdiv4di ((__v4di)(__m256i)SRC, \ + (long long const *)BASE, \ + (__v4di)(__m256i)INDEX, \ + (__v4di)(__m256i)MASK, \ + (int)SCALE) + +#define _mm_i32gather_epi32(BASE, INDEX, SCALE) \ + (__m128i) __builtin_ia32_gathersiv4si ((__v4si) _mm_setzero_si128 (), \ + (int const *)BASE, \ + (__v4si)(__m128i)INDEX, \ + (__v4si)_mm_set1_epi32 (-1), \ + (int)SCALE) + +#define _mm_mask_i32gather_epi32(SRC, BASE, INDEX, MASK, SCALE) \ + (__m128i) __builtin_ia32_gathersiv4si ((__v4si)(__m128i)SRC, \ + (int const *)BASE, \ + (__v4si)(__m128i)INDEX, \ + (__v4si)(__m128i)MASK, \ + (int)SCALE) + +#define _mm256_i32gather_epi32(BASE, INDEX, SCALE) \ + (__m256i) __builtin_ia32_gathersiv8si ((__v8si) _mm256_setzero_si256 (), \ + (int const *)BASE, \ + (__v8si)(__m256i)INDEX, \ + (__v8si)_mm256_set1_epi32 (-1), \ + (int)SCALE) + +#define _mm256_mask_i32gather_epi32(SRC, BASE, INDEX, MASK, SCALE) \ + (__m256i) __builtin_ia32_gathersiv8si ((__v8si)(__m256i)SRC, \ + (int const *)BASE, \ + (__v8si)(__m256i)INDEX, \ + (__v8si)(__m256i)MASK, \ + (int)SCALE) + +#define _mm_i64gather_epi32(BASE, INDEX, SCALE) \ + (__m128i) __builtin_ia32_gatherdiv4si ((__v4si) _mm_setzero_si128 (), \ + (int const *)BASE, \ + (__v2di)(__m128i)INDEX, \ + (__v4si)_mm_set1_epi32 (-1), \ + (int)SCALE) + +#define _mm_mask_i64gather_epi32(SRC, BASE, INDEX, MASK, SCALE) \ + (__m128i) __builtin_ia32_gatherdiv4si ((__v4si)(__m128i)SRC, \ + (int const *)BASE, \ + (__v2di)(__m128i)INDEX, \ + (__v4si)(__m128i)MASK, \ + (int)SCALE) + +#define _mm256_i64gather_epi32(BASE, INDEX, SCALE) \ + (__m128i) __builtin_ia32_gatherdiv4si256 ((__v4si) _mm_setzero_si128 (), \ + (int const *)BASE, \ + (__v4di)(__m256i)INDEX, \ + (__v4si)_mm_set1_epi32(-1), \ + (int)SCALE) + +#define _mm256_mask_i64gather_epi32(SRC, BASE, INDEX, MASK, SCALE) \ + (__m128i) __builtin_ia32_gatherdiv4si256 ((__v4si)(__m128i)SRC, \ + (int const *)BASE, \ + (__v4di)(__m256i)INDEX, \ + (__v4si)(__m128i)MASK, \ + (int)SCALE) +#endif /* __OPTIMIZE__ */ + +#ifdef __DISABLE_AVX2__ +#undef __DISABLE_AVX2__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX2__ */ + +#endif /* _AVX2INTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/avx512cdintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/avx512cdintrin.h new file mode 100644 index 0000000..a4939f7 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/avx512cdintrin.h @@ -0,0 +1,184 @@ +/* Copyright (C) 2013-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." +#endif + +#ifndef _AVX512CDINTRIN_H_INCLUDED +#define _AVX512CDINTRIN_H_INCLUDED + +#ifndef __AVX512CD__ +#pragma GCC push_options +#pragma GCC target("avx512cd") +#define __DISABLE_AVX512CD__ +#endif /* __AVX512CD__ */ + +/* Internal data types for implementing the intrinsics. */ +typedef long long __v8di __attribute__ ((__vector_size__ (64))); +typedef int __v16si __attribute__ ((__vector_size__ (64))); + +/* The Intel API is flexible enough that we must allow aliasing with other + vector types, and their scalar components. */ +typedef long long __m512i __attribute__ ((__vector_size__ (64), __may_alias__)); +typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__)); + +typedef unsigned char __mmask8; +typedef unsigned short __mmask16; + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_conflict_epi32 (__m512i __A) +{ + return (__m512i) + __builtin_ia32_vpconflictsi_512_mask ((__v16si) __A, + (__v16si) _mm512_setzero_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_conflict_epi32 (__m512i __W, __mmask16 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_vpconflictsi_512_mask ((__v16si) __A, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_conflict_epi32 (__mmask16 __U, __m512i __A) +{ + return (__m512i) + __builtin_ia32_vpconflictsi_512_mask ((__v16si) __A, + (__v16si) _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_conflict_epi64 (__m512i __A) +{ + return (__m512i) + __builtin_ia32_vpconflictdi_512_mask ((__v8di) __A, + (__v8di) _mm512_setzero_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_conflict_epi64 (__m512i __W, __mmask8 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_vpconflictdi_512_mask ((__v8di) __A, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_conflict_epi64 (__mmask8 __U, __m512i __A) +{ + return (__m512i) + __builtin_ia32_vpconflictdi_512_mask ((__v8di) __A, + (__v8di) _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_lzcnt_epi64 (__m512i __A) +{ + return (__m512i) + __builtin_ia32_vplzcntq_512_mask ((__v8di) __A, + (__v8di) _mm512_setzero_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_lzcnt_epi64 (__m512i __W, __mmask8 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_vplzcntq_512_mask ((__v8di) __A, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ 
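+
+/* _mm512_conflict_epi32 sets, in lane i, one bit for each lower lane
+   j < i holding the same 32-bit value; a zero result therefore means
+   "no earlier duplicate", which is the building block of vectorized
+   histogram updates.  A minimal sketch (hypothetical name; assumes
+   _mm512_cmpeq_epi32_mask from the companion AVX512F header):
+
+     #include <immintrin.h>
+
+     __mmask16
+     lanes_without_earlier_dup (__m512i idx)
+     {
+       __m512i c = _mm512_conflict_epi32 (idx);
+       return _mm512_cmpeq_epi32_mask (c, _mm512_setzero_si512 ());
+     }
+*/
+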
((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_lzcnt_epi64 (__mmask8 __U, __m512i __A) +{ + return (__m512i) + __builtin_ia32_vplzcntq_512_mask ((__v8di) __A, + (__v8di) _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_lzcnt_epi32 (__m512i __A) +{ + return (__m512i) + __builtin_ia32_vplzcntd_512_mask ((__v16si) __A, + (__v16si) _mm512_setzero_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_lzcnt_epi32 (__m512i __W, __mmask16 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_vplzcntd_512_mask ((__v16si) __A, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_lzcnt_epi32 (__mmask16 __U, __m512i __A) +{ + return (__m512i) + __builtin_ia32_vplzcntd_512_mask ((__v16si) __A, + (__v16si) _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_broadcastmb_epi64 (__mmask8 __A) +{ + return (__m512i) __builtin_ia32_broadcastmb512 (__A); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_broadcastmw_epi32 (__mmask16 __A) +{ + return (__m512i) __builtin_ia32_broadcastmw512 (__A); +} + +#ifdef __DISABLE_AVX512CD__ +#undef __DISABLE_AVX512CD__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512CD__ */ + +#endif /* _AVX512CDINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/avx512erintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/avx512erintrin.h new file mode 100644 index 0000000..f6870a5 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/avx512erintrin.h @@ -0,0 +1,394 @@ +/* Copyright (C) 2013-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." +#endif + +#ifndef _AVX512ERINTRIN_H_INCLUDED +#define _AVX512ERINTRIN_H_INCLUDED + +#ifndef __AVX512ER__ +#pragma GCC push_options +#pragma GCC target("avx512er") +#define __DISABLE_AVX512ER__ +#endif /* __AVX512ER__ */ + +/* Internal data types for implementing the intrinsics. */ +typedef double __v8df __attribute__ ((__vector_size__ (64))); +typedef float __v16sf __attribute__ ((__vector_size__ (64))); + +/* The Intel API is flexible enough that we must allow aliasing with other + vector types, and their scalar components. 
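+   Concretely, __may_alias__ exempts the __m512 and __m512d types
+   below from type-based alias analysis, so the bit-reinterpreting
+   casts to and from __v16sf and __v8df used in this header are
+   well-defined.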
*/ +typedef float __m512 __attribute__ ((__vector_size__ (64), __may_alias__)); +typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__)); + +typedef unsigned char __mmask8; +typedef unsigned short __mmask16; + +#ifdef __OPTIMIZE__ +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_exp2a23_round_pd (__m512d __A, int __R) +{ + __m512d __W; + return (__m512d) __builtin_ia32_exp2pd_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_exp2a23_round_pd (__m512d __W, __mmask8 __U, __m512d __A, int __R) +{ + return (__m512d) __builtin_ia32_exp2pd_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_exp2a23_round_pd (__mmask8 __U, __m512d __A, int __R) +{ + return (__m512d) __builtin_ia32_exp2pd_mask ((__v8df) __A, + (__v8df) _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_exp2a23_round_ps (__m512 __A, int __R) +{ + __m512 __W; + return (__m512) __builtin_ia32_exp2ps_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_exp2a23_round_ps (__m512 __W, __mmask16 __U, __m512 __A, int __R) +{ + return (__m512) __builtin_ia32_exp2ps_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_exp2a23_round_ps (__mmask16 __U, __m512 __A, int __R) +{ + return (__m512) __builtin_ia32_exp2ps_mask ((__v16sf) __A, + (__v16sf) _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rcp28_round_pd (__m512d __A, int __R) +{ + __m512d __W; + return (__m512d) __builtin_ia32_rcp28pd_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rcp28_round_pd (__m512d __W, __mmask8 __U, __m512d __A, int __R) +{ + return (__m512d) __builtin_ia32_rcp28pd_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rcp28_round_pd (__mmask8 __U, __m512d __A, int __R) +{ + return (__m512d) __builtin_ia32_rcp28pd_mask ((__v8df) __A, + (__v8df) _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rcp28_round_ps (__m512 __A, int __R) +{ + __m512 __W; + return (__m512) __builtin_ia32_rcp28ps_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rcp28_round_ps (__m512 __W, __mmask16 __U, __m512 __A, int __R) +{ + return (__m512) __builtin_ia32_rcp28ps_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rcp28_round_ps (__mmask16 __U, __m512 __A, int __R) +{ + return (__m512) __builtin_ia32_rcp28ps_mask ((__v16sf) __A, + (__v16sf) 
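+
+/* The AVX-512ER operations above are high-precision approximations:
+   exp2a23 computes 2^x to about 23 bits, and rcp28/rsqrt28 give the
+   reciprocal and reciprocal square root with relative error bounded
+   by 2^-28.  The __R argument selects rounding/exception behaviour
+   and must be a compile-time constant such as
+   _MM_FROUND_CUR_DIRECTION.  A minimal sketch (hypothetical name):
+
+     #include <immintrin.h>
+
+     __m512
+     approx_recip (__m512 x)
+     {
+       return _mm512_rcp28_round_ps (x, _MM_FROUND_CUR_DIRECTION);
+     }
+*/
+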
_mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp28_round_sd (__m128d __A, __m128d __B, int __R) +{ + return (__m128d) __builtin_ia32_rcp28sd_round ((__v2df) __B, + (__v2df) __A, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp28_round_ss (__m128 __A, __m128 __B, int __R) +{ + return (__m128) __builtin_ia32_rcp28ss_round ((__v4sf) __B, + (__v4sf) __A, + __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rsqrt28_round_pd (__m512d __A, int __R) +{ + __m512d __W; + return (__m512d) __builtin_ia32_rsqrt28pd_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rsqrt28_round_pd (__m512d __W, __mmask8 __U, __m512d __A, int __R) +{ + return (__m512d) __builtin_ia32_rsqrt28pd_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rsqrt28_round_pd (__mmask8 __U, __m512d __A, int __R) +{ + return (__m512d) __builtin_ia32_rsqrt28pd_mask ((__v8df) __A, + (__v8df) _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rsqrt28_round_ps (__m512 __A, int __R) +{ + __m512 __W; + return (__m512) __builtin_ia32_rsqrt28ps_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rsqrt28_round_ps (__m512 __W, __mmask16 __U, __m512 __A, int __R) +{ + return (__m512) __builtin_ia32_rsqrt28ps_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rsqrt28_round_ps (__mmask16 __U, __m512 __A, int __R) +{ + return (__m512) __builtin_ia32_rsqrt28ps_mask ((__v16sf) __A, + (__v16sf) _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rsqrt28_round_sd (__m128d __A, __m128d __B, int __R) +{ + return (__m128d) __builtin_ia32_rsqrt28sd_round ((__v2df) __B, + (__v2df) __A, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rsqrt28_round_ss (__m128 __A, __m128 __B, int __R) +{ + return (__m128) __builtin_ia32_rsqrt28ss_round ((__v4sf) __B, + (__v4sf) __A, + __R); +} + +#else +#define _mm512_exp2a23_round_pd(A, C) \ + __builtin_ia32_exp2pd_mask(A, (__v8df)_mm512_setzero_pd(), -1, C) + +#define _mm512_mask_exp2a23_round_pd(W, U, A, C) \ + __builtin_ia32_exp2pd_mask(A, W, U, C) + +#define _mm512_maskz_exp2a23_round_pd(U, A, C) \ + __builtin_ia32_exp2pd_mask(A, (__v8df)_mm512_setzero_pd(), U, C) + +#define _mm512_exp2a23_round_ps(A, C) \ + __builtin_ia32_exp2ps_mask(A, (__v16sf)_mm512_setzero_ps(), -1, C) + +#define _mm512_mask_exp2a23_round_ps(W, U, A, C) \ + __builtin_ia32_exp2ps_mask(A, W, U, C) + +#define _mm512_maskz_exp2a23_round_ps(U, A, C) \ + __builtin_ia32_exp2ps_mask(A, (__v16sf)_mm512_setzero_ps(), U, C) + +#define _mm512_rcp28_round_pd(A, C) \ + __builtin_ia32_rcp28pd_mask(A, (__v8df)_mm512_setzero_pd(), -1, C) + +#define _mm512_mask_rcp28_round_pd(W, U, A, C) \ + 
__builtin_ia32_rcp28pd_mask(A, W, U, C) + +#define _mm512_maskz_rcp28_round_pd(U, A, C) \ + __builtin_ia32_rcp28pd_mask(A, (__v8df)_mm512_setzero_pd(), U, C) + +#define _mm512_rcp28_round_ps(A, C) \ + __builtin_ia32_rcp28ps_mask(A, (__v16sf)_mm512_setzero_ps(), -1, C) + +#define _mm512_mask_rcp28_round_ps(W, U, A, C) \ + __builtin_ia32_rcp28ps_mask(A, W, U, C) + +#define _mm512_maskz_rcp28_round_ps(U, A, C) \ + __builtin_ia32_rcp28ps_mask(A, (__v16sf)_mm512_setzero_ps(), U, C) + +#define _mm512_rsqrt28_round_pd(A, C) \ + __builtin_ia32_rsqrt28pd_mask(A, (__v8df)_mm512_setzero_pd(), -1, C) + +#define _mm512_mask_rsqrt28_round_pd(W, U, A, C) \ + __builtin_ia32_rsqrt28pd_mask(A, W, U, C) + +#define _mm512_maskz_rsqrt28_round_pd(U, A, C) \ + __builtin_ia32_rsqrt28pd_mask(A, (__v8df)_mm512_setzero_pd(), U, C) + +#define _mm512_rsqrt28_round_ps(A, C) \ + __builtin_ia32_rsqrt28ps_mask(A, (__v16sf)_mm512_setzero_ps(), -1, C) + +#define _mm512_mask_rsqrt28_round_ps(W, U, A, C) \ + __builtin_ia32_rsqrt28ps_mask(A, W, U, C) + +#define _mm512_maskz_rsqrt28_round_ps(U, A, C) \ + __builtin_ia32_rsqrt28ps_mask(A, (__v16sf)_mm512_setzero_ps(), U, C) + +#define _mm_rcp28_round_sd(A, B, R) \ + __builtin_ia32_rcp28sd_round(A, B, R) + +#define _mm_rcp28_round_ss(A, B, R) \ + __builtin_ia32_rcp28ss_round(A, B, R) + +#define _mm_rsqrt28_round_sd(A, B, R) \ + __builtin_ia32_rsqrt28sd_round(A, B, R) + +#define _mm_rsqrt28_round_ss(A, B, R) \ + __builtin_ia32_rsqrt28ss_round(A, B, R) + +#endif + +#define _mm512_exp2a23_pd(A) \ + _mm512_exp2a23_round_pd(A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_mask_exp2a23_pd(W, U, A) \ + _mm512_mask_exp2a23_round_pd(W, U, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_maskz_exp2a23_pd(U, A) \ + _mm512_maskz_exp2a23_round_pd(U, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_exp2a23_ps(A) \ + _mm512_exp2a23_round_ps(A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_mask_exp2a23_ps(W, U, A) \ + _mm512_mask_exp2a23_round_ps(W, U, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_maskz_exp2a23_ps(U, A) \ + _mm512_maskz_exp2a23_round_ps(U, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_rcp28_pd(A) \ + _mm512_rcp28_round_pd(A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_mask_rcp28_pd(W, U, A) \ + _mm512_mask_rcp28_round_pd(W, U, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_maskz_rcp28_pd(U, A) \ + _mm512_maskz_rcp28_round_pd(U, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_rcp28_ps(A) \ + _mm512_rcp28_round_ps(A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_mask_rcp28_ps(W, U, A) \ + _mm512_mask_rcp28_round_ps(W, U, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_maskz_rcp28_ps(U, A) \ + _mm512_maskz_rcp28_round_ps(U, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_rsqrt28_pd(A) \ + _mm512_rsqrt28_round_pd(A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_mask_rsqrt28_pd(W, U, A) \ + _mm512_mask_rsqrt28_round_pd(W, U, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_maskz_rsqrt28_pd(U, A) \ + _mm512_maskz_rsqrt28_round_pd(U, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_rsqrt28_ps(A) \ + _mm512_rsqrt28_round_ps(A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_mask_rsqrt28_ps(W, U, A) \ + _mm512_mask_rsqrt28_round_ps(W, U, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm512_maskz_rsqrt28_ps(U, A) \ + _mm512_maskz_rsqrt28_round_ps(U, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm_rcp28_sd(A, B) \ + __builtin_ia32_rcp28sd_round(B, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm_rcp28_ss(A, B) \ + __builtin_ia32_rcp28ss_round(B, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm_rsqrt28_sd(A, B) \ + 
__builtin_ia32_rsqrt28sd_round(B, A, _MM_FROUND_CUR_DIRECTION) + +#define _mm_rsqrt28_ss(A, B) \ + __builtin_ia32_rsqrt28ss_round(B, A, _MM_FROUND_CUR_DIRECTION) + +#ifdef __DISABLE_AVX512ER__ +#undef __DISABLE_AVX512ER__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512ER__ */ + +#endif /* _AVX512ERINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/avx512fintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/avx512fintrin.h new file mode 100644 index 0000000..c4caa5a --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/avx512fintrin.h @@ -0,0 +1,12915 @@ +/* Copyright (C) 2013-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." +#endif + +#ifndef _AVX512FINTRIN_H_INCLUDED +#define _AVX512FINTRIN_H_INCLUDED + +#ifndef __AVX512F__ +#pragma GCC push_options +#pragma GCC target("avx512f") +#define __DISABLE_AVX512F__ +#endif /* __AVX512F__ */ + +/* Internal data types for implementing the intrinsics. */ +typedef double __v8df __attribute__ ((__vector_size__ (64))); +typedef float __v16sf __attribute__ ((__vector_size__ (64))); +typedef long long __v8di __attribute__ ((__vector_size__ (64))); +typedef int __v16si __attribute__ ((__vector_size__ (64))); +typedef short __v32hi __attribute__ ((__vector_size__ (64))); +typedef char __v64qi __attribute__ ((__vector_size__ (64))); + +/* The Intel API is flexible enough that we must allow aliasing with other + vector types, and their scalar components. */ +typedef float __m512 __attribute__ ((__vector_size__ (64), __may_alias__)); +typedef long long __m512i __attribute__ ((__vector_size__ (64), __may_alias__)); +typedef double __m512d __attribute__ ((__vector_size__ (64), __may_alias__)); + +typedef unsigned char __mmask8; +typedef unsigned short __mmask16; + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set_epi64 (long long __A, long long __B, long long __C, + long long __D, long long __E, long long __F, + long long __G, long long __H) +{ + return __extension__ (__m512i) (__v8di) + { __H, __G, __F, __E, __D, __C, __B, __A }; +} + +/* Create the vector [A B C D E F G H I J K L M N O P]. 
*/ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set_epi32 (int __A, int __B, int __C, int __D, + int __E, int __F, int __G, int __H, + int __I, int __J, int __K, int __L, + int __M, int __N, int __O, int __P) +{ + return __extension__ (__m512i)(__v16si) + { __P, __O, __N, __M, __L, __K, __J, __I, + __H, __G, __F, __E, __D, __C, __B, __A }; +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set_pd (double __A, double __B, double __C, double __D, + double __E, double __F, double __G, double __H) +{ + return __extension__ (__m512d) + { __H, __G, __F, __E, __D, __C, __B, __A }; +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set_ps (float __A, float __B, float __C, float __D, + float __E, float __F, float __G, float __H, + float __I, float __J, float __K, float __L, + float __M, float __N, float __O, float __P) +{ + return __extension__ (__m512) + { __P, __O, __N, __M, __L, __K, __J, __I, + __H, __G, __F, __E, __D, __C, __B, __A }; +} + +#define _mm512_setr_epi64(e0,e1,e2,e3,e4,e5,e6,e7) \ + _mm512_set_epi64(e7,e6,e5,e4,e3,e2,e1,e0) + +#define _mm512_setr_epi32(e0,e1,e2,e3,e4,e5,e6,e7, \ + e8,e9,e10,e11,e12,e13,e14,e15) \ + _mm512_set_epi32(e15,e14,e13,e12,e11,e10,e9,e8,e7,e6,e5,e4,e3,e2,e1,e0) + +#define _mm512_setr_pd(e0,e1,e2,e3,e4,e5,e6,e7) \ + _mm512_set_pd(e7,e6,e5,e4,e3,e2,e1,e0) + +#define _mm512_setr_ps(e0,e1,e2,e3,e4,e5,e6,e7,e8,e9,e10,e11,e12,e13,e14,e15) \ + _mm512_set_ps(e15,e14,e13,e12,e11,e10,e9,e8,e7,e6,e5,e4,e3,e2,e1,e0) + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_undefined_ps (void) +{ + __m512 __Y = __Y; + return __Y; +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_undefined_pd (void) +{ + __m512d __Y = __Y; + return __Y; +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_undefined_si512 (void) +{ + __m512i __Y = __Y; + return __Y; +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set1_epi8 (char __A) +{ + return __extension__ (__m512i)(__v64qi) + { __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A }; +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set1_epi16 (short __A) +{ + return __extension__ (__m512i)(__v32hi) + { __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A }; +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set1_pd (double __A) +{ + return (__m512d) __builtin_ia32_broadcastsd512 (__extension__ + (__v2df) { __A, }, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set1_ps (float __A) +{ + return (__m512) __builtin_ia32_broadcastss512 (__extension__ + (__v4sf) { __A, }, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +/* Create the vector [A B 
C D A B C D A B C D A B C D]. */ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set4_epi32 (int __A, int __B, int __C, int __D) +{ + return __extension__ (__m512i)(__v16si) + { __D, __C, __B, __A, __D, __C, __B, __A, + __D, __C, __B, __A, __D, __C, __B, __A }; +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set4_epi64 (long long __A, long long __B, long long __C, + long long __D) +{ + return __extension__ (__m512i) (__v8di) + { __D, __C, __B, __A, __D, __C, __B, __A }; +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set4_pd (double __A, double __B, double __C, double __D) +{ + return __extension__ (__m512d) + { __D, __C, __B, __A, __D, __C, __B, __A }; +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set4_ps (float __A, float __B, float __C, float __D) +{ + return __extension__ (__m512) + { __D, __C, __B, __A, __D, __C, __B, __A, + __D, __C, __B, __A, __D, __C, __B, __A }; +} + +#define _mm512_setr4_epi64(e0,e1,e2,e3) \ + _mm512_set4_epi64(e3,e2,e1,e0) + +#define _mm512_setr4_epi32(e0,e1,e2,e3) \ + _mm512_set4_epi32(e3,e2,e1,e0) + +#define _mm512_setr4_pd(e0,e1,e2,e3) \ + _mm512_set4_pd(e3,e2,e1,e0) + +#define _mm512_setr4_ps(e0,e1,e2,e3) \ + _mm512_set4_ps(e3,e2,e1,e0) + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_setzero_ps (void) +{ + return __extension__ (__m512){ 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, + 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 }; +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_setzero_pd (void) +{ + return __extension__ (__m512d) { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 }; +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_setzero_epi32 (void) +{ + return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 }; +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_setzero_si512 (void) +{ + return __extension__ (__m512i)(__v8di){ 0, 0, 0, 0, 0, 0, 0, 0 }; +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mov_pd (__m512d __W, __mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_movapd512_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mov_pd (__mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_movapd512_mask ((__v8df) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mov_ps (__m512 __W, __mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_movaps512_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mov_ps (__mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_movaps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_load_pd (void const *__P) +{ + return *(__m512d *) __P; +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, 
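+
+/* The _mm512_mask_ forms above merge: result lanes whose mask bit is
+   0 keep the value from __W, while the _mm512_maskz_ forms zero
+   those lanes instead.  Mask bit i governs lane i, lowest bit first.
+   A minimal sketch (hypothetical name):
+
+     #include <immintrin.h>
+
+     __m512d
+     take_low4 (__m512d w, __m512d a)
+     {
+       return _mm512_mask_mov_pd (w, (__mmask8) 0x0f, a);
+     }
+*/
+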
__always_inline__, __artificial__)) +_mm512_mask_load_pd (__m512d __W, __mmask8 __U, void const *__P) +{ + return (__m512d) __builtin_ia32_loadapd512_mask ((const __v8df *) __P, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_load_pd (__mmask8 __U, void const *__P) +{ + return (__m512d) __builtin_ia32_loadapd512_mask ((const __v8df *) __P, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_store_pd (void *__P, __m512d __A) +{ + *(__m512d *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_store_pd (void *__P, __mmask8 __U, __m512d __A) +{ + __builtin_ia32_storeapd512_mask ((__v8df *) __P, (__v8df) __A, + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_load_ps (void const *__P) +{ + return *(__m512 *) __P; +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_load_ps (__m512 __W, __mmask16 __U, void const *__P) +{ + return (__m512) __builtin_ia32_loadaps512_mask ((const __v16sf *) __P, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_load_ps (__mmask16 __U, void const *__P) +{ + return (__m512) __builtin_ia32_loadaps512_mask ((const __v16sf *) __P, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_store_ps (void *__P, __m512 __A) +{ + *(__m512 *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_store_ps (void *__P, __mmask16 __U, __m512 __A) +{ + __builtin_ia32_storeaps512_mask ((__v16sf *) __P, (__v16sf) __A, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mov_epi64 (__m512i __W, __mmask8 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_movdqa64_512_mask ((__v8di) __A, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mov_epi64 (__mmask8 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_movdqa64_512_mask ((__v8di) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_load_epi64 (void const *__P) +{ + return *(__m512i *) __P; +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_load_epi64 (__m512i __W, __mmask8 __U, void const *__P) +{ + return (__m512i) __builtin_ia32_movdqa64load512_mask ((const __v8di *) __P, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_load_epi64 (__mmask8 __U, void const *__P) +{ + return (__m512i) __builtin_ia32_movdqa64load512_mask ((const __v8di *) __P, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_store_epi64 (void *__P, __m512i __A) +{ + *(__m512i *) __P = __A; +} + +extern __inline void +__attribute__ 
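+
+/* The plain _mm512_load_pd and _mm512_store_pd forms above
+   dereference a 64-byte vector type directly, so the pointer must be
+   64-byte aligned; unaligned data needs the _loadu/_storeu variants
+   defined elsewhere in this header.  A minimal sketch (hypothetical
+   name; both pointers assumed 64-byte aligned):
+
+     #include <immintrin.h>
+
+     void
+     copy8 (double *dst, double const *src)
+     {
+       _mm512_store_pd (dst, _mm512_load_pd (src));
+     }
+*/
+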
((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_store_epi64 (void *__P, __mmask8 __U, __m512i __A) +{ + __builtin_ia32_movdqa64store512_mask ((__v8di *) __P, (__v8di) __A, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mov_epi32 (__m512i __W, __mmask16 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_movdqa32_512_mask ((__v16si) __A, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mov_epi32 (__mmask16 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_movdqa32_512_mask ((__v16si) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_load_si512 (void const *__P) +{ + return *(__m512i *) __P; +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_load_epi32 (void const *__P) +{ + return *(__m512i *) __P; +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_load_epi32 (__m512i __W, __mmask16 __U, void const *__P) +{ + return (__m512i) __builtin_ia32_movdqa32load512_mask ((const __v16si *) __P, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_load_epi32 (__mmask16 __U, void const *__P) +{ + return (__m512i) __builtin_ia32_movdqa32load512_mask ((const __v16si *) __P, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_store_si512 (void *__P, __m512i __A) +{ + *(__m512i *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_store_epi32 (void *__P, __m512i __A) +{ + *(__m512i *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_store_epi32 (void *__P, __mmask16 __U, __m512i __A) +{ + __builtin_ia32_movdqa32store512_mask ((__v16si *) __P, (__v16si) __A, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mullo_epi32 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pmulld512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mullo_epi32 (__mmask16 __M, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pmulld512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_setzero_si512 (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mullo_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pmulld512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) __W, __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sllv_epi32 (__m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X, + (__v16si) __Y, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
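+
+/* _mm512_sllv_epi32 and the srav/srlv variants that follow shift
+   each 32-bit lane by the count held in the matching lane of __Y;
+   counts of 32 or more yield 0 for the logical shifts, while the
+   arithmetic srav fills with copies of the sign bit.  A minimal
+   sketch (hypothetical name):
+
+     #include <immintrin.h>
+
+     __m512i
+     shl_per_lane (__m512i v, __m512i counts)
+     {
+       return _mm512_sllv_epi32 (v, counts);
+     }
+*/
+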
+_mm512_mask_sllv_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X, + (__v16si) __Y, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sllv_epi32 (__mmask16 __U, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psllv16si_mask ((__v16si) __X, + (__v16si) __Y, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_srav_epi32 (__m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X, + (__v16si) __Y, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_srav_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X, + (__v16si) __Y, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_srav_epi32 (__mmask16 __U, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psrav16si_mask ((__v16si) __X, + (__v16si) __Y, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_srlv_epi32 (__m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X, + (__v16si) __Y, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_srlv_epi32 (__m512i __W, __mmask16 __U, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X, + (__v16si) __Y, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_srlv_epi32 (__mmask16 __U, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psrlv16si_mask ((__v16si) __X, + (__v16si) __Y, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_add_epi64 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_add_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_add_epi64 (__mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_paddq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sub_epi64 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_psubq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm512_mask_sub_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_psubq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sub_epi64 (__mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_psubq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sllv_epi64 (__m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X, + (__v8di) __Y, + (__v8di) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sllv_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X, + (__v8di) __Y, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sllv_epi64 (__mmask8 __U, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psllv8di_mask ((__v8di) __X, + (__v8di) __Y, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_srav_epi64 (__m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X, + (__v8di) __Y, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_srav_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X, + (__v8di) __Y, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_srav_epi64 (__mmask8 __U, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X, + (__v8di) __Y, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_srlv_epi64 (__m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X, + (__v8di) __Y, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_srlv_epi64 (__m512i __W, __mmask8 __U, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X, + (__v8di) __Y, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_srlv_epi64 (__mmask8 __U, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_psrlv8di_mask ((__v8di) __X, + (__v8di) __Y, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_add_epi32 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_paddd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm512_mask_add_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_paddd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_add_epi32 (__mmask16 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_paddd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mul_epi32 (__m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X, + (__v16si) __Y, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mul_epi32 (__m512i __W, __mmask8 __M, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X, + (__v16si) __Y, + (__v8di) __W, __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mul_epi32 (__mmask8 __M, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_pmuldq512_mask ((__v16si) __X, + (__v16si) __Y, + (__v8di) + _mm512_setzero_si512 (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sub_epi32 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_psubd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sub_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_psubd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sub_epi32 (__mmask16 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_psubd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mul_epu32 (__m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X, + (__v16si) __Y, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mul_epu32 (__m512i __W, __mmask8 __M, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X, + (__v16si) __Y, + (__v8di) __W, __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mul_epu32 (__mmask8 __M, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_pmuludq512_mask ((__v16si) __X, + (__v16si) __Y, + (__v8di) + _mm512_setzero_si512 (), + __M); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_slli_epi64 (__m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm512_mask_slli_epi64 (__m512i __W, __mmask8 __U, __m512i __A, + unsigned int __B) +{ + return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_slli_epi64 (__mmask8 __U, __m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_psllqi512_mask ((__v8di) __A, __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} +#else +#define _mm512_slli_epi64(X, C) \ + ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X), (int)(C),\ + (__v8di)(__m512i)_mm512_undefined_si512 (),\ + (__mmask8)-1)) + +#define _mm512_mask_slli_epi64(W, U, X, C) \ + ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X), (int)(C),\ + (__v8di)(__m512i)(W),\ + (__mmask8)(U))) + +#define _mm512_maskz_slli_epi64(U, X, C) \ + ((__m512i) __builtin_ia32_psllqi512_mask ((__v8di)(__m512i)(X), (int)(C),\ + (__v8di)(__m512i)_mm512_setzero_si512 (),\ + (__mmask8)(U))) +#endif + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sll_epi64 (__m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A, + (__v2di) __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sll_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A, + (__v2di) __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sll_epi64 (__mmask8 __U, __m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psllq512_mask ((__v8di) __A, + (__v2di) __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_srli_epi64 (__m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_srli_epi64 (__m512i __W, __mmask8 __U, + __m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_srli_epi64 (__mmask8 __U, __m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_psrlqi512_mask ((__v8di) __A, __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} +#else +#define _mm512_srli_epi64(X, C) \ + ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X), (int)(C),\ + (__v8di)(__m512i)_mm512_undefined_si512 (),\ + (__mmask8)-1)) + +#define _mm512_mask_srli_epi64(W, U, X, C) \ + ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X), (int)(C),\ + (__v8di)(__m512i)(W),\ + (__mmask8)(U))) + +#define _mm512_maskz_srli_epi64(U, X, C) \ + ((__m512i) __builtin_ia32_psrlqi512_mask ((__v8di)(__m512i)(X), (int)(C),\ + (__v8di)(__m512i)_mm512_setzero_si512 (),\ + (__mmask8)(U))) +#endif + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_srl_epi64 (__m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psrlq512_mask 
((__v8di) __A, + (__v2di) __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_srl_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A, + (__v2di) __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_srl_epi64 (__mmask8 __U, __m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psrlq512_mask ((__v8di) __A, + (__v2di) __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_srai_epi64 (__m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_srai_epi64 (__m512i __W, __mmask8 __U, __m512i __A, + unsigned int __B) +{ + return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_srai_epi64 (__mmask8 __U, __m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_psraqi512_mask ((__v8di) __A, __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} +#else +#define _mm512_srai_epi64(X, C) \ + ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X), (int)(C),\ + (__v8di)(__m512i)_mm512_undefined_si512 (),\ + (__mmask8)-1)) + +#define _mm512_mask_srai_epi64(W, U, X, C) \ + ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X), (int)(C),\ + (__v8di)(__m512i)(W),\ + (__mmask8)(U))) + +#define _mm512_maskz_srai_epi64(U, X, C) \ + ((__m512i) __builtin_ia32_psraqi512_mask ((__v8di)(__m512i)(X), (int)(C),\ + (__v8di)(__m512i)_mm512_setzero_si512 (),\ + (__mmask8)(U))) +#endif + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sra_epi64 (__m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A, + (__v2di) __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sra_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A, + (__v2di) __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sra_epi64 (__mmask8 __U, __m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psraq512_mask ((__v8di) __A, + (__v2di) __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_slli_epi32 (__m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_slli_epi32 (__m512i __W, __mmask16 __U, __m512i __A, + unsigned int __B) +{ + return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) 
__A, __B, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_slli_epi32 (__mmask16 __U, __m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_pslldi512_mask ((__v16si) __A, __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} +#else +#define _mm512_slli_epi32(X, C) \ + ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X), (int)(C),\ + (__v16si)(__m512i)_mm512_undefined_si512 (),\ + (__mmask16)-1)) + +#define _mm512_mask_slli_epi32(W, U, X, C) \ + ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X), (int)(C),\ + (__v16si)(__m512i)(W),\ + (__mmask16)(U))) + +#define _mm512_maskz_slli_epi32(U, X, C) \ + ((__m512i) __builtin_ia32_pslldi512_mask ((__v16si)(__m512i)(X), (int)(C),\ + (__v16si)(__m512i)_mm512_setzero_si512 (),\ + (__mmask16)(U))) +#endif + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sll_epi32 (__m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A, + (__v4si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sll_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A, + (__v4si) __B, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sll_epi32 (__mmask16 __U, __m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_pslld512_mask ((__v16si) __A, + (__v4si) __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_srli_epi32 (__m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_srli_epi32 (__m512i __W, __mmask16 __U, + __m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_srli_epi32 (__mmask16 __U, __m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_psrldi512_mask ((__v16si) __A, __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} +#else +#define _mm512_srli_epi32(X, C) \ + ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X), (int)(C),\ + (__v16si)(__m512i)_mm512_undefined_si512 (),\ + (__mmask16)-1)) + +#define _mm512_mask_srli_epi32(W, U, X, C) \ + ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X), (int)(C),\ + (__v16si)(__m512i)(W),\ + (__mmask16)(U))) + +#define _mm512_maskz_srli_epi32(U, X, C) \ + ((__m512i) __builtin_ia32_psrldi512_mask ((__v16si)(__m512i)(X), (int)(C),\ + (__v16si)(__m512i)_mm512_setzero_si512 (),\ + (__mmask16)(U))) +#endif + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_srl_epi32 (__m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A, + (__v4si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern 
__inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_srl_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A, + (__v4si) __B, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_srl_epi32 (__mmask16 __U, __m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psrld512_mask ((__v16si) __A, + (__v4si) __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_srai_epi32 (__m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_srai_epi32 (__m512i __W, __mmask16 __U, __m512i __A, + unsigned int __B) +{ + return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_srai_epi32 (__mmask16 __U, __m512i __A, unsigned int __B) +{ + return (__m512i) __builtin_ia32_psradi512_mask ((__v16si) __A, __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} +#else +#define _mm512_srai_epi32(X, C) \ + ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X), (int)(C),\ + (__v16si)(__m512i)_mm512_undefined_si512 (),\ + (__mmask16)-1)) + +#define _mm512_mask_srai_epi32(W, U, X, C) \ + ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X), (int)(C),\ + (__v16si)(__m512i)(W),\ + (__mmask16)(U))) + +#define _mm512_maskz_srai_epi32(U, X, C) \ + ((__m512i) __builtin_ia32_psradi512_mask ((__v16si)(__m512i)(X), (int)(C),\ + (__v16si)(__m512i)_mm512_setzero_si512 (),\ + (__mmask16)(U))) +#endif + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sra_epi32 (__m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A, + (__v4si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sra_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A, + (__v4si) __B, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sra_epi32 (__mmask16 __U, __m512i __A, __m128i __B) +{ + return (__m512i) __builtin_ia32_psrad512_mask ((__v16si) __A, + (__v4si) __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_round_sd (__m128d __A, __m128d __B, const int __R) +{ + return (__m128d) __builtin_ia32_addsd_round ((__v2df) __A, + (__v2df) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_round_ss (__m128 __A, __m128 __B, const int __R) +{ + return (__m128) __builtin_ia32_addss_round ((__v4sf) __A, + (__v4sf) __B, + __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm_sub_round_sd (__m128d __A, __m128d __B, const int __R) +{ + return (__m128d) __builtin_ia32_subsd_round ((__v2df) __A, + (__v2df) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_round_ss (__m128 __A, __m128 __B, const int __R) +{ + return (__m128) __builtin_ia32_subss_round ((__v4sf) __A, + (__v4sf) __B, + __R); +} + +#else +#define _mm_add_round_sd(A, B, C) \ + (__m128d)__builtin_ia32_addsd_round(A, B, C) + +#define _mm_add_round_ss(A, B, C) \ + (__m128)__builtin_ia32_addss_round(A, B, C) + +#define _mm_sub_round_sd(A, B, C) \ + (__m128d)__builtin_ia32_subsd_round(A, B, C) + +#define _mm_sub_round_ss(A, B, C) \ + (__m128)__builtin_ia32_subss_round(A, B, C) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ternarylogic_epi64 (__m512i __A, __m512i __B, __m512i __C, const int imm) +{ + return (__m512i) __builtin_ia32_pternlogq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) __C, imm, + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ternarylogic_epi64 (__m512i __A, __mmask8 __U, __m512i __B, + __m512i __C, const int imm) +{ + return (__m512i) __builtin_ia32_pternlogq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) __C, imm, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ternarylogic_epi64 (__mmask8 __U, __m512i __A, __m512i __B, + __m512i __C, const int imm) +{ + return (__m512i) __builtin_ia32_pternlogq512_maskz ((__v8di) __A, + (__v8di) __B, + (__v8di) __C, + imm, (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ternarylogic_epi32 (__m512i __A, __m512i __B, __m512i __C, const int imm) +{ + return (__m512i) __builtin_ia32_pternlogd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) __C, + imm, (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ternarylogic_epi32 (__m512i __A, __mmask16 __U, __m512i __B, + __m512i __C, const int imm) +{ + return (__m512i) __builtin_ia32_pternlogd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) __C, + imm, (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ternarylogic_epi32 (__mmask16 __U, __m512i __A, __m512i __B, + __m512i __C, const int imm) +{ + return (__m512i) __builtin_ia32_pternlogd512_maskz ((__v16si) __A, + (__v16si) __B, + (__v16si) __C, + imm, (__mmask16) __U); +} +#else +#define _mm512_ternarylogic_epi64(A, B, C, I) \ + ((__m512i) __builtin_ia32_pternlogq512_mask ((__v8di)(__m512i)(A), \ + (__v8di)(__m512i)(B), (__v8di)(__m512i)(C), (int)(I), (__mmask8)-1)) +#define _mm512_mask_ternarylogic_epi64(A, U, B, C, I) \ + ((__m512i) __builtin_ia32_pternlogq512_mask ((__v8di)(__m512i)(A), \ + (__v8di)(__m512i)(B), (__v8di)(__m512i)(C), (int)(I), (__mmask8)(U))) +#define _mm512_maskz_ternarylogic_epi64(U, A, B, C, I) \ + ((__m512i) __builtin_ia32_pternlogq512_maskz ((__v8di)(__m512i)(A), \ + (__v8di)(__m512i)(B), (__v8di)(__m512i)(C), (int)(I), (__mmask8)(U))) +#define _mm512_ternarylogic_epi32(A, B, C, I) \ + ((__m512i) __builtin_ia32_pternlogd512_mask ((__v16si)(__m512i)(A), \ + (__v16si)(__m512i)(B), (__v16si)(__m512i)(C), (int)(I), \ + (__mmask16)-1)) +#define 
_mm512_mask_ternarylogic_epi32(A, U, B, C, I) \ + ((__m512i) __builtin_ia32_pternlogd512_mask ((__v16si)(__m512i)(A), \ + (__v16si)(__m512i)(B), (__v16si)(__m512i)(C), (int)(I), \ + (__mmask16)(U))) +#define _mm512_maskz_ternarylogic_epi32(U, A, B, C, I) \ + ((__m512i) __builtin_ia32_pternlogd512_maskz ((__v16si)(__m512i)(A), \ + (__v16si)(__m512i)(B), (__v16si)(__m512i)(C), (int)(I), \ + (__mmask16)(U))) +#endif + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rcp14_pd (__m512d __A) +{ + return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rcp14_pd (__m512d __W, __mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rcp14_pd (__mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_rcp14pd512_mask ((__v8df) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rcp14_ps (__m512 __A) +{ + return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rcp14_ps (__m512 __W, __mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rcp14_ps (__mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_rcp14ps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp14_sd (__m128d __A, __m128d __B) +{ + return (__m128d) __builtin_ia32_rcp14sd ((__v2df) __B, + (__v2df) __A); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp14_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_rcp14ss ((__v4sf) __B, + (__v4sf) __A); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rsqrt14_pd (__m512d __A) +{ + return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rsqrt14_pd (__m512d __W, __mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rsqrt14_pd (__mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_rsqrt14pd512_mask ((__v8df) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rsqrt14_ps (__m512 __A) +{ + return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm512_mask_rsqrt14_ps (__m512 __W, __mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rsqrt14_ps (__mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_rsqrt14ps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rsqrt14_sd (__m128d __A, __m128d __B) +{ + return (__m128d) __builtin_ia32_rsqrt14sd ((__v2df) __B, + (__v2df) __A); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rsqrt14_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_rsqrt14ss ((__v4sf) __B, + (__v4sf) __A); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sqrt_round_pd (__m512d __A, const int __R) +{ + return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sqrt_round_pd (__m512d __W, __mmask8 __U, __m512d __A, + const int __R) +{ + return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sqrt_round_pd (__mmask8 __U, __m512d __A, const int __R) +{ + return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sqrt_round_ps (__m512 __A, const int __R) +{ + return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sqrt_round_ps (__m512 __W, __mmask16 __U, __m512 __A, const int __R) +{ + return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sqrt_round_ps (__mmask16 __U, __m512 __A, const int __R) +{ + return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sqrt_round_sd (__m128d __A, __m128d __B, const int __R) +{ + return (__m128d) __builtin_ia32_sqrtsd_round ((__v2df) __B, + (__v2df) __A, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sqrt_round_ss (__m128 __A, __m128 __B, const int __R) +{ + return (__m128) __builtin_ia32_sqrtss_round ((__v4sf) __B, + (__v4sf) __A, + __R); +} +#else +#define _mm512_sqrt_round_pd(A, C) \ + (__m512d)__builtin_ia32_sqrtpd512_mask(A, (__v8df)_mm512_undefined_pd(), -1, C) + +#define _mm512_mask_sqrt_round_pd(W, U, A, C) \ + (__m512d)__builtin_ia32_sqrtpd512_mask(A, W, U, C) + +#define _mm512_maskz_sqrt_round_pd(U, A, C) \ + (__m512d)__builtin_ia32_sqrtpd512_mask(A, (__v8df)_mm512_setzero_pd(), U, C) + +#define 
_mm512_sqrt_round_ps(A, C) \ + (__m512)__builtin_ia32_sqrtps512_mask(A, (__v16sf)_mm512_undefined_ps(), -1, C) + +#define _mm512_mask_sqrt_round_ps(W, U, A, C) \ + (__m512)__builtin_ia32_sqrtps512_mask(A, W, U, C) + +#define _mm512_maskz_sqrt_round_ps(U, A, C) \ + (__m512)__builtin_ia32_sqrtps512_mask(A, (__v16sf)_mm512_setzero_ps(), U, C) + +#define _mm_sqrt_round_sd(A, B, C) \ + (__m128d)__builtin_ia32_sqrtsd_round(A, B, C) + +#define _mm_sqrt_round_ss(A, B, C) \ + (__m128)__builtin_ia32_sqrtss_round(A, B, C) +#endif + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi8_epi32 (__m128i __A) +{ + return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi8_epi32 (__m512i __W, __mmask16 __U, __m128i __A) +{ + return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi8_epi32 (__mmask16 __U, __m128i __A) +{ + return (__m512i) __builtin_ia32_pmovsxbd512_mask ((__v16qi) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi8_epi64 (__m128i __A) +{ + return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi8_epi64 (__m512i __W, __mmask8 __U, __m128i __A) +{ + return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi8_epi64 (__mmask8 __U, __m128i __A) +{ + return (__m512i) __builtin_ia32_pmovsxbq512_mask ((__v16qi) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi16_epi32 (__m256i __A) +{ + return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi16_epi32 (__m512i __W, __mmask16 __U, __m256i __A) +{ + return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi16_epi32 (__mmask16 __U, __m256i __A) +{ + return (__m512i) __builtin_ia32_pmovsxwd512_mask ((__v16hi) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi16_epi64 (__m128i __A) +{ + return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi16_epi64 (__m512i __W, __mmask8 __U, __m128i __A) +{ + return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline 
__m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi16_epi64 (__mmask8 __U, __m128i __A) +{ + return (__m512i) __builtin_ia32_pmovsxwq512_mask ((__v8hi) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi32_epi64 (__m256i __X) +{ + return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi32_epi64 (__m512i __W, __mmask8 __U, __m256i __X) +{ + return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi32_epi64 (__mmask8 __U, __m256i __X) +{ + return (__m512i) __builtin_ia32_pmovsxdq512_mask ((__v8si) __X, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepu8_epi32 (__m128i __A) +{ + return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepu8_epi32 (__m512i __W, __mmask16 __U, __m128i __A) +{ + return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepu8_epi32 (__mmask16 __U, __m128i __A) +{ + return (__m512i) __builtin_ia32_pmovzxbd512_mask ((__v16qi) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepu8_epi64 (__m128i __A) +{ + return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepu8_epi64 (__m512i __W, __mmask8 __U, __m128i __A) +{ + return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepu8_epi64 (__mmask8 __U, __m128i __A) +{ + return (__m512i) __builtin_ia32_pmovzxbq512_mask ((__v16qi) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepu16_epi32 (__m256i __A) +{ + return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepu16_epi32 (__m512i __W, __mmask16 __U, __m256i __A) +{ + return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepu16_epi32 (__mmask16 __U, __m256i __A) +{ + return (__m512i) __builtin_ia32_pmovzxwd512_mask ((__v16hi) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + 
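[Editor's note, not part of the patch: a minimal, hedged usage sketch for the widening-conversion intrinsics defined above. The function name widen_low_bytes, the mask value, and the surrounding code are illustrative assumptions only; an AVX-512F-capable compiler invocation (e.g. gcc -mavx512f) is assumed.]

#include <immintrin.h>

/* Zero-extend 16 unsigned bytes to 16 32-bit lanes, then double only the
   even lanes; the maskz form zeroes the odd (masked-off) lanes.
   Illustrative only -- not part of the committed header.  */
__m512i
widen_low_bytes (__m128i bytes)
{
  __m512i lanes = _mm512_cvtepu8_epi32 (bytes);
  __mmask16 keep_even = 0x5555;
  return _mm512_maskz_add_epi32 (keep_even, lanes, lanes);
}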
+extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepu16_epi64 (__m128i __A) +{ + return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepu16_epi64 (__m512i __W, __mmask8 __U, __m128i __A) +{ + return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepu16_epi64 (__mmask8 __U, __m128i __A) +{ + return (__m512i) __builtin_ia32_pmovzxwq512_mask ((__v8hi) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepu32_epi64 (__m256i __X) +{ + return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepu32_epi64 (__m512i __W, __mmask8 __U, __m256i __X) +{ + return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepu32_epi64 (__mmask8 __U, __m256i __X) +{ + return (__m512i) __builtin_ia32_pmovzxdq512_mask ((__v8si) __X, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_add_round_pd (__m512d __A, __m512d __B, const int __R) +{ + return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_add_round_pd (__m512d __W, __mmask8 __U, __m512d __A, + __m512d __B, const int __R) +{ + return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_add_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + const int __R) +{ + return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_add_round_ps (__m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_add_round_ps (__m512 __W, __mmask16 __U, __m512 __A, + __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_add_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + 
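[Editor's note, not part of the patch: a short, hedged example of the explicit-rounding arithmetic intrinsics above. masked_round_add and its parameter names are invented for illustration; the _MM_FROUND_* constants are the usual x86 rounding-control macros from the standard intrinsics headers.]

#include <immintrin.h>

/* Merge-masked add with an explicit rounding mode: lanes cleared in
   `live` keep their value from `acc`, while active lanes are added with
   round-to-nearest and exceptions suppressed.  Illustrative only.  */
__m512
masked_round_add (__m512 acc, __m512 x, __mmask16 live)
{
  return _mm512_mask_add_round_ps (acc, live, acc, x,
                                   _MM_FROUND_TO_NEAREST_INT
                                   | _MM_FROUND_NO_EXC);
}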
+extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sub_round_pd (__m512d __A, __m512d __B, const int __R) +{ + return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sub_round_pd (__m512d __W, __mmask8 __U, __m512d __A, + __m512d __B, const int __R) +{ + return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sub_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + const int __R) +{ + return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sub_round_ps (__m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sub_round_ps (__m512 __W, __mmask16 __U, __m512 __A, + __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sub_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} +#else +#define _mm512_add_round_pd(A, B, C) \ + (__m512d)__builtin_ia32_addpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C) + +#define _mm512_mask_add_round_pd(W, U, A, B, C) \ + (__m512d)__builtin_ia32_addpd512_mask(A, B, W, U, C) + +#define _mm512_maskz_add_round_pd(U, A, B, C) \ + (__m512d)__builtin_ia32_addpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C) + +#define _mm512_add_round_ps(A, B, C) \ + (__m512)__builtin_ia32_addps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C) + +#define _mm512_mask_add_round_ps(W, U, A, B, C) \ + (__m512)__builtin_ia32_addps512_mask(A, B, W, U, C) + +#define _mm512_maskz_add_round_ps(U, A, B, C) \ + (__m512)__builtin_ia32_addps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C) + +#define _mm512_sub_round_pd(A, B, C) \ + (__m512d)__builtin_ia32_subpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C) + +#define _mm512_mask_sub_round_pd(W, U, A, B, C) \ + (__m512d)__builtin_ia32_subpd512_mask(A, B, W, U, C) + +#define _mm512_maskz_sub_round_pd(U, A, B, C) \ + (__m512d)__builtin_ia32_subpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C) + +#define _mm512_sub_round_ps(A, B, C) \ + (__m512)__builtin_ia32_subps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C) + +#define _mm512_mask_sub_round_ps(W, U, A, B, C) \ + (__m512)__builtin_ia32_subps512_mask(A, B, W, U, C) + +#define _mm512_maskz_sub_round_ps(U, A, B, C) \ + (__m512)__builtin_ia32_subps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mul_round_pd (__m512d __A, __m512d __B, const int __R) +{ + 
return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mul_round_pd (__m512d __W, __mmask8 __U, __m512d __A, + __m512d __B, const int __R) +{ + return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mul_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + const int __R) +{ + return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mul_round_ps (__m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mul_round_ps (__m512 __W, __mmask16 __U, __m512 __A, + __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mul_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_div_round_pd (__m512d __M, __m512d __V, const int __R) +{ + return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M, + (__v8df) __V, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_div_round_pd (__m512d __W, __mmask8 __U, __m512d __M, + __m512d __V, const int __R) +{ + return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M, + (__v8df) __V, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_div_round_pd (__mmask8 __U, __m512d __M, __m512d __V, + const int __R) +{ + return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M, + (__v8df) __V, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_div_round_ps (__m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_div_round_ps (__m512 __W, __mmask16 __U, __m512 __A, + __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_div_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A, + (__v16sf) 
__B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mul_round_sd (__m128d __A, __m128d __B, const int __R) +{ + return (__m128d) __builtin_ia32_mulsd_round ((__v2df) __A, + (__v2df) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mul_round_ss (__m128 __A, __m128 __B, const int __R) +{ + return (__m128) __builtin_ia32_mulss_round ((__v4sf) __A, + (__v4sf) __B, + __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_div_round_sd (__m128d __A, __m128d __B, const int __R) +{ + return (__m128d) __builtin_ia32_divsd_round ((__v2df) __A, + (__v2df) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_div_round_ss (__m128 __A, __m128 __B, const int __R) +{ + return (__m128) __builtin_ia32_divss_round ((__v4sf) __A, + (__v4sf) __B, + __R); +} + +#else +#define _mm512_mul_round_pd(A, B, C) \ + (__m512d)__builtin_ia32_mulpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C) + +#define _mm512_mask_mul_round_pd(W, U, A, B, C) \ + (__m512d)__builtin_ia32_mulpd512_mask(A, B, W, U, C) + +#define _mm512_maskz_mul_round_pd(U, A, B, C) \ + (__m512d)__builtin_ia32_mulpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C) + +#define _mm512_mul_round_ps(A, B, C) \ + (__m512)__builtin_ia32_mulps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C) + +#define _mm512_mask_mul_round_ps(W, U, A, B, C) \ + (__m512)__builtin_ia32_mulps512_mask(A, B, W, U, C) + +#define _mm512_maskz_mul_round_ps(U, A, B, C) \ + (__m512)__builtin_ia32_mulps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C) + +#define _mm512_div_round_pd(A, B, C) \ + (__m512d)__builtin_ia32_divpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C) + +#define _mm512_mask_div_round_pd(W, U, A, B, C) \ + (__m512d)__builtin_ia32_divpd512_mask(A, B, W, U, C) + +#define _mm512_maskz_div_round_pd(U, A, B, C) \ + (__m512d)__builtin_ia32_divpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C) + +#define _mm512_div_round_ps(A, B, C) \ + (__m512)__builtin_ia32_divps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C) + +#define _mm512_mask_div_round_ps(W, U, A, B, C) \ + (__m512)__builtin_ia32_divps512_mask(A, B, W, U, C) + +#define _mm512_maskz_div_round_ps(U, A, B, C) \ + (__m512)__builtin_ia32_divps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C) + +#define _mm_mul_round_sd(A, B, C) \ + (__m128d)__builtin_ia32_mulsd_round(A, B, C) + +#define _mm_mul_round_ss(A, B, C) \ + (__m128)__builtin_ia32_mulss_round(A, B, C) + +#define _mm_div_round_sd(A, B, C) \ + (__m128d)__builtin_ia32_divsd_round(A, B, C) + +#define _mm_div_round_ss(A, B, C) \ + (__m128)__builtin_ia32_divss_round(A, B, C) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_max_round_pd (__m512d __A, __m512d __B, const int __R) +{ + return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_max_round_pd (__m512d __W, __mmask8 __U, __m512d __A, + __m512d __B, const int __R) +{ + return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_max_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + const int __R) +{ + return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_max_round_ps (__m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_max_round_ps (__m512 __W, __mmask16 __U, __m512 __A, + __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_max_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_min_round_pd (__m512d __A, __m512d __B, const int __R) +{ + return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_min_round_pd (__m512d __W, __mmask8 __U, __m512d __A, + __m512d __B, const int __R) +{ + return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_min_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + const int __R) +{ + return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_min_round_ps (__m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_min_round_ps (__m512 __W, __mmask16 __U, __m512 __A, + __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_min_round_ps (__mmask16 __U, __m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} +#else +#define _mm512_max_round_pd(A, B, R) \ + (__m512d)__builtin_ia32_maxpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, R) + +#define _mm512_mask_max_round_pd(W, U, A, B, R) \ + (__m512d)__builtin_ia32_maxpd512_mask(A, B, W, U, R) + +#define _mm512_maskz_max_round_pd(U, A, B, R) \ + (__m512d)__builtin_ia32_maxpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, R) + +#define _mm512_max_round_ps(A, B, R) \ + 
(__m512)__builtin_ia32_maxps512_mask(A, B, (__v16sf)_mm512_undefined_pd(), -1, R) + +#define _mm512_mask_max_round_ps(W, U, A, B, R) \ + (__m512)__builtin_ia32_maxps512_mask(A, B, W, U, R) + +#define _mm512_maskz_max_round_ps(U, A, B, R) \ + (__m512)__builtin_ia32_maxps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, R) + +#define _mm512_min_round_pd(A, B, R) \ + (__m512d)__builtin_ia32_minpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, R) + +#define _mm512_mask_min_round_pd(W, U, A, B, R) \ + (__m512d)__builtin_ia32_minpd512_mask(A, B, W, U, R) + +#define _mm512_maskz_min_round_pd(U, A, B, R) \ + (__m512d)__builtin_ia32_minpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, R) + +#define _mm512_min_round_ps(A, B, R) \ + (__m512)__builtin_ia32_minps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, R) + +#define _mm512_mask_min_round_ps(W, U, A, B, R) \ + (__m512)__builtin_ia32_minps512_mask(A, B, W, U, R) + +#define _mm512_maskz_min_round_ps(U, A, B, R) \ + (__m512)__builtin_ia32_minps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, R) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_scalef_round_pd (__m512d __A, __m512d __B, const int __R) +{ + return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_scalef_round_pd (__m512d __W, __mmask8 __U, __m512d __A, + __m512d __B, const int __R) +{ + return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_scalef_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + const int __R) +{ + return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_scalef_round_ps (__m512 __A, __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_scalef_round_ps (__m512 __W, __mmask16 __U, __m512 __A, + __m512 __B, const int __R) +{ + return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_scalef_round_ps (__mmask16 __U, __m512 __A, __m512 __B, + const int __R) +{ + return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_scalef_round_sd (__m128d __A, __m128d __B, const int __R) +{ + return (__m128d) __builtin_ia32_scalefsd_round ((__v2df) __A, + (__v2df) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_scalef_round_ss (__m128 __A, __m128 __B, const int __R) +{ + return (__m128) __builtin_ia32_scalefss_round ((__v4sf) __A, + (__v4sf) __B, + __R); +} +#else +#define 
_mm512_scalef_round_pd(A, B, C) \ + (__m512d)__builtin_ia32_scalefpd512_mask(A, B, (__v8df)_mm512_undefined_pd(), -1, C) + +#define _mm512_mask_scalef_round_pd(W, U, A, B, C) \ + (__m512d)__builtin_ia32_scalefpd512_mask(A, B, W, U, C) + +#define _mm512_maskz_scalef_round_pd(U, A, B, C) \ + (__m512d)__builtin_ia32_scalefpd512_mask(A, B, (__v8df)_mm512_setzero_pd(), U, C) + +#define _mm512_scalef_round_ps(A, B, C) \ + (__m512)__builtin_ia32_scalefps512_mask(A, B, (__v16sf)_mm512_undefined_ps(), -1, C) + +#define _mm512_mask_scalef_round_ps(W, U, A, B, C) \ + (__m512)__builtin_ia32_scalefps512_mask(A, B, W, U, C) + +#define _mm512_maskz_scalef_round_ps(U, A, B, C) \ + (__m512)__builtin_ia32_scalefps512_mask(A, B, (__v16sf)_mm512_setzero_ps(), U, C) + +#define _mm_scalef_round_sd(A, B, C) \ + (__m128d)__builtin_ia32_scalefsd_round(A, B, C) + +#define _mm_scalef_round_ss(A, B, C) \ + (__m128)__builtin_ia32_scalefss_round(A, B, C) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B, + __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, + __mmask8 __U, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask3 ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B, + __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, + __mmask16 __U, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask3 ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B, + __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512d 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B, + __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, + __mmask8 __U, const int __R) +{ + return (__m512d) __builtin_ia32_vfmsubpd512_mask3 ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B, + __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, + __mmask16 __U, const int __R) +{ + return (__m512) __builtin_ia32_vfmsubps512_mask3 ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B, + __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmaddsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmaddsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B, + __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmaddsub_round_pd (__m512d __A, __m512d __B, __m512d __C, + __mmask8 __U, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_mask3 ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm512_maskz_fmaddsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmaddsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmaddsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B, + __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmaddsub_round_ps (__m512 __A, __m512 __B, __m512 __C, + __mmask16 __U, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_mask3 ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmaddsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B, + __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmsubadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsubadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B, + __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmsubadd_round_pd (__m512d __A, __m512d __B, __m512d __C, + __mmask8 __U, const int __R) +{ + return (__m512d) __builtin_ia32_vfmsubaddpd512_mask3 ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsubadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmsubadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsubadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B, + __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmsubadd_round_ps (__m512 __A, __m512 __B, __m512 __C, + __mmask16 __U, const int __R) +{ + return (__m512) __builtin_ia32_vfmsubaddps512_mask3 ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsubadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B, + __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask (-(__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmadd_round_pd (__m512d __A, __mmask8 __U, __m512d __B, + __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmadd_round_pd (__m512d __A, __m512d __B, __m512d __C, + __mmask8 __U, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask3 (-(__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmadd_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_maskz (-(__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask (-(__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmadd_round_ps (__m512 __A, __mmask16 __U, __m512 __B, + __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmadd_round_ps (__m512 __A, __m512 __B, __m512 __C, + __mmask16 __U, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask3 (-(__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmadd_round_ps (__mmask16 __U, __m512 __A, __m512 __B, + __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddps512_maskz (-(__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask (-(__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm512_mask_fnmsub_round_pd (__m512d __A, __mmask8 __U, __m512d __B, + __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmsub_round_pd (__m512d __A, __m512d __B, __m512d __C, + __mmask8 __U, const int __R) +{ + return (__m512d) __builtin_ia32_vfnmsubpd512_mask3 ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmsub_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + __m512d __C, const int __R) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_maskz (-(__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask (-(__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmsub_round_ps (__m512 __A, __mmask16 __U, __m512 __B, + __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmsub_round_ps (__m512 __A, __m512 __B, __m512 __C, + __mmask16 __U, const int __R) +{ + return (__m512) __builtin_ia32_vfnmsubps512_mask3 ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmsub_round_ps (__mmask16 __U, __m512 __A, __m512 __B, + __m512 __C, const int __R) +{ + return (__m512) __builtin_ia32_vfmaddps512_maskz (-(__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) __U, __R); +} +#else +#define _mm512_fmadd_round_pd(A, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, C, -1, R) + +#define _mm512_mask_fmadd_round_pd(A, U, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, C, U, R) + +#define _mm512_mask3_fmadd_round_pd(A, B, C, U, R) \ + (__m512d)__builtin_ia32_vfmaddpd512_mask3(A, B, C, U, R) + +#define _mm512_maskz_fmadd_round_pd(U, A, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddpd512_maskz(A, B, C, U, R) + +#define _mm512_fmadd_round_ps(A, B, C, R) \ + (__m512)__builtin_ia32_vfmaddps512_mask(A, B, C, -1, R) + +#define _mm512_mask_fmadd_round_ps(A, U, B, C, R) \ + (__m512)__builtin_ia32_vfmaddps512_mask(A, B, C, U, R) + +#define _mm512_mask3_fmadd_round_ps(A, B, C, U, R) \ + (__m512)__builtin_ia32_vfmaddps512_mask3(A, B, C, U, R) + +#define _mm512_maskz_fmadd_round_ps(U, A, B, C, R) \ + (__m512)__builtin_ia32_vfmaddps512_maskz(A, B, C, U, R) + +#define _mm512_fmsub_round_pd(A, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, -(C), -1, R) + +#define _mm512_mask_fmsub_round_pd(A, U, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddpd512_mask(A, B, -(C), U, R) + +#define _mm512_mask3_fmsub_round_pd(A, B, C, U, R) \ + (__m512d)__builtin_ia32_vfmsubpd512_mask3(A, B, C, U, R) + +#define _mm512_maskz_fmsub_round_pd(U, A, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddpd512_maskz(A, B, -(C), 
U, R) + +#define _mm512_fmsub_round_ps(A, B, C, R) \ + (__m512)__builtin_ia32_vfmaddps512_mask(A, B, -(C), -1, R) + +#define _mm512_mask_fmsub_round_ps(A, U, B, C, R) \ + (__m512)__builtin_ia32_vfmaddps512_mask(A, B, -(C), U, R) + +#define _mm512_mask3_fmsub_round_ps(A, B, C, U, R) \ + (__m512)__builtin_ia32_vfmsubps512_mask3(A, B, C, U, R) + +#define _mm512_maskz_fmsub_round_ps(U, A, B, C, R) \ + (__m512)__builtin_ia32_vfmaddps512_maskz(A, B, -(C), U, R) + +#define _mm512_fmaddsub_round_pd(A, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, -1, R) + +#define _mm512_mask_fmaddsub_round_pd(A, U, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, C, U, R) + +#define _mm512_mask3_fmaddsub_round_pd(A, B, C, U, R) \ + (__m512d)__builtin_ia32_vfmaddsubpd512_mask3(A, B, C, U, R) + +#define _mm512_maskz_fmaddsub_round_pd(U, A, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddsubpd512_maskz(A, B, C, U, R) + +#define _mm512_fmaddsub_round_ps(A, B, C, R) \ + (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, C, -1, R) + +#define _mm512_mask_fmaddsub_round_ps(A, U, B, C, R) \ + (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, C, U, R) + +#define _mm512_mask3_fmaddsub_round_ps(A, B, C, U, R) \ + (__m512)__builtin_ia32_vfmaddsubps512_mask3(A, B, C, U, R) + +#define _mm512_maskz_fmaddsub_round_ps(U, A, B, C, R) \ + (__m512)__builtin_ia32_vfmaddsubps512_maskz(A, B, C, U, R) + +#define _mm512_fmsubadd_round_pd(A, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, -(C), -1, R) + +#define _mm512_mask_fmsubadd_round_pd(A, U, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddsubpd512_mask(A, B, -(C), U, R) + +#define _mm512_mask3_fmsubadd_round_pd(A, B, C, U, R) \ + (__m512d)__builtin_ia32_vfmsubaddpd512_mask3(A, B, C, U, R) + +#define _mm512_maskz_fmsubadd_round_pd(U, A, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddsubpd512_maskz(A, B, -(C), U, R) + +#define _mm512_fmsubadd_round_ps(A, B, C, R) \ + (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, -(C), -1, R) + +#define _mm512_mask_fmsubadd_round_ps(A, U, B, C, R) \ + (__m512)__builtin_ia32_vfmaddsubps512_mask(A, B, -(C), U, R) + +#define _mm512_mask3_fmsubadd_round_ps(A, B, C, U, R) \ + (__m512)__builtin_ia32_vfmsubaddps512_mask3(A, B, C, U, R) + +#define _mm512_maskz_fmsubadd_round_ps(U, A, B, C, R) \ + (__m512)__builtin_ia32_vfmaddsubps512_maskz(A, B, -(C), U, R) + +#define _mm512_fnmadd_round_pd(A, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddpd512_mask(-(A), B, C, -1, R) + +#define _mm512_mask_fnmadd_round_pd(A, U, B, C, R) \ + (__m512d)__builtin_ia32_vfnmaddpd512_mask(A, B, C, U, R) + +#define _mm512_mask3_fnmadd_round_pd(A, B, C, U, R) \ + (__m512d)__builtin_ia32_vfmaddpd512_mask3(-(A), B, C, U, R) + +#define _mm512_maskz_fnmadd_round_pd(U, A, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddpd512_maskz(-(A), B, C, U, R) + +#define _mm512_fnmadd_round_ps(A, B, C, R) \ + (__m512)__builtin_ia32_vfmaddps512_mask(-(A), B, C, -1, R) + +#define _mm512_mask_fnmadd_round_ps(A, U, B, C, R) \ + (__m512)__builtin_ia32_vfnmaddps512_mask(A, B, C, U, R) + +#define _mm512_mask3_fnmadd_round_ps(A, B, C, U, R) \ + (__m512)__builtin_ia32_vfmaddps512_mask3(-(A), B, C, U, R) + +#define _mm512_maskz_fnmadd_round_ps(U, A, B, C, R) \ + (__m512)__builtin_ia32_vfmaddps512_maskz(-(A), B, C, U, R) + +#define _mm512_fnmsub_round_pd(A, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddpd512_mask(-(A), B, -(C), -1, R) + +#define _mm512_mask_fnmsub_round_pd(A, U, B, C, R) \ + (__m512d)__builtin_ia32_vfnmsubpd512_mask(A, B, C, U, R) +
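+/* Note: the unmasked and maskz_ macros above fold the sign flips into the operands of the plain vfmadd builtins (hence the -(A) and -(C) arguments), but the mask_ forms must pass A through unchanged because A also carries the merge-masking destination, so they go through the dedicated vfnmadd/vfnmsub builtins instead. */ +#define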
_mm512_mask3_fnmsub_round_pd(A, B, C, U, R) \ + (__m512d)__builtin_ia32_vfnmsubpd512_mask3(A, B, C, U, R) + +#define _mm512_maskz_fnmsub_round_pd(U, A, B, C, R) \ + (__m512d)__builtin_ia32_vfmaddpd512_maskz(-(A), B, -(C), U, R) + +#define _mm512_fnmsub_round_ps(A, B, C, R) \ + (__m512)__builtin_ia32_vfmaddps512_mask(-(A), B, -(C), -1, R) + +#define _mm512_mask_fnmsub_round_ps(A, U, B, C, R) \ + (__m512)__builtin_ia32_vfnmsubps512_mask(A, B, C, U, R) + +#define _mm512_mask3_fnmsub_round_ps(A, B, C, U, R) \ + (__m512)__builtin_ia32_vfnmsubps512_mask3(A, B, C, U, R) + +#define _mm512_maskz_fnmsub_round_ps(U, A, B, C, R) \ + (__m512)__builtin_ia32_vfmaddps512_maskz(-(A), B, -(C), U, R) +#endif + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_abs_epi64 (__m512i __A) +{ + return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_abs_epi64 (__m512i __W, __mmask8 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_abs_epi64 (__mmask8 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_pabsq512_mask ((__v8di) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_abs_epi32 (__m512i __A) +{ + return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_abs_epi32 (__m512i __W, __mmask16 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_abs_epi32 (__mmask16 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_pabsd512_mask ((__v16si) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_broadcastss_ps (__m128 __A) +{ + return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_broadcastss_ps (__m512 __O, __mmask16 __M, __m128 __A) +{ + return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A, + (__v16sf) __O, __M); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_broadcastss_ps (__mmask16 __M, __m128 __A) +{ + return (__m512) __builtin_ia32_broadcastss512 ((__v4sf) __A, + (__v16sf) + _mm512_setzero_ps (), + __M); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_broadcastsd_pd (__m128d __A) +{ + return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_broadcastsd_pd (__m512d __O, __mmask8 __M, __m128d __A) +{ + return (__m512d) __builtin_ia32_broadcastsd512 
((__v2df) __A, + (__v8df) __O, __M); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_broadcastsd_pd (__mmask8 __M, __m128d __A) +{ + return (__m512d) __builtin_ia32_broadcastsd512 ((__v2df) __A, + (__v8df) + _mm512_setzero_pd (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_broadcastd_epi32 (__m128i __A) +{ + return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_broadcastd_epi32 (__m512i __O, __mmask16 __M, __m128i __A) +{ + return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A, + (__v16si) __O, __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_broadcastd_epi32 (__mmask16 __M, __m128i __A) +{ + return (__m512i) __builtin_ia32_pbroadcastd512 ((__v4si) __A, + (__v16si) + _mm512_setzero_si512 (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set1_epi32 (int __A) +{ + return (__m512i) __builtin_ia32_pbroadcastd512_gpr_mask (__A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16)(-1)); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_set1_epi32 (__m512i __O, __mmask16 __M, int __A) +{ + return (__m512i) __builtin_ia32_pbroadcastd512_gpr_mask (__A, (__v16si) __O, + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_set1_epi32 (__mmask16 __M, int __A) +{ + return (__m512i) + __builtin_ia32_pbroadcastd512_gpr_mask (__A, + (__v16si) _mm512_setzero_si512 (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_broadcastq_epi64 (__m128i __A) +{ + return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_broadcastq_epi64 (__m512i __O, __mmask8 __M, __m128i __A) +{ + return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A, + (__v8di) __O, __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_broadcastq_epi64 (__mmask8 __M, __m128i __A) +{ + return (__m512i) __builtin_ia32_pbroadcastq512 ((__v2di) __A, + (__v8di) + _mm512_setzero_si512 (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set1_epi64 (long long __A) +{ +#ifdef __x86_64__ + return (__m512i) __builtin_ia32_pbroadcastq512_gpr_mask (__A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8)(-1)); +#else + return (__m512i) __builtin_ia32_pbroadcastq512_mem_mask (__A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8)(-1)); +#endif +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_set1_epi64 (__m512i __O, __mmask8 __M, long long __A) +{ +#ifdef __x86_64__ + return (__m512i) __builtin_ia32_pbroadcastq512_gpr_mask (__A, (__v8di) __O, + __M); +#else + return (__m512i) __builtin_ia32_pbroadcastq512_mem_mask (__A, (__v8di) __O, + __M); +#endif +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__,
__artificial__)) +_mm512_maskz_set1_epi64 (__mmask8 __M, long long __A) +{ +#ifdef __x86_64__ + return (__m512i) + __builtin_ia32_pbroadcastq512_gpr_mask (__A, + (__v8di) _mm512_setzero_si512 (), + __M); +#else + return (__m512i) + __builtin_ia32_pbroadcastq512_mem_mask (__A, + (__v8di) _mm512_setzero_si512 (), + __M); +#endif +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_broadcast_f32x4 (__m128 __A) +{ + return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_broadcast_f32x4 (__m512 __O, __mmask16 __M, __m128 __A) +{ + return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A, + (__v16sf) __O, + __M); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_broadcast_f32x4 (__mmask16 __M, __m128 __A) +{ + return (__m512) __builtin_ia32_broadcastf32x4_512 ((__v4sf) __A, + (__v16sf) + _mm512_setzero_ps (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_broadcast_i32x4 (__m128i __A) +{ + return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_broadcast_i32x4 (__m512i __O, __mmask16 __M, __m128i __A) +{ + return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A, + (__v16si) __O, + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_broadcast_i32x4 (__mmask16 __M, __m128i __A) +{ + return (__m512i) __builtin_ia32_broadcasti32x4_512 ((__v4si) __A, + (__v16si) + _mm512_setzero_si512 (), + __M); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_broadcast_f64x4 (__m256d __A) +{ + return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_broadcast_f64x4 (__m512d __O, __mmask8 __M, __m256d __A) +{ + return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A, + (__v8df) __O, + __M); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_broadcast_f64x4 (__mmask8 __M, __m256d __A) +{ + return (__m512d) __builtin_ia32_broadcastf64x4_512 ((__v4df) __A, + (__v8df) + _mm512_setzero_pd (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_broadcast_i64x4 (__m256i __A) +{ + return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_broadcast_i64x4 (__m512i __O, __mmask8 __M, __m256i __A) +{ + return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A, + (__v8di) __O, + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_broadcast_i64x4 (__mmask8 __M, __m256i __A) +{ + return (__m512i) __builtin_ia32_broadcasti64x4_512 ((__v4di) __A, + (__v8di) + _mm512_setzero_si512 (), + __M); +} +
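+/* Note: in _MM_PERM_ENUM each letter selects one of four 32-bit elements (A=0, B=1, C=2, D=3), with the first letter naming the highest 2-bit field, so the value is A*64 + B*16 + C*4 + D; for example _MM_PERM_DCBA (0xE4) is the identity permutation for _mm512_shuffle_epi32. */ +typedef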
enum +{ + _MM_PERM_AAAA = 0x00, _MM_PERM_AAAB = 0x01, _MM_PERM_AAAC = 0x02, + _MM_PERM_AAAD = 0x03, _MM_PERM_AABA = 0x04, _MM_PERM_AABB = 0x05, + _MM_PERM_AABC = 0x06, _MM_PERM_AABD = 0x07, _MM_PERM_AACA = 0x08, + _MM_PERM_AACB = 0x09, _MM_PERM_AACC = 0x0A, _MM_PERM_AACD = 0x0B, + _MM_PERM_AADA = 0x0C, _MM_PERM_AADB = 0x0D, _MM_PERM_AADC = 0x0E, + _MM_PERM_AADD = 0x0F, _MM_PERM_ABAA = 0x10, _MM_PERM_ABAB = 0x11, + _MM_PERM_ABAC = 0x12, _MM_PERM_ABAD = 0x13, _MM_PERM_ABBA = 0x14, + _MM_PERM_ABBB = 0x15, _MM_PERM_ABBC = 0x16, _MM_PERM_ABBD = 0x17, + _MM_PERM_ABCA = 0x18, _MM_PERM_ABCB = 0x19, _MM_PERM_ABCC = 0x1A, + _MM_PERM_ABCD = 0x1B, _MM_PERM_ABDA = 0x1C, _MM_PERM_ABDB = 0x1D, + _MM_PERM_ABDC = 0x1E, _MM_PERM_ABDD = 0x1F, _MM_PERM_ACAA = 0x20, + _MM_PERM_ACAB = 0x21, _MM_PERM_ACAC = 0x22, _MM_PERM_ACAD = 0x23, + _MM_PERM_ACBA = 0x24, _MM_PERM_ACBB = 0x25, _MM_PERM_ACBC = 0x26, + _MM_PERM_ACBD = 0x27, _MM_PERM_ACCA = 0x28, _MM_PERM_ACCB = 0x29, + _MM_PERM_ACCC = 0x2A, _MM_PERM_ACCD = 0x2B, _MM_PERM_ACDA = 0x2C, + _MM_PERM_ACDB = 0x2D, _MM_PERM_ACDC = 0x2E, _MM_PERM_ACDD = 0x2F, + _MM_PERM_ADAA = 0x30, _MM_PERM_ADAB = 0x31, _MM_PERM_ADAC = 0x32, + _MM_PERM_ADAD = 0x33, _MM_PERM_ADBA = 0x34, _MM_PERM_ADBB = 0x35, + _MM_PERM_ADBC = 0x36, _MM_PERM_ADBD = 0x37, _MM_PERM_ADCA = 0x38, + _MM_PERM_ADCB = 0x39, _MM_PERM_ADCC = 0x3A, _MM_PERM_ADCD = 0x3B, + _MM_PERM_ADDA = 0x3C, _MM_PERM_ADDB = 0x3D, _MM_PERM_ADDC = 0x3E, + _MM_PERM_ADDD = 0x3F, _MM_PERM_BAAA = 0x40, _MM_PERM_BAAB = 0x41, + _MM_PERM_BAAC = 0x42, _MM_PERM_BAAD = 0x43, _MM_PERM_BABA = 0x44, + _MM_PERM_BABB = 0x45, _MM_PERM_BABC = 0x46, _MM_PERM_BABD = 0x47, + _MM_PERM_BACA = 0x48, _MM_PERM_BACB = 0x49, _MM_PERM_BACC = 0x4A, + _MM_PERM_BACD = 0x4B, _MM_PERM_BADA = 0x4C, _MM_PERM_BADB = 0x4D, + _MM_PERM_BADC = 0x4E, _MM_PERM_BADD = 0x4F, _MM_PERM_BBAA = 0x50, + _MM_PERM_BBAB = 0x51, _MM_PERM_BBAC = 0x52, _MM_PERM_BBAD = 0x53, + _MM_PERM_BBBA = 0x54, _MM_PERM_BBBB = 0x55, _MM_PERM_BBBC = 0x56, + _MM_PERM_BBBD = 0x57, _MM_PERM_BBCA = 0x58, _MM_PERM_BBCB = 0x59, + _MM_PERM_BBCC = 0x5A, _MM_PERM_BBCD = 0x5B, _MM_PERM_BBDA = 0x5C, + _MM_PERM_BBDB = 0x5D, _MM_PERM_BBDC = 0x5E, _MM_PERM_BBDD = 0x5F, + _MM_PERM_BCAA = 0x60, _MM_PERM_BCAB = 0x61, _MM_PERM_BCAC = 0x62, + _MM_PERM_BCAD = 0x63, _MM_PERM_BCBA = 0x64, _MM_PERM_BCBB = 0x65, + _MM_PERM_BCBC = 0x66, _MM_PERM_BCBD = 0x67, _MM_PERM_BCCA = 0x68, + _MM_PERM_BCCB = 0x69, _MM_PERM_BCCC = 0x6A, _MM_PERM_BCCD = 0x6B, + _MM_PERM_BCDA = 0x6C, _MM_PERM_BCDB = 0x6D, _MM_PERM_BCDC = 0x6E, + _MM_PERM_BCDD = 0x6F, _MM_PERM_BDAA = 0x70, _MM_PERM_BDAB = 0x71, + _MM_PERM_BDAC = 0x72, _MM_PERM_BDAD = 0x73, _MM_PERM_BDBA = 0x74, + _MM_PERM_BDBB = 0x75, _MM_PERM_BDBC = 0x76, _MM_PERM_BDBD = 0x77, + _MM_PERM_BDCA = 0x78, _MM_PERM_BDCB = 0x79, _MM_PERM_BDCC = 0x7A, + _MM_PERM_BDCD = 0x7B, _MM_PERM_BDDA = 0x7C, _MM_PERM_BDDB = 0x7D, + _MM_PERM_BDDC = 0x7E, _MM_PERM_BDDD = 0x7F, _MM_PERM_CAAA = 0x80, + _MM_PERM_CAAB = 0x81, _MM_PERM_CAAC = 0x82, _MM_PERM_CAAD = 0x83, + _MM_PERM_CABA = 0x84, _MM_PERM_CABB = 0x85, _MM_PERM_CABC = 0x86, + _MM_PERM_CABD = 0x87, _MM_PERM_CACA = 0x88, _MM_PERM_CACB = 0x89, + _MM_PERM_CACC = 0x8A, _MM_PERM_CACD = 0x8B, _MM_PERM_CADA = 0x8C, + _MM_PERM_CADB = 0x8D, _MM_PERM_CADC = 0x8E, _MM_PERM_CADD = 0x8F, + _MM_PERM_CBAA = 0x90, _MM_PERM_CBAB = 0x91, _MM_PERM_CBAC = 0x92, + _MM_PERM_CBAD = 0x93, _MM_PERM_CBBA = 0x94, _MM_PERM_CBBB = 0x95, + _MM_PERM_CBBC = 0x96, _MM_PERM_CBBD = 0x97, _MM_PERM_CBCA = 0x98, + _MM_PERM_CBCB = 0x99, _MM_PERM_CBCC = 0x9A, _MM_PERM_CBCD = 0x9B, + 
_MM_PERM_CBDA = 0x9C, _MM_PERM_CBDB = 0x9D, _MM_PERM_CBDC = 0x9E, + _MM_PERM_CBDD = 0x9F, _MM_PERM_CCAA = 0xA0, _MM_PERM_CCAB = 0xA1, + _MM_PERM_CCAC = 0xA2, _MM_PERM_CCAD = 0xA3, _MM_PERM_CCBA = 0xA4, + _MM_PERM_CCBB = 0xA5, _MM_PERM_CCBC = 0xA6, _MM_PERM_CCBD = 0xA7, + _MM_PERM_CCCA = 0xA8, _MM_PERM_CCCB = 0xA9, _MM_PERM_CCCC = 0xAA, + _MM_PERM_CCCD = 0xAB, _MM_PERM_CCDA = 0xAC, _MM_PERM_CCDB = 0xAD, + _MM_PERM_CCDC = 0xAE, _MM_PERM_CCDD = 0xAF, _MM_PERM_CDAA = 0xB0, + _MM_PERM_CDAB = 0xB1, _MM_PERM_CDAC = 0xB2, _MM_PERM_CDAD = 0xB3, + _MM_PERM_CDBA = 0xB4, _MM_PERM_CDBB = 0xB5, _MM_PERM_CDBC = 0xB6, + _MM_PERM_CDBD = 0xB7, _MM_PERM_CDCA = 0xB8, _MM_PERM_CDCB = 0xB9, + _MM_PERM_CDCC = 0xBA, _MM_PERM_CDCD = 0xBB, _MM_PERM_CDDA = 0xBC, + _MM_PERM_CDDB = 0xBD, _MM_PERM_CDDC = 0xBE, _MM_PERM_CDDD = 0xBF, + _MM_PERM_DAAA = 0xC0, _MM_PERM_DAAB = 0xC1, _MM_PERM_DAAC = 0xC2, + _MM_PERM_DAAD = 0xC3, _MM_PERM_DABA = 0xC4, _MM_PERM_DABB = 0xC5, + _MM_PERM_DABC = 0xC6, _MM_PERM_DABD = 0xC7, _MM_PERM_DACA = 0xC8, + _MM_PERM_DACB = 0xC9, _MM_PERM_DACC = 0xCA, _MM_PERM_DACD = 0xCB, + _MM_PERM_DADA = 0xCC, _MM_PERM_DADB = 0xCD, _MM_PERM_DADC = 0xCE, + _MM_PERM_DADD = 0xCF, _MM_PERM_DBAA = 0xD0, _MM_PERM_DBAB = 0xD1, + _MM_PERM_DBAC = 0xD2, _MM_PERM_DBAD = 0xD3, _MM_PERM_DBBA = 0xD4, + _MM_PERM_DBBB = 0xD5, _MM_PERM_DBBC = 0xD6, _MM_PERM_DBBD = 0xD7, + _MM_PERM_DBCA = 0xD8, _MM_PERM_DBCB = 0xD9, _MM_PERM_DBCC = 0xDA, + _MM_PERM_DBCD = 0xDB, _MM_PERM_DBDA = 0xDC, _MM_PERM_DBDB = 0xDD, + _MM_PERM_DBDC = 0xDE, _MM_PERM_DBDD = 0xDF, _MM_PERM_DCAA = 0xE0, + _MM_PERM_DCAB = 0xE1, _MM_PERM_DCAC = 0xE2, _MM_PERM_DCAD = 0xE3, + _MM_PERM_DCBA = 0xE4, _MM_PERM_DCBB = 0xE5, _MM_PERM_DCBC = 0xE6, + _MM_PERM_DCBD = 0xE7, _MM_PERM_DCCA = 0xE8, _MM_PERM_DCCB = 0xE9, + _MM_PERM_DCCC = 0xEA, _MM_PERM_DCCD = 0xEB, _MM_PERM_DCDA = 0xEC, + _MM_PERM_DCDB = 0xED, _MM_PERM_DCDC = 0xEE, _MM_PERM_DCDD = 0xEF, + _MM_PERM_DDAA = 0xF0, _MM_PERM_DDAB = 0xF1, _MM_PERM_DDAC = 0xF2, + _MM_PERM_DDAD = 0xF3, _MM_PERM_DDBA = 0xF4, _MM_PERM_DDBB = 0xF5, + _MM_PERM_DDBC = 0xF6, _MM_PERM_DDBD = 0xF7, _MM_PERM_DDCA = 0xF8, + _MM_PERM_DDCB = 0xF9, _MM_PERM_DDCC = 0xFA, _MM_PERM_DDCD = 0xFB, + _MM_PERM_DDDA = 0xFC, _MM_PERM_DDDB = 0xFD, _MM_PERM_DDDC = 0xFE, + _MM_PERM_DDDD = 0xFF +} _MM_PERM_ENUM; + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_shuffle_epi32 (__m512i __A, _MM_PERM_ENUM __mask) +{ + return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A, + __mask, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_shuffle_epi32 (__m512i __W, __mmask16 __U, __m512i __A, + _MM_PERM_ENUM __mask) +{ + return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A, + __mask, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_shuffle_epi32 (__mmask16 __U, __m512i __A, _MM_PERM_ENUM __mask) +{ + return (__m512i) __builtin_ia32_pshufd512_mask ((__v16si) __A, + __mask, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_shuffle_i64x2 (__m512i __A, __m512i __B, const int __imm) +{ + return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A, + (__v8di) __B, __imm, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline 
__m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_shuffle_i64x2 (__m512i __W, __mmask8 __U, __m512i __A, + __m512i __B, const int __imm) +{ + return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A, + (__v8di) __B, __imm, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_shuffle_i64x2 (__mmask8 __U, __m512i __A, __m512i __B, + const int __imm) +{ + return (__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di) __A, + (__v8di) __B, __imm, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_shuffle_i32x4 (__m512i __A, __m512i __B, const int __imm) +{ + return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A, + (__v16si) __B, + __imm, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_shuffle_i32x4 (__m512i __W, __mmask16 __U, __m512i __A, + __m512i __B, const int __imm) +{ + return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A, + (__v16si) __B, + __imm, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_shuffle_i32x4 (__mmask16 __U, __m512i __A, __m512i __B, + const int __imm) +{ + return (__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si) __A, + (__v16si) __B, + __imm, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_shuffle_f64x2 (__m512d __A, __m512d __B, const int __imm) +{ + return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A, + (__v8df) __B, __imm, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_shuffle_f64x2 (__m512d __W, __mmask8 __U, __m512d __A, + __m512d __B, const int __imm) +{ + return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A, + (__v8df) __B, __imm, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_shuffle_f64x2 (__mmask8 __U, __m512d __A, __m512d __B, + const int __imm) +{ + return (__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df) __A, + (__v8df) __B, __imm, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_shuffle_f32x4 (__m512 __A, __m512 __B, const int __imm) +{ + return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A, + (__v16sf) __B, __imm, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_shuffle_f32x4 (__m512 __W, __mmask16 __U, __m512 __A, + __m512 __B, const int __imm) +{ + return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A, + (__v16sf) __B, __imm, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_shuffle_f32x4 (__mmask16 __U, __m512 __A, __m512 __B, + const int __imm) +{ + return (__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf) __A, + (__v16sf) __B, __imm, + (__v16sf) + _mm512_setzero_ps (), + 
(__mmask16) __U); +} + +#else +#define _mm512_shuffle_epi32(X, C) \ + ((__m512i) __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\ + (__v16si)(__m512i)_mm512_undefined_si512 (),\ + (__mmask16)-1)) + +#define _mm512_mask_shuffle_epi32(W, U, X, C) \ + ((__m512i) __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\ + (__v16si)(__m512i)(W),\ + (__mmask16)(U))) + +#define _mm512_maskz_shuffle_epi32(U, X, C) \ + ((__m512i) __builtin_ia32_pshufd512_mask ((__v16si)(__m512i)(X), (int)(C),\ + (__v16si)(__m512i)_mm512_setzero_si512 (),\ + (__mmask16)(U))) + +#define _mm512_shuffle_i64x2(X, Y, C) \ + ((__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X), \ + (__v8di)(__m512i)(Y), (int)(C),\ + (__v8di)(__m512i)_mm512_undefined_si512 (),\ + (__mmask8)-1)) + +#define _mm512_mask_shuffle_i64x2(W, U, X, Y, C) \ + ((__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X), \ + (__v8di)(__m512i)(Y), (int)(C),\ + (__v8di)(__m512i)(W),\ + (__mmask8)(U))) + +#define _mm512_maskz_shuffle_i64x2(U, X, Y, C) \ + ((__m512i) __builtin_ia32_shuf_i64x2_mask ((__v8di)(__m512i)(X), \ + (__v8di)(__m512i)(Y), (int)(C),\ + (__v8di)(__m512i)_mm512_setzero_si512 (),\ + (__mmask8)(U))) + +#define _mm512_shuffle_i32x4(X, Y, C) \ + ((__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X), \ + (__v16si)(__m512i)(Y), (int)(C),\ + (__v16si)(__m512i)_mm512_undefined_si512 (),\ + (__mmask16)-1)) + +#define _mm512_mask_shuffle_i32x4(W, U, X, Y, C) \ + ((__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X), \ + (__v16si)(__m512i)(Y), (int)(C),\ + (__v16si)(__m512i)(W),\ + (__mmask16)(U))) + +#define _mm512_maskz_shuffle_i32x4(U, X, Y, C) \ + ((__m512i) __builtin_ia32_shuf_i32x4_mask ((__v16si)(__m512i)(X), \ + (__v16si)(__m512i)(Y), (int)(C),\ + (__v16si)(__m512i)_mm512_setzero_si512 (),\ + (__mmask16)(U))) + +#define _mm512_shuffle_f64x2(X, Y, C) \ + ((__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (int)(C),\ + (__v8df)(__m512d)_mm512_undefined_pd(),\ + (__mmask8)-1)) + +#define _mm512_mask_shuffle_f64x2(W, U, X, Y, C) \ + ((__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (int)(C),\ + (__v8df)(__m512d)(W),\ + (__mmask8)(U))) + +#define _mm512_maskz_shuffle_f64x2(U, X, Y, C) \ + ((__m512d) __builtin_ia32_shuf_f64x2_mask ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (int)(C),\ + (__v8df)(__m512d)_mm512_setzero_pd(),\ + (__mmask8)(U))) + +#define _mm512_shuffle_f32x4(X, Y, C) \ + ((__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (int)(C),\ + (__v16sf)(__m512)_mm512_undefined_ps(),\ + (__mmask16)-1)) + +#define _mm512_mask_shuffle_f32x4(W, U, X, Y, C) \ + ((__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (int)(C),\ + (__v16sf)(__m512)(W),\ + (__mmask16)(U))) + +#define _mm512_maskz_shuffle_f32x4(U, X, Y, C) \ + ((__m512) __builtin_ia32_shuf_f32x4_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (int)(C),\ + (__v16sf)(__m512)_mm512_setzero_ps(),\ + (__mmask16)(U))) +#endif + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rolv_epi32 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rolv_epi32 (__m512i __W, __mmask16 __U, __m512i 
__A, __m512i __B) +{ + return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rolv_epi32 (__mmask16 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_prolvd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rorv_epi32 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rorv_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rorv_epi32 (__mmask16 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_prorvd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rolv_epi64 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rolv_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rolv_epi64 (__mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_prolvq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rorv_epi64 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rorv_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rorv_epi64 (__mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_prorvq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtt_roundpd_epi32 (__m512d __A, const int __R) +{ + return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
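+/* Note: the cvtt_ round conversions in this block truncate toward zero regardless of the rounding mode, so their __R argument is mainly useful for passing _MM_FROUND_NO_EXC; the cvt_ forms further below honor the requested mode, e.g. an illustrative _mm512_cvt_roundpd_epi32 (__v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC). */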
+_mm512_mask_cvtt_roundpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A, + (__v8si) __W, + (__mmask8) __U, __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtt_roundpd_epi32 (__mmask8 __U, __m512d __A, const int __R) +{ + return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtt_roundpd_epu32 (__m512d __A, const int __R) +{ + return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtt_roundpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A, + (__v8si) __W, + (__mmask8) __U, __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtt_roundpd_epu32 (__mmask8 __U, __m512d __A, const int __R) +{ + return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, __R); +} +#else +#define _mm512_cvtt_roundpd_epi32(A, B) \ + ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B)) + +#define _mm512_mask_cvtt_roundpd_epi32(W, U, A, B) \ + ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)(W), U, B)) + +#define _mm512_maskz_cvtt_roundpd_epi32(U, A, B) \ + ((__m256i)__builtin_ia32_cvttpd2dq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B)) + +#define _mm512_cvtt_roundpd_epu32(A, B) \ + ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B)) + +#define _mm512_mask_cvtt_roundpd_epu32(W, U, A, B) \ + ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)(W), U, B)) + +#define _mm512_maskz_cvtt_roundpd_epu32(U, A, B) \ + ((__m256i)__builtin_ia32_cvttpd2udq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B)) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundpd_epi32 (__m512d __A, const int __R) +{ + return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A, + (__v8si) __W, + (__mmask8) __U, __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundpd_epi32 (__mmask8 __U, __m512d __A, const int __R) +{ + return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundpd_epu32 (__m512d __A, const int __R) +{ + return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundpd_epu32 (__m256i 
__W, __mmask8 __U, __m512d __A, + const int __R) +{ + return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A, + (__v8si) __W, + (__mmask8) __U, __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundpd_epu32 (__mmask8 __U, __m512d __A, const int __R) +{ + return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, __R); +} +#else +#define _mm512_cvt_roundpd_epi32(A, B) \ + ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B)) + +#define _mm512_mask_cvt_roundpd_epi32(W, U, A, B) \ + ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)(W), U, B)) + +#define _mm512_maskz_cvt_roundpd_epi32(U, A, B) \ + ((__m256i)__builtin_ia32_cvtpd2dq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B)) + +#define _mm512_cvt_roundpd_epu32(A, B) \ + ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)_mm256_undefined_si256(), -1, B)) + +#define _mm512_mask_cvt_roundpd_epu32(W, U, A, B) \ + ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)(W), U, B)) + +#define _mm512_maskz_cvt_roundpd_epu32(U, A, B) \ + ((__m256i)__builtin_ia32_cvtpd2udq512_mask(A, (__v8si)_mm256_setzero_si256(), U, B)) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtt_roundps_epi32 (__m512 __A, const int __R) +{ + return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtt_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtt_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R) +{ + return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtt_roundps_epu32 (__m512 __A, const int __R) +{ + return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtt_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R) +{ + return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, __R); +} +#else +#define _mm512_cvtt_roundps_epi32(A, B) \ + ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)_mm512_undefined_si512 (), -1, B)) + +#define _mm512_mask_cvtt_roundps_epi32(W, U, A, B) \ + ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)(W), U, B)) + +#define _mm512_maskz_cvtt_roundps_epi32(U, A, B) \ + ((__m512i)__builtin_ia32_cvttps2dq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B)) + +#define _mm512_cvtt_roundps_epu32(A, B) \ 
+ ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)_mm512_undefined_si512 (), -1, B)) + +#define _mm512_mask_cvtt_roundps_epu32(W, U, A, B) \ + ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)(W), U, B)) + +#define _mm512_maskz_cvtt_roundps_epu32(U, A, B) \ + ((__m512i)__builtin_ia32_cvttps2udq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B)) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundps_epi32 (__m512 __A, const int __R) +{ + return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundps_epi32 (__m512i __W, __mmask16 __U, __m512 __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundps_epi32 (__mmask16 __U, __m512 __A, const int __R) +{ + return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundps_epu32 (__m512 __A, const int __R) +{ + return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundps_epu32 (__m512i __W, __mmask16 __U, __m512 __A, + const int __R) +{ + return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundps_epu32 (__mmask16 __U, __m512 __A, const int __R) +{ + return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, __R); +} +#else +#define _mm512_cvt_roundps_epi32(A, B) \ + ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)_mm512_undefined_si512 (), -1, B)) + +#define _mm512_mask_cvt_roundps_epi32(W, U, A, B) \ + ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)(W), U, B)) + +#define _mm512_maskz_cvt_roundps_epi32(U, A, B) \ + ((__m512i)__builtin_ia32_cvtps2dq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B)) + +#define _mm512_cvt_roundps_epu32(A, B) \ + ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)_mm512_undefined_si512 (), -1, B)) + +#define _mm512_mask_cvt_roundps_epu32(W, U, A, B) \ + ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)(W), U, B)) + +#define _mm512_maskz_cvt_roundps_epu32(U, A, B) \ + ((__m512i)__builtin_ia32_cvtps2udq512_mask(A, (__v16si)_mm512_setzero_si512 (), U, B)) +#endif + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtu32_sd (__m128d __A, unsigned __B) +{ + return (__m128d) __builtin_ia32_cvtusi2sd32 ((__v2df) __A, __B); +} + +#ifdef __x86_64__ +#ifdef __OPTIMIZE__ +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundu64_sd (__m128d __A, unsigned long long __B, const int __R) +{ + return (__m128d) __builtin_ia32_cvtusi2sd64 ((__v2df) __A, __B, __R); +} + +extern __inline __m128d +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundi64_sd (__m128d __A, long long __B, const int __R) +{ + return (__m128d) __builtin_ia32_cvtsi2sd64 ((__v2df) __A, __B, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsi64_sd (__m128d __A, long long __B, const int __R) +{ + return (__m128d) __builtin_ia32_cvtsi2sd64 ((__v2df) __A, __B, __R); +} +#else +#define _mm_cvt_roundu64_sd(A, B, C) \ + (__m128d)__builtin_ia32_cvtusi2sd64(A, B, C) + +#define _mm_cvt_roundi64_sd(A, B, C) \ + (__m128d)__builtin_ia32_cvtsi2sd64(A, B, C) + +#define _mm_cvt_roundsi64_sd(A, B, C) \ + (__m128d)__builtin_ia32_cvtsi2sd64(A, B, C) +#endif + +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundu32_ss (__m128 __A, unsigned __B, const int __R) +{ + return (__m128) __builtin_ia32_cvtusi2ss32 ((__v4sf) __A, __B, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsi32_ss (__m128 __A, int __B, const int __R) +{ + return (__m128) __builtin_ia32_cvtsi2ss32 ((__v4sf) __A, __B, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundi32_ss (__m128 __A, int __B, const int __R) +{ + return (__m128) __builtin_ia32_cvtsi2ss32 ((__v4sf) __A, __B, __R); +} +#else +#define _mm_cvt_roundu32_ss(A, B, C) \ + (__m128)__builtin_ia32_cvtusi2ss32(A, B, C) + +#define _mm_cvt_roundi32_ss(A, B, C) \ + (__m128)__builtin_ia32_cvtsi2ss32(A, B, C) + +#define _mm_cvt_roundsi32_ss(A, B, C) \ + (__m128)__builtin_ia32_cvtsi2ss32(A, B, C) +#endif + +#ifdef __x86_64__ +#ifdef __OPTIMIZE__ +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundu64_ss (__m128 __A, unsigned long long __B, const int __R) +{ + return (__m128) __builtin_ia32_cvtusi2ss64 ((__v4sf) __A, __B, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsi64_ss (__m128 __A, long long __B, const int __R) +{ + return (__m128) __builtin_ia32_cvtsi2ss64 ((__v4sf) __A, __B, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundi64_ss (__m128 __A, long long __B, const int __R) +{ + return (__m128) __builtin_ia32_cvtsi2ss64 ((__v4sf) __A, __B, __R); +} +#else +#define _mm_cvt_roundu64_ss(A, B, C) \ + (__m128)__builtin_ia32_cvtusi2ss64(A, B, C) + +#define _mm_cvt_roundi64_ss(A, B, C) \ + (__m128)__builtin_ia32_cvtsi2ss64(A, B, C) + +#define _mm_cvt_roundsi64_ss(A, B, C) \ + (__m128)__builtin_ia32_cvtsi2ss64(A, B, C) +#endif + +#endif + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi32_epi8 (__m512i __A) +{ + return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A, + (__v16qi) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A) +{ + __builtin_ia32_pmovdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A, + (__v16qi) __O, __M); +} + +extern __inline __m128i 
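+/* Note: the 512-bit narrowing conversions in this block come in three flavors per element width: cvtepi*_ truncates, cvtsepi*_ saturates as signed, and cvtusepi*_ saturates as unsigned; each also has a mask_..._storeu_ variant that writes the narrowed vector directly to memory. */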
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi32_epi8 (__mmask16 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovdb512_mask ((__v16si) __A, + (__v16qi) + _mm_setzero_si128 (), + __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtsepi32_epi8 (__m512i __A) +{ + return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A, + (__v16qi) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtsepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A) +{ + __builtin_ia32_pmovsdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtsepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A, + (__v16qi) __O, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtsepi32_epi8 (__mmask16 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovsdb512_mask ((__v16si) __A, + (__v16qi) + _mm_setzero_si128 (), + __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtusepi32_epi8 (__m512i __A) +{ + return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A, + (__v16qi) + _mm_undefined_si128 (), + (__mmask16) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtusepi32_storeu_epi8 (void * __P, __mmask16 __M, __m512i __A) +{ + __builtin_ia32_pmovusdb512mem_mask ((__v16qi *) __P, (__v16si) __A, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtusepi32_epi8 (__m128i __O, __mmask16 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A, + (__v16qi) __O, + __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtusepi32_epi8 (__mmask16 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovusdb512_mask ((__v16si) __A, + (__v16qi) + _mm_setzero_si128 (), + __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi32_epi16 (__m512i __A) +{ + return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A, + (__v16hi) + _mm256_undefined_si256 (), + (__mmask16) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi32_storeu_epi16 (void * __P, __mmask16 __M, __m512i __A) +{ + __builtin_ia32_pmovdw512mem_mask ((__v16hi *) __P, (__v16si) __A, __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A) +{ + return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A, + (__v16hi) __O, __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi32_epi16 (__mmask16 __M, __m512i __A) +{ + return (__m256i) __builtin_ia32_pmovdw512_mask ((__v16si) __A, + (__v16hi) + _mm256_setzero_si256 (), + __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtsepi32_epi16 (__m512i __A) +{ + return (__m256i) 
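+/* Usage sketch (illustrative only, not part of the upstream header;
+   `v' and `dst' are placeholders): _mm512_cvtepi32_epi8 truncates
+   each dword to a byte, the `s' form saturates as signed, the `us'
+   form as unsigned, and the storeu variants write only the byte
+   lanes selected by the mask:
+
+     __m128i sat = _mm512_cvtsepi32_epi8 (v);
+     _mm512_mask_cvtusepi32_storeu_epi8 (dst, (__mmask16) 0x00FF, v);  */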
__builtin_ia32_pmovsdw512_mask ((__v16si) __A, + (__v16hi) + _mm256_undefined_si256 (), + (__mmask16) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtsepi32_storeu_epi16 (void *__P, __mmask16 __M, __m512i __A) +{ + __builtin_ia32_pmovsdw512mem_mask ((__v16hi*) __P, (__v16si) __A, __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtsepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A) +{ + return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A, + (__v16hi) __O, __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtsepi32_epi16 (__mmask16 __M, __m512i __A) +{ + return (__m256i) __builtin_ia32_pmovsdw512_mask ((__v16si) __A, + (__v16hi) + _mm256_setzero_si256 (), + __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtusepi32_epi16 (__m512i __A) +{ + return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A, + (__v16hi) + _mm256_undefined_si256 (), + (__mmask16) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtusepi32_storeu_epi16 (void *__P, __mmask16 __M, __m512i __A) +{ + __builtin_ia32_pmovusdw512mem_mask ((__v16hi*) __P, (__v16si) __A, __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtusepi32_epi16 (__m256i __O, __mmask16 __M, __m512i __A) +{ + return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A, + (__v16hi) __O, + __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtusepi32_epi16 (__mmask16 __M, __m512i __A) +{ + return (__m256i) __builtin_ia32_pmovusdw512_mask ((__v16si) __A, + (__v16hi) + _mm256_setzero_si256 (), + __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi64_epi32 (__m512i __A) +{ + return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi64_storeu_epi32 (void* __P, __mmask8 __M, __m512i __A) +{ + __builtin_ia32_pmovqd512mem_mask ((__v8si *) __P, (__v8di) __A, __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A) +{ + return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A, + (__v8si) __O, __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi64_epi32 (__mmask8 __M, __m512i __A) +{ + return (__m256i) __builtin_ia32_pmovqd512_mask ((__v8di) __A, + (__v8si) + _mm256_setzero_si256 (), + __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtsepi64_epi32 (__m512i __A) +{ + return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtsepi64_storeu_epi32 (void *__P, __mmask8 __M, __m512i __A) +{ + __builtin_ia32_pmovsqd512mem_mask ((__v8si *) __P, (__v8di) __A, __M); +} + +extern __inline __m256i +__attribute__
((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtsepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A) +{ + return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A, + (__v8si) __O, __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtsepi64_epi32 (__mmask8 __M, __m512i __A) +{ + return (__m256i) __builtin_ia32_pmovsqd512_mask ((__v8di) __A, + (__v8si) + _mm256_setzero_si256 (), + __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtusepi64_epi32 (__m512i __A) +{ + return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtusepi64_storeu_epi32 (void* __P, __mmask8 __M, __m512i __A) +{ + __builtin_ia32_pmovusqd512mem_mask ((__v8si*) __P, (__v8di) __A, __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtusepi64_epi32 (__m256i __O, __mmask8 __M, __m512i __A) +{ + return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A, + (__v8si) __O, __M); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtusepi64_epi32 (__mmask8 __M, __m512i __A) +{ + return (__m256i) __builtin_ia32_pmovusqd512_mask ((__v8di) __A, + (__v8si) + _mm256_setzero_si256 (), + __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi64_epi16 (__m512i __A) +{ + return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A, + (__v8hi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A) +{ + __builtin_ia32_pmovqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A, + (__v8hi) __O, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi64_epi16 (__mmask8 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovqw512_mask ((__v8di) __A, + (__v8hi) + _mm_setzero_si128 (), + __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtsepi64_epi16 (__m512i __A) +{ + return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A, + (__v8hi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtsepi64_storeu_epi16 (void * __P, __mmask8 __M, __m512i __A) +{ + __builtin_ia32_pmovsqw512mem_mask ((__v8hi *) __P, (__v8di) __A, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtsepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A, + (__v8hi) __O, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtsepi64_epi16 (__mmask8 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovsqw512_mask ((__v8di) __A, + 
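+/* Usage sketch (illustrative only, not part of the upstream header;
+   `v' is a placeholder): the qword down-conversions follow the same
+   pattern with an 8-bit mask, since a 512-bit source holds only
+   eight qwords:
+
+     __m256i lo = _mm512_maskz_cvtepi64_epi32 ((__mmask8) 0x0F, v);  */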
(__v8hi) + _mm_setzero_si128 (), + __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtusepi64_epi16 (__m512i __A) +{ + return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A, + (__v8hi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtusepi64_storeu_epi16 (void *__P, __mmask8 __M, __m512i __A) +{ + __builtin_ia32_pmovusqw512mem_mask ((__v8hi*) __P, (__v8di) __A, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtusepi64_epi16 (__m128i __O, __mmask8 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A, + (__v8hi) __O, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtusepi64_epi16 (__mmask8 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovusqw512_mask ((__v8di) __A, + (__v8hi) + _mm_setzero_si128 (), + __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi64_epi8 (__m512i __A) +{ + return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A, + (__v16qi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi64_storeu_epi8 (void * __P, __mmask8 __M, __m512i __A) +{ + __builtin_ia32_pmovqb512mem_mask ((__v16qi *) __P, (__v8di) __A, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A, + (__v16qi) __O, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi64_epi8 (__mmask8 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovqb512_mask ((__v8di) __A, + (__v16qi) + _mm_setzero_si128 (), + __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtsepi64_epi8 (__m512i __A) +{ + return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A, + (__v16qi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtsepi64_storeu_epi8 (void * __P, __mmask8 __M, __m512i __A) +{ + __builtin_ia32_pmovsqb512mem_mask ((__v16qi *) __P, (__v8di) __A, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtsepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A, + (__v16qi) __O, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtsepi64_epi8 (__mmask8 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovsqb512_mask ((__v8di) __A, + (__v16qi) + _mm_setzero_si128 (), + __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtusepi64_epi8 (__m512i __A) +{ + return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A, + (__v16qi) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtusepi64_storeu_epi8 (void * __P, 
__mmask8 __M, __m512i __A) +{ + __builtin_ia32_pmovusqb512mem_mask ((__v16qi *) __P, (__v8di) __A, __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtusepi64_epi8 (__m128i __O, __mmask8 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A, + (__v16qi) __O, + __M); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtusepi64_epi8 (__mmask8 __M, __m512i __A) +{ + return (__m128i) __builtin_ia32_pmovusqb512_mask ((__v8di) __A, + (__v16qi) + _mm_setzero_si128 (), + __M); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi32_pd (__m256i __A) +{ + return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi32_pd (__m512d __W, __mmask8 __U, __m256i __A) +{ + return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi32_pd (__mmask8 __U, __m256i __A) +{ + return (__m512d) __builtin_ia32_cvtdq2pd512_mask ((__v8si) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepu32_pd (__m256i __A) +{ + return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepu32_pd (__m512d __W, __mmask8 __U, __m256i __A) +{ + return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepu32_pd (__mmask8 __U, __m256i __A) +{ + return (__m512d) __builtin_ia32_cvtudq2pd512_mask ((__v8si) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundepi32_ps (__m512i __A, const int __R) +{ + return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundepi32_ps (__m512 __W, __mmask16 __U, __m512i __A, + const int __R) +{ + return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundepi32_ps (__mmask16 __U, __m512i __A, const int __R) +{ + return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundepu32_ps (__m512i __A, const int __R) +{ + return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
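+/* Usage sketch (illustrative only, not part of the upstream header;
+   `v256' is a placeholder): the widening conversions take a 256-bit
+   source; the maskz form zeroes the double lanes whose mask bit is
+   clear:
+
+     __m512d d  = _mm512_cvtepi32_pd (v256);
+     __m512d du = _mm512_maskz_cvtepu32_pd ((__mmask8) 0x3F, v256);  */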
+_mm512_mask_cvt_roundepu32_ps (__m512 __W, __mmask16 __U, __m512i __A, + const int __R) +{ + return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundepu32_ps (__mmask16 __U, __m512i __A, const int __R) +{ + return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +#else +#define _mm512_cvt_roundepi32_ps(A, B) \ + (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), (__v16sf)_mm512_undefined_ps(), -1, B) + +#define _mm512_mask_cvt_roundepi32_ps(W, U, A, B) \ + (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), W, U, B) + +#define _mm512_maskz_cvt_roundepi32_ps(U, A, B) \ + (__m512)__builtin_ia32_cvtdq2ps512_mask((__v16si)(A), (__v16sf)_mm512_setzero_ps(), U, B) + +#define _mm512_cvt_roundepu32_ps(A, B) \ + (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), (__v16sf)_mm512_undefined_ps(), -1, B) + +#define _mm512_mask_cvt_roundepu32_ps(W, U, A, B) \ + (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), W, U, B) + +#define _mm512_maskz_cvt_roundepu32_ps(U, A, B) \ + (__m512)__builtin_ia32_cvtudq2ps512_mask((__v16si)(A), (__v16sf)_mm512_setzero_ps(), U, B) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_extractf64x4_pd (__m512d __A, const int __imm) +{ + return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A, + __imm, + (__v4df) + _mm256_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_extractf64x4_pd (__m256d __W, __mmask8 __U, __m512d __A, + const int __imm) +{ + return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A, + __imm, + (__v4df) __W, + (__mmask8) __U); +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_extractf64x4_pd (__mmask8 __U, __m512d __A, const int __imm) +{ + return (__m256d) __builtin_ia32_extractf64x4_mask ((__v8df) __A, + __imm, + (__v4df) + _mm256_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_extractf32x4_ps (__m512 __A, const int __imm) +{ + return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A, + __imm, + (__v4sf) + _mm_undefined_ps (), + (__mmask8) -1); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_extractf32x4_ps (__m128 __W, __mmask8 __U, __m512 __A, + const int __imm) +{ + return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A, + __imm, + (__v4sf) __W, + (__mmask8) __U); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_extractf32x4_ps (__mmask8 __U, __m512 __A, const int __imm) +{ + return (__m128) __builtin_ia32_extractf32x4_mask ((__v16sf) __A, + __imm, + (__v4sf) + _mm_setzero_ps (), + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_extracti64x4_epi64 (__m512i __A, const int __imm) +{ + return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A, + __imm, + (__v4di) + _mm256_undefined_si256 (), + (__mmask8) -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
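+/* Usage sketch (illustrative only, not part of the upstream header;
+   `v' is a placeholder): the extract intrinsics need a compile-time
+   constant lane selector, e.g. the third 128-bit group of floats:
+
+     __m128 lane2 = _mm512_extractf32x4_ps (v, 2);  */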
+_mm512_mask_extracti64x4_epi64 (__m256i __W, __mmask8 __U, __m512i __A, + const int __imm) +{ + return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A, + __imm, + (__v4di) __W, + (__mmask8) __U); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_extracti64x4_epi64 (__mmask8 __U, __m512i __A, const int __imm) +{ + return (__m256i) __builtin_ia32_extracti64x4_mask ((__v8di) __A, + __imm, + (__v4di) + _mm256_setzero_si256 (), + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_extracti32x4_epi32 (__m512i __A, const int __imm) +{ + return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A, + __imm, + (__v4si) + _mm_undefined_si128 (), + (__mmask8) -1); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_extracti32x4_epi32 (__m128i __W, __mmask8 __U, __m512i __A, + const int __imm) +{ + return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A, + __imm, + (__v4si) __W, + (__mmask8) __U); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_extracti32x4_epi32 (__mmask8 __U, __m512i __A, const int __imm) +{ + return (__m128i) __builtin_ia32_extracti32x4_mask ((__v16si) __A, + __imm, + (__v4si) + _mm_setzero_si128 (), + (__mmask8) __U); +} +#else + +#define _mm512_extractf64x4_pd(X, C) \ + ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X), \ + (int) (C),\ + (__v4df)(__m256d)_mm256_undefined_pd(),\ + (__mmask8)-1)) + +#define _mm512_mask_extractf64x4_pd(W, U, X, C) \ + ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X), \ + (int) (C),\ + (__v4df)(__m256d)(W),\ + (__mmask8)(U))) + +#define _mm512_maskz_extractf64x4_pd(U, X, C) \ + ((__m256d) __builtin_ia32_extractf64x4_mask ((__v8df)(__m512d) (X), \ + (int) (C),\ + (__v4df)(__m256d)_mm256_setzero_pd(),\ + (__mmask8)(U))) + +#define _mm512_extractf32x4_ps(X, C) \ + ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X), \ + (int) (C),\ + (__v4sf)(__m128)_mm_undefined_ps(),\ + (__mmask8)-1)) + +#define _mm512_mask_extractf32x4_ps(W, U, X, C) \ + ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X), \ + (int) (C),\ + (__v4sf)(__m128)(W),\ + (__mmask8)(U))) + +#define _mm512_maskz_extractf32x4_ps(U, X, C) \ + ((__m128) __builtin_ia32_extractf32x4_mask ((__v16sf)(__m512) (X), \ + (int) (C),\ + (__v4sf)(__m128)_mm_setzero_ps(),\ + (__mmask8)(U))) + +#define _mm512_extracti64x4_epi64(X, C) \ + ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X), \ + (int) (C),\ + (__v4di)(__m256i)_mm256_undefined_si256 (),\ + (__mmask8)-1)) + +#define _mm512_mask_extracti64x4_epi64(W, U, X, C) \ + ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X), \ + (int) (C),\ + (__v4di)(__m256i)(W),\ + (__mmask8)(U))) + +#define _mm512_maskz_extracti64x4_epi64(U, X, C) \ + ((__m256i) __builtin_ia32_extracti64x4_mask ((__v8di)(__m512i) (X), \ + (int) (C),\ + (__v4di)(__m256i)_mm256_setzero_si256 (),\ + (__mmask8)(U))) + +#define _mm512_extracti32x4_epi32(X, C) \ + ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X), \ + (int) (C),\ + (__v4si)(__m128i)_mm_undefined_si128 (),\ + (__mmask8)-1)) + +#define _mm512_mask_extracti32x4_epi32(W, U, X, C) \ + ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X), \ + (int) (C),\ + (__v4si)(__m128i)(W),\ + (__mmask8)(U))) + +#define 
_mm512_maskz_extracti32x4_epi32(U, X, C) \ + ((__m128i) __builtin_ia32_extracti32x4_mask ((__v16si)(__m512i) (X), \ + (int) (C),\ + (__v4si)(__m128i)_mm_setzero_si128 (),\ + (__mmask8)(U))) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_inserti32x4 (__m512i __A, __m128i __B, const int __imm) +{ + return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __A, + (__v4si) __B, + __imm, + (__v16si) __A, -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_insertf32x4 (__m512 __A, __m128 __B, const int __imm) +{ + return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __A, + (__v4sf) __B, + __imm, + (__v16sf) __A, -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_inserti64x4 (__m512i __A, __m256i __B, const int __imm) +{ + return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A, + (__v4di) __B, + __imm, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_inserti64x4 (__m512i __W, __mmask8 __U, __m512i __A, + __m256i __B, const int __imm) +{ + return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A, + (__v4di) __B, + __imm, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_inserti64x4 (__mmask8 __U, __m512i __A, __m256i __B, + const int __imm) +{ + return (__m512i) __builtin_ia32_inserti64x4_mask ((__v8di) __A, + (__v4di) __B, + __imm, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_insertf64x4 (__m512d __A, __m256d __B, const int __imm) +{ + return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A, + (__v4df) __B, + __imm, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_insertf64x4 (__m512d __W, __mmask8 __U, __m512d __A, + __m256d __B, const int __imm) +{ + return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A, + (__v4df) __B, + __imm, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_insertf64x4 (__mmask8 __U, __m512d __A, __m256d __B, + const int __imm) +{ + return (__m512d) __builtin_ia32_insertf64x4_mask ((__v8df) __A, + (__v4df) __B, + __imm, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} +#else +#define _mm512_insertf32x4(X, Y, C) \ + ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X), \ + (__v4sf)(__m128) (Y), (int) (C), (__v16sf)(__m512) (X), (__mmask16)(-1))) + +#define _mm512_inserti32x4(X, Y, C) \ + ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X), \ + (__v4si)(__m128i) (Y), (int) (C), (__v16si)(__m512i) (X), (__mmask16)(-1))) + +#define _mm512_insertf64x4(X, Y, C) \ + ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X), \ + (__v4df)(__m256d) (Y), (int) (C), \ + (__v8df)(__m512d)_mm512_undefined_pd(), \ + (__mmask8)-1)) + +#define _mm512_mask_insertf64x4(W, U, X, Y, C) \ + ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X), \ + (__v4df)(__m256d) (Y), (int) (C), \ + (__v8df)(__m512d)(W), \ + (__mmask8)(U))) + +#define 
_mm512_maskz_insertf64x4(U, X, Y, C) \ + ((__m512d) __builtin_ia32_insertf64x4_mask ((__v8df)(__m512d) (X), \ + (__v4df)(__m256d) (Y), (int) (C), \ + (__v8df)(__m512d)_mm512_setzero_pd(), \ + (__mmask8)(U))) + +#define _mm512_inserti64x4(X, Y, C) \ + ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X), \ + (__v4di)(__m256i) (Y), (int) (C), \ + (__v8di)(__m512i)_mm512_undefined_si512 (), \ + (__mmask8)-1)) + +#define _mm512_mask_inserti64x4(W, U, X, Y, C) \ + ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X), \ + (__v4di)(__m256i) (Y), (int) (C),\ + (__v8di)(__m512i)(W),\ + (__mmask8)(U))) + +#define _mm512_maskz_inserti64x4(U, X, Y, C) \ + ((__m512i) __builtin_ia32_inserti64x4_mask ((__v8di)(__m512i) (X), \ + (__v4di)(__m256i) (Y), (int) (C), \ + (__v8di)(__m512i)_mm512_setzero_si512 (), \ + (__mmask8)(U))) +#endif + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_loadu_pd (void const *__P) +{ + return (__m512d) __builtin_ia32_loadupd512_mask ((const __v8df *) __P, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_loadu_pd (__m512d __W, __mmask8 __U, void const *__P) +{ + return (__m512d) __builtin_ia32_loadupd512_mask ((const __v8df *) __P, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_loadu_pd (__mmask8 __U, void const *__P) +{ + return (__m512d) __builtin_ia32_loadupd512_mask ((const __v8df *) __P, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_storeu_pd (void *__P, __m512d __A) +{ + __builtin_ia32_storeupd512_mask ((__v8df *) __P, (__v8df) __A, + (__mmask8) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_storeu_pd (void *__P, __mmask8 __U, __m512d __A) +{ + __builtin_ia32_storeupd512_mask ((__v8df *) __P, (__v8df) __A, + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_loadu_ps (void const *__P) +{ + return (__m512) __builtin_ia32_loadups512_mask ((const __v16sf *) __P, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_loadu_ps (__m512 __W, __mmask16 __U, void const *__P) +{ + return (__m512) __builtin_ia32_loadups512_mask ((const __v16sf *) __P, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_loadu_ps (__mmask16 __U, void const *__P) +{ + return (__m512) __builtin_ia32_loadups512_mask ((const __v16sf *) __P, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_storeu_ps (void *__P, __m512 __A) +{ + __builtin_ia32_storeups512_mask ((__v16sf *) __P, (__v16sf) __A, + (__mmask16) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_storeu_ps (void *__P, __mmask16 __U, __m512 __A) +{ + __builtin_ia32_storeups512_mask ((__v16sf *) __P, (__v16sf) __A, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
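+/* Usage sketch (illustrative only, not part of the upstream header;
+   `src', `dst' and `n' are placeholders): loadu/storeu have no
+   alignment requirement, and the masked forms are a convenient way
+   to handle a loop remainder of n < 8 doubles without faulting on
+   the inactive lanes:
+
+     __mmask8 m = (__mmask8) ((1u << n) - 1);
+     __m512d t  = _mm512_maskz_loadu_pd (m, src);
+     _mm512_mask_storeu_pd (dst, m, t);  */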
+_mm512_mask_loadu_epi64 (__m512i __W, __mmask8 __U, void const *__P) +{ + return (__m512i) __builtin_ia32_loaddqudi512_mask ((const __v8di *) __P, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_loadu_epi64 (__mmask8 __U, void const *__P) +{ + return (__m512i) __builtin_ia32_loaddqudi512_mask ((const __v8di *) __P, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_storeu_epi64 (void *__P, __mmask8 __U, __m512i __A) +{ + __builtin_ia32_storedqudi512_mask ((__v8di *) __P, (__v8di) __A, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_loadu_si512 (void const *__P) +{ + return (__m512i) __builtin_ia32_loaddqusi512_mask ((const __v16si *) __P, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_loadu_epi32 (__m512i __W, __mmask16 __U, void const *__P) +{ + return (__m512i) __builtin_ia32_loaddqusi512_mask ((const __v16si *) __P, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_loadu_epi32 (__mmask16 __U, void const *__P) +{ + return (__m512i) __builtin_ia32_loaddqusi512_mask ((const __v16si *) __P, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_storeu_si512 (void *__P, __m512i __A) +{ + __builtin_ia32_storedqusi512_mask ((__v16si *) __P, (__v16si) __A, + (__mmask16) -1); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_storeu_epi32 (void *__P, __mmask16 __U, __m512i __A) +{ + __builtin_ia32_storedqusi512_mask ((__v16si *) __P, (__v16si) __A, + (__mmask16) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutevar_pd (__m512d __A, __m512i __C) +{ + return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A, + (__v8di) __C, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutevar_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512i __C) +{ + return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A, + (__v8di) __C, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutevar_pd (__mmask8 __U, __m512d __A, __m512i __C) +{ + return (__m512d) __builtin_ia32_vpermilvarpd512_mask ((__v8df) __A, + (__v8di) __C, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutevar_ps (__m512 __A, __m512i __C) +{ + return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A, + (__v16si) __C, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutevar_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512i __C) +{ + return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A, + (__v16si) __C, + (__v16sf) __W, + 
(__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutevar_ps (__mmask16 __U, __m512 __A, __m512i __C) +{ + return (__m512) __builtin_ia32_vpermilvarps512_mask ((__v16sf) __A, + (__v16si) __C, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutex2var_epi64 (__m512i __A, __m512i __I, __m512i __B) +{ + return (__m512i) __builtin_ia32_vpermt2varq512_mask ((__v8di) __I + /* idx */ , + (__v8di) __A, + (__v8di) __B, + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutex2var_epi64 (__m512i __A, __mmask8 __U, __m512i __I, + __m512i __B) +{ + return (__m512i) __builtin_ia32_vpermt2varq512_mask ((__v8di) __I + /* idx */ , + (__v8di) __A, + (__v8di) __B, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask2_permutex2var_epi64 (__m512i __A, __m512i __I, + __mmask8 __U, __m512i __B) +{ + return (__m512i) __builtin_ia32_vpermi2varq512_mask ((__v8di) __A, + (__v8di) __I + /* idx */ , + (__v8di) __B, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutex2var_epi64 (__mmask8 __U, __m512i __A, + __m512i __I, __m512i __B) +{ + return (__m512i) __builtin_ia32_vpermt2varq512_maskz ((__v8di) __I + /* idx */ , + (__v8di) __A, + (__v8di) __B, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutex2var_epi32 (__m512i __A, __m512i __I, __m512i __B) +{ + return (__m512i) __builtin_ia32_vpermt2vard512_mask ((__v16si) __I + /* idx */ , + (__v16si) __A, + (__v16si) __B, + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutex2var_epi32 (__m512i __A, __mmask16 __U, + __m512i __I, __m512i __B) +{ + return (__m512i) __builtin_ia32_vpermt2vard512_mask ((__v16si) __I + /* idx */ , + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask2_permutex2var_epi32 (__m512i __A, __m512i __I, + __mmask16 __U, __m512i __B) +{ + return (__m512i) __builtin_ia32_vpermi2vard512_mask ((__v16si) __A, + (__v16si) __I + /* idx */ , + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutex2var_epi32 (__mmask16 __U, __m512i __A, + __m512i __I, __m512i __B) +{ + return (__m512i) __builtin_ia32_vpermt2vard512_maskz ((__v16si) __I + /* idx */ , + (__v16si) __A, + (__v16si) __B, + (__mmask16) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutex2var_pd (__m512d __A, __m512i __I, __m512d __B) +{ + return (__m512d) __builtin_ia32_vpermt2varpd512_mask ((__v8di) __I + /* idx */ , + (__v8df) __A, + (__v8df) __B, + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutex2var_pd (__m512d __A, __mmask8 __U, __m512i __I, + __m512d __B) +{ + return (__m512d) __builtin_ia32_vpermt2varpd512_mask ((__v8di) __I + /* idx */ , + (__v8df) __A, + (__v8df) __B, + (__mmask8) __U); +} + +extern __inline __m512d 
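+/* Usage sketch (illustrative only, not part of the upstream header;
+   `a' and `b' are placeholders): permutex2var gathers each element
+   from the 32-entry table formed by concatenating the two sources;
+   index bit 4 selects the second source.  E.g. to broadcast element
+   1 of b across the result:
+
+     __m512i idx = _mm512_set1_epi32 (16 + 1);
+     __m512i r   = _mm512_permutex2var_epi32 (a, idx, b);  */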
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask2_permutex2var_pd (__m512d __A, __m512i __I, __mmask8 __U, + __m512d __B) +{ + return (__m512d) __builtin_ia32_vpermi2varpd512_mask ((__v8df) __A, + (__v8di) __I + /* idx */ , + (__v8df) __B, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutex2var_pd (__mmask8 __U, __m512d __A, __m512i __I, + __m512d __B) +{ + return (__m512d) __builtin_ia32_vpermt2varpd512_maskz ((__v8di) __I + /* idx */ , + (__v8df) __A, + (__v8df) __B, + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutex2var_ps (__m512 __A, __m512i __I, __m512 __B) +{ + return (__m512) __builtin_ia32_vpermt2varps512_mask ((__v16si) __I + /* idx */ , + (__v16sf) __A, + (__v16sf) __B, + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutex2var_ps (__m512 __A, __mmask16 __U, __m512i __I, __m512 __B) +{ + return (__m512) __builtin_ia32_vpermt2varps512_mask ((__v16si) __I + /* idx */ , + (__v16sf) __A, + (__v16sf) __B, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask2_permutex2var_ps (__m512 __A, __m512i __I, __mmask16 __U, + __m512 __B) +{ + return (__m512) __builtin_ia32_vpermi2varps512_mask ((__v16sf) __A, + (__v16si) __I + /* idx */ , + (__v16sf) __B, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutex2var_ps (__mmask16 __U, __m512 __A, __m512i __I, + __m512 __B) +{ + return (__m512) __builtin_ia32_vpermt2varps512_maskz ((__v16si) __I + /* idx */ , + (__v16sf) __A, + (__v16sf) __B, + (__mmask16) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permute_pd (__m512d __X, const int __C) +{ + return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permute_pd (__m512d __W, __mmask8 __U, __m512d __X, const int __C) +{ + return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permute_pd (__mmask8 __U, __m512d __X, const int __C) +{ + return (__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df) __X, __C, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permute_ps (__m512 __X, const int __C) +{ + return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permute_ps (__m512 __W, __mmask16 __U, __m512 __X, const int __C) +{ + return (__m512) __builtin_ia32_vpermilps512_mask ((__v16sf) __X, __C, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permute_ps (__mmask16 __U, __m512 __X, const int __C) +{ + return (__m512) __builtin_ia32_vpermilps512_mask 
((__v16sf) __X, __C, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} +#else +#define _mm512_permute_pd(X, C) \ + ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C), \ + (__v8df)(__m512d)_mm512_undefined_pd(),\ + (__mmask8)(-1))) + +#define _mm512_mask_permute_pd(W, U, X, C) \ + ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C), \ + (__v8df)(__m512d)(W), \ + (__mmask8)(U))) + +#define _mm512_maskz_permute_pd(U, X, C) \ + ((__m512d) __builtin_ia32_vpermilpd512_mask ((__v8df)(__m512d)(X), (int)(C), \ + (__v8df)(__m512d)_mm512_setzero_pd(), \ + (__mmask8)(U))) + +#define _mm512_permute_ps(X, C) \ + ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C), \ + (__v16sf)(__m512)_mm512_undefined_ps(),\ + (__mmask16)(-1))) + +#define _mm512_mask_permute_ps(W, U, X, C) \ + ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C), \ + (__v16sf)(__m512)(W), \ + (__mmask16)(U))) + +#define _mm512_maskz_permute_ps(U, X, C) \ + ((__m512) __builtin_ia32_vpermilps512_mask ((__v16sf)(__m512)(X), (int)(C), \ + (__v16sf)(__m512)_mm512_setzero_ps(), \ + (__mmask16)(U))) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutex_epi64 (__m512i __X, const int __I) +{ + return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) (-1)); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutex_epi64 (__m512i __W, __mmask8 __M, + __m512i __X, const int __I) +{ + return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I, + (__v8di) __W, + (__mmask8) __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutex_epi64 (__mmask8 __M, __m512i __X, const int __I) +{ + return (__m512i) __builtin_ia32_permdi512_mask ((__v8di) __X, __I, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __M); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutex_pd (__m512d __X, const int __M) +{ + return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutex_pd (__m512d __W, __mmask8 __U, __m512d __X, const int __M) +{ + return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutex_pd (__mmask8 __U, __m512d __X, const int __M) +{ + return (__m512d) __builtin_ia32_permdf512_mask ((__v8df) __X, __M, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} +#else +#define _mm512_permutex_pd(X, M) \ + ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M), \ + (__v8df)(__m512d)_mm512_undefined_pd(),\ + (__mmask8)-1)) + +#define _mm512_mask_permutex_pd(W, U, X, M) \ + ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M), \ + (__v8df)(__m512d)(W), (__mmask8)(U))) + +#define _mm512_maskz_permutex_pd(U, X, M) \ + ((__m512d) __builtin_ia32_permdf512_mask ((__v8df)(__m512d)(X), (int)(M), \ + (__v8df)(__m512d)_mm512_setzero_pd(),\ + (__mmask8)(U))) + +#define _mm512_permutex_epi64(X, I) \ + ((__m512i) __builtin_ia32_permdi512_mask 
((__v8di)(__m512i)(X), \ + (int)(I), \ + (__v8di)(__m512i) \ + (_mm512_undefined_si512 ()),\ + (__mmask8)(-1))) + +#define _mm512_maskz_permutex_epi64(M, X, I) \ + ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \ + (int)(I), \ + (__v8di)(__m512i) \ + (_mm512_setzero_si512 ()),\ + (__mmask8)(M))) + +#define _mm512_mask_permutex_epi64(W, M, X, I) \ + ((__m512i) __builtin_ia32_permdi512_mask ((__v8di)(__m512i)(X), \ + (int)(I), \ + (__v8di)(__m512i)(W), \ + (__mmask8)(M))) +#endif + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutexvar_epi64 (__mmask8 __M, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y, + (__v8di) __X, + (__v8di) + _mm512_setzero_si512 (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutexvar_epi64 (__m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y, + (__v8di) __X, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutexvar_epi64 (__m512i __W, __mmask8 __M, __m512i __X, + __m512i __Y) +{ + return (__m512i) __builtin_ia32_permvardi512_mask ((__v8di) __Y, + (__v8di) __X, + (__v8di) __W, + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutexvar_epi32 (__mmask16 __M, __m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y, + (__v16si) __X, + (__v16si) + _mm512_setzero_si512 (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutexvar_epi32 (__m512i __X, __m512i __Y) +{ + return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y, + (__v16si) __X, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutexvar_epi32 (__m512i __W, __mmask16 __M, __m512i __X, + __m512i __Y) +{ + return (__m512i) __builtin_ia32_permvarsi512_mask ((__v16si) __Y, + (__v16si) __X, + (__v16si) __W, + __M); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutexvar_pd (__m512i __X, __m512d __Y) +{ + return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y, + (__v8di) __X, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_permutexvar_pd (__m512d __W, __mmask8 __U, __m512i __X, __m512d __Y) +{ + return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y, + (__v8di) __X, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutexvar_pd (__mmask8 __U, __m512i __X, __m512d __Y) +{ + return (__m512d) __builtin_ia32_permvardf512_mask ((__v8df) __Y, + (__v8di) __X, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutexvar_ps (__m512i __X, __m512 __Y) +{ + return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y, + (__v16si) __X, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, 
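+/* Usage sketch (illustrative only, not part of the upstream header;
+   `v' is a placeholder): permutexvar shuffles across the full
+   register with a runtime index vector, e.g. reversing the sixteen
+   dword elements:
+
+     __m512i idx = _mm512_set_epi32 (0, 1, 2, 3, 4, 5, 6, 7,
+                                     8, 9, 10, 11, 12, 13, 14, 15);
+     __m512i rev = _mm512_permutexvar_epi32 (idx, v);  */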
__always_inline__, __artificial__)) +_mm512_mask_permutexvar_ps (__m512 __W, __mmask16 __U, __m512i __X, __m512 __Y) +{ + return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y, + (__v16si) __X, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_permutexvar_ps (__mmask16 __U, __m512i __X, __m512 __Y) +{ + return (__m512) __builtin_ia32_permvarsf512_mask ((__v16sf) __Y, + (__v16si) __X, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_shuffle_ps (__m512 __M, __m512 __V, const int __imm) +{ + return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M, + (__v16sf) __V, __imm, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_shuffle_ps (__m512 __W, __mmask16 __U, __m512 __M, + __m512 __V, const int __imm) +{ + return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M, + (__v16sf) __V, __imm, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_shuffle_ps (__mmask16 __U, __m512 __M, __m512 __V, const int __imm) +{ + return (__m512) __builtin_ia32_shufps512_mask ((__v16sf) __M, + (__v16sf) __V, __imm, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_shuffle_pd (__m512d __M, __m512d __V, const int __imm) +{ + return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M, + (__v8df) __V, __imm, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_shuffle_pd (__m512d __W, __mmask8 __U, __m512d __M, + __m512d __V, const int __imm) +{ + return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M, + (__v8df) __V, __imm, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_shuffle_pd (__mmask8 __U, __m512d __M, __m512d __V, + const int __imm) +{ + return (__m512d) __builtin_ia32_shufpd512_mask ((__v8df) __M, + (__v8df) __V, __imm, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fixupimm_round_pd (__m512d __A, __m512d __B, __m512i __C, + const int __imm, const int __R) +{ + return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8di) __C, + __imm, + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fixupimm_round_pd (__m512d __A, __mmask8 __U, __m512d __B, + __m512i __C, const int __imm, const int __R) +{ + return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8di) __C, + __imm, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fixupimm_round_pd (__mmask8 __U, __m512d __A, __m512d __B, + __m512i __C, const int __imm, const int __R) +{ + return (__m512d) __builtin_ia32_fixupimmpd512_maskz ((__v8df) __A, + (__v8df) __B, + (__v8di) __C, + __imm, + (__mmask8) __U, __R); +} + +extern __inline 
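+/* Usage sketch (illustrative only, not part of the upstream header;
+   `a' and `b' are placeholders): shuffle_ps works per 128-bit lane
+   exactly like the SSE version, taking its low two elements from the
+   first operand and the high two from the second:
+
+     __m512 r = _mm512_shuffle_ps (a, b, _MM_SHUFFLE (3, 2, 1, 0));  */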
__m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fixupimm_round_ps (__m512 __A, __m512 __B, __m512i __C, + const int __imm, const int __R) +{ + return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16si) __C, + __imm, + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fixupimm_round_ps (__m512 __A, __mmask16 __U, __m512 __B, + __m512i __C, const int __imm, const int __R) +{ + return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16si) __C, + __imm, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fixupimm_round_ps (__mmask16 __U, __m512 __A, __m512 __B, + __m512i __C, const int __imm, const int __R) +{ + return (__m512) __builtin_ia32_fixupimmps512_maskz ((__v16sf) __A, + (__v16sf) __B, + (__v16si) __C, + __imm, + (__mmask16) __U, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fixupimm_round_sd (__m128d __A, __m128d __B, __m128i __C, + const int __imm, const int __R) +{ + return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A, + (__v2df) __B, + (__v2di) __C, __imm, + (__mmask8) -1, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fixupimm_round_sd (__m128d __A, __mmask8 __U, __m128d __B, + __m128i __C, const int __imm, const int __R) +{ + return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A, + (__v2df) __B, + (__v2di) __C, __imm, + (__mmask8) __U, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fixupimm_round_sd (__mmask8 __U, __m128d __A, __m128d __B, + __m128i __C, const int __imm, const int __R) +{ + return (__m128d) __builtin_ia32_fixupimmsd_maskz ((__v2df) __A, + (__v2df) __B, + (__v2di) __C, + __imm, + (__mmask8) __U, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fixupimm_round_ss (__m128 __A, __m128 __B, __m128i __C, + const int __imm, const int __R) +{ + return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A, + (__v4sf) __B, + (__v4si) __C, __imm, + (__mmask8) -1, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fixupimm_round_ss (__m128 __A, __mmask8 __U, __m128 __B, + __m128i __C, const int __imm, const int __R) +{ + return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A, + (__v4sf) __B, + (__v4si) __C, __imm, + (__mmask8) __U, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fixupimm_round_ss (__mmask8 __U, __m128 __A, __m128 __B, + __m128i __C, const int __imm, const int __R) +{ + return (__m128) __builtin_ia32_fixupimmss_maskz ((__v4sf) __A, + (__v4sf) __B, + (__v4si) __C, __imm, + (__mmask8) __U, __R); +} + +#else +#define _mm512_shuffle_pd(X, Y, C) \ + ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (int)(C),\ + (__v8df)(__m512d)_mm512_undefined_pd(),\ + (__mmask8)-1)) + +#define _mm512_mask_shuffle_pd(W, U, X, Y, C) \ + ((__m512d)__builtin_ia32_shufpd512_mask ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (int)(C),\ + (__v8df)(__m512d)(W),\ + (__mmask8)(U))) + +#define _mm512_maskz_shuffle_pd(U, X, Y, C) \ + ((__m512d)__builtin_ia32_shufpd512_mask 
((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (int)(C),\ + (__v8df)(__m512d)_mm512_setzero_pd(),\ + (__mmask8)(U))) + +#define _mm512_shuffle_ps(X, Y, C) \ + ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (int)(C),\ + (__v16sf)(__m512)_mm512_undefined_ps(),\ + (__mmask16)-1)) + +#define _mm512_mask_shuffle_ps(W, U, X, Y, C) \ + ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (int)(C),\ + (__v16sf)(__m512)(W),\ + (__mmask16)(U))) + +#define _mm512_maskz_shuffle_ps(U, X, Y, C) \ + ((__m512)__builtin_ia32_shufps512_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (int)(C),\ + (__v16sf)(__m512)_mm512_setzero_ps(),\ + (__mmask16)(U))) + +#define _mm512_fixupimm_round_pd(X, Y, Z, C, R) \ + ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \ + (__mmask8)(-1), (R))) + +#define _mm512_mask_fixupimm_round_pd(X, U, Y, Z, C, R) \ + ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \ + (__mmask8)(U), (R))) + +#define _mm512_maskz_fixupimm_round_pd(U, X, Y, Z, C, R) \ + ((__m512d)__builtin_ia32_fixupimmpd512_maskz ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \ + (__mmask8)(U), (R))) + +#define _mm512_fixupimm_round_ps(X, Y, Z, C, R) \ + ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \ + (__mmask16)(-1), (R))) + +#define _mm512_mask_fixupimm_round_ps(X, U, Y, Z, C, R) \ + ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \ + (__mmask16)(U), (R))) + +#define _mm512_maskz_fixupimm_round_ps(U, X, Y, Z, C, R) \ + ((__m512)__builtin_ia32_fixupimmps512_maskz ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \ + (__mmask16)(U), (R))) + +#define _mm_fixupimm_round_sd(X, Y, Z, C, R) \ + ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \ + (__mmask8)(-1), (R))) + +#define _mm_mask_fixupimm_round_sd(X, U, Y, Z, C, R) \ + ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \ + (__mmask8)(U), (R))) + +#define _mm_maskz_fixupimm_round_sd(U, X, Y, Z, C, R) \ + ((__m128d)__builtin_ia32_fixupimmsd_maskz ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \ + (__mmask8)(U), (R))) + +#define _mm_fixupimm_round_ss(X, Y, Z, C, R) \ + ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \ + (__mmask8)(-1), (R))) + +#define _mm_mask_fixupimm_round_ss(X, U, Y, Z, C, R) \ + ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \ + (__mmask8)(U), (R))) + +#define _mm_maskz_fixupimm_round_ss(U, X, Y, Z, C, R) \ + ((__m128)__builtin_ia32_fixupimmss_maskz ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \ + (__mmask8)(U), (R))) +#endif + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_movehdup_ps (__m512 __A) +{ + return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, 
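+/* Usage sketch (illustrative only, not part of the upstream header;
+   `z' is a placeholder): moveldup/movehdup duplicate the even- and
+   odd-indexed float elements respectively, a common step in
+   complex-number kernels:
+
+     __m512 re = _mm512_moveldup_ps (z);    duplicates z[0], z[2], ...
+     __m512 im = _mm512_movehdup_ps (z);    duplicates z[1], z[3], ...  */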
__always_inline__, __artificial__)) +_mm512_mask_movehdup_ps (__m512 __W, __mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_movehdup_ps (__mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_movshdup512_mask ((__v16sf) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_moveldup_ps (__m512 __A) +{ + return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_moveldup_ps (__m512 __W, __mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_moveldup_ps (__mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_movsldup512_mask ((__v16sf) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_or_si512 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pord512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_or_epi32 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pord512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_or_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pord512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_or_epi32 (__mmask16 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pord512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_or_epi64 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_porq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_or_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_porq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_or_epi64 (__mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_porq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_xor_si512 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pxord512_mask ((__v16si) __A, + 
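+/* Editor's note (illustrative, not part of the original header): the
+   _mask/_maskz variants in this file all follow one convention.  A set
+   bit in the mask selects the computed result for that lane; a clear
+   bit selects the corresponding lane of __W (_mask) or zero (_maskz).
+   A hypothetical use:
+
+     __m512i r = _mm512_maskz_or_epi32 (0x00FF, a, b);
+
+   computes a | b in the low eight 32-bit lanes and zeroes the rest.  */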
(__v16si) __B,
+						 (__v16si)
+						 _mm512_undefined_si512 (),
+						 (__mmask16) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_xor_epi32 (__m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pxord512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si)
+						 _mm512_undefined_si512 (),
+						 (__mmask16) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_xor_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pxord512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_xor_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pxord512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_xor_epi64 (__m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di)
+						 _mm512_undefined_si512 (),
+						 (__mmask8) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_xor_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di) __W,
+						 (__mmask8) __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_xor_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pxorq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 (__mmask8) __U);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_rol_epi32 (__m512i __A, const int __B)
+{
+  return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
+						 (__v16si)
+						 _mm512_undefined_si512 (),
+						 (__mmask16) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_rol_epi32 (__m512i __W, __mmask16 __U, __m512i __A, const int __B)
+{
+  return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_rol_epi32 (__mmask16 __U, __m512i __A, const int __B)
+{
+  return (__m512i) __builtin_ia32_prold512_mask ((__v16si) __A, __B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_ror_epi32 (__m512i __A, int __B)
+{
+  return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
+						 (__v16si)
+						 _mm512_undefined_si512 (),
+						 (__mmask16) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_ror_epi32 (__m512i __W, __mmask16 __U, __m512i __A, int __B)
+{
+  return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_ror_epi32 (__mmask16 __U, __m512i __A, int __B)
+{
+  return (__m512i) __builtin_ia32_prord512_mask ((__v16si) __A, __B,
+						 (__v16si)
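+/* Editor's note (illustrative, not part of the original header): the
+   rotate count for _mm512_rol_epi32 and friends must fold to a
+   compile-time constant, since vprold takes an immediate operand.
+   A hypothetical use, rotating every 32-bit lane left by three bits:
+
+     __m512i r = _mm512_rol_epi32 (v, 3);
+  */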
_mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_rol_epi64 (__m512i __A, const int __B) +{ + return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_rol_epi64 (__m512i __W, __mmask8 __U, __m512i __A, const int __B) +{ + return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_rol_epi64 (__mmask8 __U, __m512i __A, const int __B) +{ + return (__m512i) __builtin_ia32_prolq512_mask ((__v8di) __A, __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ror_epi64 (__m512i __A, int __B) +{ + return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ror_epi64 (__m512i __W, __mmask8 __U, __m512i __A, int __B) +{ + return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_ror_epi64 (__mmask8 __U, __m512i __A, int __B) +{ + return (__m512i) __builtin_ia32_prorq512_mask ((__v8di) __A, __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +#else +#define _mm512_rol_epi32(A, B) \ + ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A), \ + (int)(B), \ + (__v16si)_mm512_undefined_si512 (), \ + (__mmask16)(-1))) +#define _mm512_mask_rol_epi32(W, U, A, B) \ + ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A), \ + (int)(B), \ + (__v16si)(__m512i)(W), \ + (__mmask16)(U))) +#define _mm512_maskz_rol_epi32(U, A, B) \ + ((__m512i)__builtin_ia32_prold512_mask ((__v16si)(__m512i)(A), \ + (int)(B), \ + (__v16si)_mm512_setzero_si512 (), \ + (__mmask16)(U))) +#define _mm512_ror_epi32(A, B) \ + ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A), \ + (int)(B), \ + (__v16si)_mm512_undefined_si512 (), \ + (__mmask16)(-1))) +#define _mm512_mask_ror_epi32(W, U, A, B) \ + ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A), \ + (int)(B), \ + (__v16si)(__m512i)(W), \ + (__mmask16)(U))) +#define _mm512_maskz_ror_epi32(U, A, B) \ + ((__m512i)__builtin_ia32_prord512_mask ((__v16si)(__m512i)(A), \ + (int)(B), \ + (__v16si)_mm512_setzero_si512 (), \ + (__mmask16)(U))) +#define _mm512_rol_epi64(A, B) \ + ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A), \ + (int)(B), \ + (__v8di)_mm512_undefined_si512 (), \ + (__mmask8)(-1))) +#define _mm512_mask_rol_epi64(W, U, A, B) \ + ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A), \ + (int)(B), \ + (__v8di)(__m512i)(W), \ + (__mmask8)(U))) +#define _mm512_maskz_rol_epi64(U, A, B) \ + ((__m512i)__builtin_ia32_prolq512_mask ((__v8di)(__m512i)(A), \ + (int)(B), \ + (__v8di)_mm512_setzero_si512 (), \ + (__mmask8)(U))) + +#define _mm512_ror_epi64(A, B) \ + ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A), \ + (int)(B), \ + (__v8di)_mm512_undefined_si512 (), \ + (__mmask8)(-1))) +#define _mm512_mask_ror_epi64(W, U, A, B) \ + 
((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A),	\
+					    (int)(B),			\
+					    (__v8di)(__m512i)(W),	\
+					    (__mmask8)(U)))
+#define _mm512_maskz_ror_epi64(U, A, B)					\
+    ((__m512i)__builtin_ia32_prorq512_mask ((__v8di)(__m512i)(A),	\
+					    (int)(B),			\
+					    (__v8di)_mm512_setzero_si512 (), \
+					    (__mmask8)(U)))
+#endif
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_and_si512 (__m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandd512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si)
+						 _mm512_undefined_si512 (),
+						 (__mmask16) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_and_epi32 (__m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandd512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si)
+						 _mm512_undefined_si512 (),
+						 (__mmask16) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_and_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandd512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si) __W,
+						 (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_and_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandd512_mask ((__v16si) __A,
+						 (__v16si) __B,
+						 (__v16si)
+						 _mm512_setzero_si512 (),
+						 (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_and_epi64 (__m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di)
+						 _mm512_undefined_si512 (),
+						 (__mmask8) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_and_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di) __W, __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_and_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandq512_mask ((__v8di) __A,
+						 (__v8di) __B,
+						 (__v8di)
+						 _mm512_setzero_si512 (),
+						 __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_andnot_si512 (__m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_undefined_si512 (),
+						  (__mmask16) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_andnot_epi32 (__m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_undefined_si512 (),
+						  (__mmask16) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_andnot_epi32 (__m512i __W, __mmask16 __U, __m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si) __W,
+						  (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_andnot_epi32 (__mmask16 __U, __m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandnd512_mask ((__v16si) __A,
+						  (__v16si) __B,
+						  (__v16si)
+						  _mm512_setzero_si512 (),
+						  (__mmask16) __U);
+}
+
+extern 
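+/* Editor's note (illustrative, not part of the original header): the
+   andnot family complements its FIRST operand, computing (~__A) & __B.
+   A hypothetical bit-clearing use:
+
+     __m512i kept = _mm512_andnot_epi32 (bits_to_clear, data);
+
+   keeps only those bits of data that are zero in bits_to_clear.  */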
__inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_andnot_epi64 (__m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_undefined_si512 (),
+						  (__mmask8) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_andnot_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di) __W, __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_andnot_epi64 (__mmask8 __U, __m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_pandnq512_mask ((__v8di) __A,
+						  (__v8di) __B,
+						  (__v8di)
+						  _mm512_setzero_si512 (),
+						  __U);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_test_epi32_mask (__m512i __A, __m512i __B)
+{
+  return (__mmask16) __builtin_ia32_ptestmd512 ((__v16si) __A,
+						(__v16si) __B,
+						(__mmask16) -1);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_test_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
+{
+  return (__mmask16) __builtin_ia32_ptestmd512 ((__v16si) __A,
+						(__v16si) __B, __U);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_test_epi64_mask (__m512i __A, __m512i __B)
+{
+  return (__mmask8) __builtin_ia32_ptestmq512 ((__v8di) __A,
+					       (__v8di) __B,
+					       (__mmask8) -1);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_test_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
+{
+  return (__mmask8) __builtin_ia32_ptestmq512 ((__v8di) __A, (__v8di) __B, __U);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_testn_epi32_mask (__m512i __A, __m512i __B)
+{
+  return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
+						 (__v16si) __B,
+						 (__mmask16) -1);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_testn_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B)
+{
+  return (__mmask16) __builtin_ia32_ptestnmd512 ((__v16si) __A,
+						 (__v16si) __B, __U);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_testn_epi64_mask (__m512i __A, __m512i __B)
+{
+  return (__mmask8) __builtin_ia32_ptestnmq512 ((__v8di) __A,
+						(__v8di) __B,
+						(__mmask8) -1);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_testn_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B)
+{
+  return (__mmask8) __builtin_ia32_ptestnmq512 ((__v8di) __A,
+						(__v8di) __B, __U);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_unpackhi_epi32 (__m512i __A, __m512i __B)
+{
+  return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
+						     (__v16si) __B,
+						     (__v16si)
+						     _mm512_undefined_si512 (),
+						     (__mmask16) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_unpackhi_epi32 (__m512i __W, __mmask16 __U, __m512i __A,
+			    __m512i __B)
+{
+  return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A,
+						     (__v16si) __B,
+						     (__v16si) __W,
+						     (__mmask16) __U);
+}
+
+extern __inline __m512i
+__attribute__ 
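+/* Editor's note (illustrative, not part of the original header): test
+   sets mask bit i when lane i of (__A & __B) is nonzero, and testn
+   sets it when that lane is zero.  A hypothetical scan for nonzero
+   32-bit lanes:
+
+     __mmask16 nonzero = _mm512_test_epi32_mask (v, v);
+  */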
((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_unpackhi_epi32 (__mmask16 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_punpckhdq512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_unpackhi_epi64 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_unpackhi_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_unpackhi_epi64 (__mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_punpckhqdq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_unpacklo_epi32 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_unpacklo_epi32 (__m512i __W, __mmask16 __U, __m512i __A, + __m512i __B) +{ + return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_unpacklo_epi32 (__mmask16 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_punpckldq512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_unpacklo_epi64 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_unpacklo_epi64 (__m512i __W, __mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_unpacklo_epi64 (__mmask8 __U, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_punpcklqdq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +#ifdef __x86_64__ +#ifdef __OPTIMIZE__ +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundss_u64 (__m128 __A, const int __R) +{ + return (unsigned long long) __builtin_ia32_vcvtss2usi64 ((__v4sf) __A, __R); +} + +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundss_si64 (__m128 __A, const int __R) +{ + return (long long) __builtin_ia32_vcvtss2si64 ((__v4sf) __A, __R); +} + +extern __inline long long 
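+/* Editor's note (illustrative, not part of the original header): like
+   their SSE ancestors, the 512-bit unpack intrinsics interleave within
+   each 128-bit lane rather than across the whole register, so
+   _mm512_unpacklo_epi32 (a, b) yields a0,b0,a1,b1 in the first lane,
+   a4,b4,a5,b5 in the second, and so on.  */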
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundss_i64 (__m128 __A, const int __R) +{ + return (long long) __builtin_ia32_vcvtss2si64 ((__v4sf) __A, __R); +} + +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundss_u64 (__m128 __A, const int __R) +{ + return (unsigned long long) __builtin_ia32_vcvttss2usi64 ((__v4sf) __A, __R); +} + +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundss_i64 (__m128 __A, const int __R) +{ + return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A, __R); +} + +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundss_si64 (__m128 __A, const int __R) +{ + return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A, __R); +} +#else +#define _mm_cvt_roundss_u64(A, B) \ + ((unsigned long long)__builtin_ia32_vcvtss2usi64(A, B)) + +#define _mm_cvt_roundss_si64(A, B) \ + ((long long)__builtin_ia32_vcvtss2si64(A, B)) + +#define _mm_cvt_roundss_i64(A, B) \ + ((long long)__builtin_ia32_vcvtss2si64(A, B)) + +#define _mm_cvtt_roundss_u64(A, B) \ + ((unsigned long long)__builtin_ia32_vcvttss2usi64(A, B)) + +#define _mm_cvtt_roundss_i64(A, B) \ + ((long long)__builtin_ia32_vcvttss2si64(A, B)) + +#define _mm_cvtt_roundss_si64(A, B) \ + ((long long)__builtin_ia32_vcvttss2si64(A, B)) +#endif +#endif + +#ifdef __OPTIMIZE__ +extern __inline unsigned +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundss_u32 (__m128 __A, const int __R) +{ + return (unsigned) __builtin_ia32_vcvtss2usi32 ((__v4sf) __A, __R); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundss_si32 (__m128 __A, const int __R) +{ + return (int) __builtin_ia32_vcvtss2si32 ((__v4sf) __A, __R); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundss_i32 (__m128 __A, const int __R) +{ + return (int) __builtin_ia32_vcvtss2si32 ((__v4sf) __A, __R); +} + +extern __inline unsigned +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundss_u32 (__m128 __A, const int __R) +{ + return (unsigned) __builtin_ia32_vcvttss2usi32 ((__v4sf) __A, __R); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundss_i32 (__m128 __A, const int __R) +{ + return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A, __R); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundss_si32 (__m128 __A, const int __R) +{ + return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A, __R); +} +#else +#define _mm_cvt_roundss_u32(A, B) \ + ((unsigned)__builtin_ia32_vcvtss2usi32(A, B)) + +#define _mm_cvt_roundss_si32(A, B) \ + ((int)__builtin_ia32_vcvtss2si32(A, B)) + +#define _mm_cvt_roundss_i32(A, B) \ + ((int)__builtin_ia32_vcvtss2si32(A, B)) + +#define _mm_cvtt_roundss_u32(A, B) \ + ((unsigned)__builtin_ia32_vcvttss2usi32(A, B)) + +#define _mm_cvtt_roundss_si32(A, B) \ + ((int)__builtin_ia32_vcvttss2si32(A, B)) + +#define _mm_cvtt_roundss_i32(A, B) \ + ((int)__builtin_ia32_vcvttss2si32(A, B)) +#endif + +#ifdef __x86_64__ +#ifdef __OPTIMIZE__ +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsd_u64 (__m128d __A, const int __R) +{ + return (unsigned long long) __builtin_ia32_vcvtsd2usi64 ((__v2df) 
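+/* Editor's note (illustrative, not part of the original header): the
+   __R argument of the cvt_round* intrinsics selects the rounding mode
+   and must be a compile-time constant, e.g. _MM_FROUND_CUR_DIRECTION
+   or _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC.  A hypothetical
+   use:
+
+     long long i = _mm_cvt_roundsd_si64 (x, _MM_FROUND_CUR_DIRECTION);
+  */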
__A, __R); +} + +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsd_si64 (__m128d __A, const int __R) +{ + return (long long) __builtin_ia32_vcvtsd2si64 ((__v2df) __A, __R); +} + +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsd_i64 (__m128d __A, const int __R) +{ + return (long long) __builtin_ia32_vcvtsd2si64 ((__v2df) __A, __R); +} + +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundsd_u64 (__m128d __A, const int __R) +{ + return (unsigned long long) __builtin_ia32_vcvttsd2usi64 ((__v2df) __A, __R); +} + +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundsd_si64 (__m128d __A, const int __R) +{ + return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A, __R); +} + +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundsd_i64 (__m128d __A, const int __R) +{ + return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A, __R); +} +#else +#define _mm_cvt_roundsd_u64(A, B) \ + ((unsigned long long)__builtin_ia32_vcvtsd2usi64(A, B)) + +#define _mm_cvt_roundsd_si64(A, B) \ + ((long long)__builtin_ia32_vcvtsd2si64(A, B)) + +#define _mm_cvt_roundsd_i64(A, B) \ + ((long long)__builtin_ia32_vcvtsd2si64(A, B)) + +#define _mm_cvtt_roundsd_u64(A, B) \ + ((unsigned long long)__builtin_ia32_vcvttsd2usi64(A, B)) + +#define _mm_cvtt_roundsd_si64(A, B) \ + ((long long)__builtin_ia32_vcvttsd2si64(A, B)) + +#define _mm_cvtt_roundsd_i64(A, B) \ + ((long long)__builtin_ia32_vcvttsd2si64(A, B)) +#endif +#endif + +#ifdef __OPTIMIZE__ +extern __inline unsigned +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsd_u32 (__m128d __A, const int __R) +{ + return (unsigned) __builtin_ia32_vcvtsd2usi32 ((__v2df) __A, __R); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsd_si32 (__m128d __A, const int __R) +{ + return (int) __builtin_ia32_vcvtsd2si32 ((__v2df) __A, __R); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsd_i32 (__m128d __A, const int __R) +{ + return (int) __builtin_ia32_vcvtsd2si32 ((__v2df) __A, __R); +} + +extern __inline unsigned +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundsd_u32 (__m128d __A, const int __R) +{ + return (unsigned) __builtin_ia32_vcvttsd2usi32 ((__v2df) __A, __R); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundsd_i32 (__m128d __A, const int __R) +{ + return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A, __R); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_roundsd_si32 (__m128d __A, const int __R) +{ + return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A, __R); +} +#else +#define _mm_cvt_roundsd_u32(A, B) \ + ((unsigned)__builtin_ia32_vcvtsd2usi32(A, B)) + +#define _mm_cvt_roundsd_si32(A, B) \ + ((int)__builtin_ia32_vcvtsd2si32(A, B)) + +#define _mm_cvt_roundsd_i32(A, B) \ + ((int)__builtin_ia32_vcvtsd2si32(A, B)) + +#define _mm_cvtt_roundsd_u32(A, B) \ + ((unsigned)__builtin_ia32_vcvttsd2usi32(A, B)) + +#define _mm_cvtt_roundsd_si32(A, B) \ + ((int)__builtin_ia32_vcvttsd2si32(A, B)) + +#define _mm_cvtt_roundsd_i32(A, B) \ + ((int)__builtin_ia32_vcvttsd2si32(A, B)) 
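+/* Editor's note (illustrative, not part of the original header): each
+   #ifdef __OPTIMIZE__ block in this file pairs inline definitions with
+   macro fallbacks.  Without optimization the always_inline wrappers
+   are not guaranteed to fold their const int arguments into the
+   immediates the underlying instructions require, so plain macros are
+   substituted instead.  */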
+#endif + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_movedup_pd (__m512d __A) +{ + return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_movedup_pd (__m512d __W, __mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_movedup_pd (__mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_movddup512_mask ((__v8df) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_unpacklo_pd (__m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_unpacklo_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_unpacklo_pd (__mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_unpcklpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_unpackhi_pd (__m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_unpackhi_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_unpackhi_pd (__mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_unpckhpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_unpackhi_ps (__m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_unpackhi_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_unpackhi_ps (__mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_unpckhps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512d +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm512_cvt_roundps_pd (__m256 __A, const int __R) +{ + return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundps_pd (__m512d __W, __mmask8 __U, __m256 __A, + const int __R) +{ + return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundps_pd (__mmask8 __U, __m256 __A, const int __R) +{ + return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundph_ps (__m256i __A, const int __R) +{ + return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundph_ps (__m512 __W, __mmask16 __U, __m256i __A, + const int __R) +{ + return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundph_ps (__mmask16 __U, __m256i __A, const int __R) +{ + return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundps_ph (__m512 __A, const int __I) +{ + return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A, + __I, + (__v16hi) + _mm256_undefined_si256 (), + -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtps_ph (__m512 __A, const int __I) +{ + return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A, + __I, + (__v16hi) + _mm256_undefined_si256 (), + -1); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundps_ph (__m256i __U, __mmask16 __W, __m512 __A, + const int __I) +{ + return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A, + __I, + (__v16hi) __U, + (__mmask16) __W); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtps_ph (__m256i __U, __mmask16 __W, __m512 __A, const int __I) +{ + return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A, + __I, + (__v16hi) __U, + (__mmask16) __W); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundps_ph (__mmask16 __W, __m512 __A, const int __I) +{ + return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A, + __I, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) __W); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtps_ph (__mmask16 __W, __m512 __A, const int __I) +{ + return (__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf) __A, + __I, + (__v16hi) + _mm256_setzero_si256 (), + (__mmask16) __W); +} +#else +#define _mm512_cvt_roundps_pd(A, B) \ + (__m512d)__builtin_ia32_cvtps2pd512_mask(A, 
(__v8df)_mm512_undefined_pd(), -1, B) + +#define _mm512_mask_cvt_roundps_pd(W, U, A, B) \ + (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)(W), U, B) + +#define _mm512_maskz_cvt_roundps_pd(U, A, B) \ + (__m512d)__builtin_ia32_cvtps2pd512_mask(A, (__v8df)_mm512_setzero_pd(), U, B) + +#define _mm512_cvt_roundph_ps(A, B) \ + (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)_mm512_undefined_ps(), -1, B) + +#define _mm512_mask_cvt_roundph_ps(W, U, A, B) \ + (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)(W), U, B) + +#define _mm512_maskz_cvt_roundph_ps(U, A, B) \ + (__m512)__builtin_ia32_vcvtph2ps512_mask((__v16hi)(A), (__v16sf)_mm512_setzero_ps(), U, B) + +#define _mm512_cvt_roundps_ph(A, I) \ + ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) A, (int) (I),\ + (__v16hi)_mm256_undefined_si256 (), -1)) +#define _mm512_cvtps_ph(A, I) \ + ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) A, (int) (I),\ + (__v16hi)_mm256_undefined_si256 (), -1)) +#define _mm512_mask_cvt_roundps_ph(U, W, A, I) \ + ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) A, (int) (I),\ + (__v16hi)(__m256i)(U), (__mmask16) (W))) +#define _mm512_mask_cvtps_ph(U, W, A, I) \ + ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) A, (int) (I),\ + (__v16hi)(__m256i)(U), (__mmask16) (W))) +#define _mm512_maskz_cvt_roundps_ph(W, A, I) \ + ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) A, (int) (I),\ + (__v16hi)_mm256_setzero_si256 (), (__mmask16) (W))) +#define _mm512_maskz_cvtps_ph(W, A, I) \ + ((__m256i) __builtin_ia32_vcvtps2ph512_mask ((__v16sf)(__m512) A, (int) (I),\ + (__v16hi)_mm256_setzero_si256 (), (__mmask16) (W))) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvt_roundpd_ps (__m512d __A, const int __R) +{ + return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A, + (__v8sf) + _mm256_undefined_ps (), + (__mmask8) -1, __R); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvt_roundpd_ps (__m256 __W, __mmask8 __U, __m512d __A, + const int __R) +{ + return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A, + (__v8sf) __W, + (__mmask8) __U, __R); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvt_roundpd_ps (__mmask8 __U, __m512d __A, const int __R) +{ + return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A, + (__v8sf) + _mm256_setzero_ps (), + (__mmask8) __U, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundsd_ss (__m128 __A, __m128d __B, const int __R) +{ + return (__m128) __builtin_ia32_cvtsd2ss_round ((__v4sf) __A, + (__v2df) __B, + __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_roundss_sd (__m128d __A, __m128 __B, const int __R) +{ + return (__m128d) __builtin_ia32_cvtss2sd_round ((__v2df) __A, + (__v4sf) __B, + __R); +} +#else +#define _mm512_cvt_roundpd_ps(A, B) \ + (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_undefined_ps(), -1, B) + +#define _mm512_mask_cvt_roundpd_ps(W, U, A, B) \ + (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)(W), U, B) + +#define _mm512_maskz_cvt_roundpd_ps(U, A, B) \ + (__m256)__builtin_ia32_cvtpd2ps512_mask(A, (__v8sf)_mm256_setzero_ps(), U, B) + +#define _mm_cvt_roundsd_ss(A, B, C) \ + 
(__m128)__builtin_ia32_cvtsd2ss_round(A, B, C) + +#define _mm_cvt_roundss_sd(A, B, C) \ + (__m128d)__builtin_ia32_cvtss2sd_round(A, B, C) +#endif + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_stream_si512 (__m512i * __P, __m512i __A) +{ + __builtin_ia32_movntdq512 ((__v8di *) __P, (__v8di) __A); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_stream_ps (float *__P, __m512 __A) +{ + __builtin_ia32_movntps512 (__P, (__v16sf) __A); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_stream_pd (double *__P, __m512d __A) +{ + __builtin_ia32_movntpd512 (__P, (__v8df) __A); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_stream_load_si512 (void *__P) +{ + return __builtin_ia32_movntdqa512 ((__v8di *)__P); +} + +/* Constants for mantissa extraction */ +typedef enum +{ + _MM_MANT_NORM_1_2, /* interval [1, 2) */ + _MM_MANT_NORM_p5_2, /* interval [0.5, 2) */ + _MM_MANT_NORM_p5_1, /* interval [0.5, 1) */ + _MM_MANT_NORM_p75_1p5 /* interval [0.75, 1.5) */ +} _MM_MANTISSA_NORM_ENUM; + +typedef enum +{ + _MM_MANT_SIGN_src, /* sign = sign(SRC) */ + _MM_MANT_SIGN_zero, /* sign = 0 */ + _MM_MANT_SIGN_nan /* DEST = NaN if sign(SRC) = 1 */ +} _MM_MANTISSA_SIGN_ENUM; + +#ifdef __OPTIMIZE__ +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getexp_round_ss (__m128 __A, __m128 __B, const int __R) +{ + return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A, + (__v4sf) __B, + __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getexp_round_sd (__m128d __A, __m128d __B, const int __R) +{ + return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A, + (__v2df) __B, + __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getexp_round_ps (__m512 __A, const int __R) +{ + return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getexp_round_ps (__m512 __W, __mmask16 __U, __m512 __A, + const int __R) +{ + return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_getexp_round_ps (__mmask16 __U, __m512 __A, const int __R) +{ + return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getexp_round_pd (__m512d __A, const int __R) +{ + return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getexp_round_pd (__m512d __W, __mmask8 __U, __m512d __A, + const int __R) +{ + return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_getexp_round_pd (__mmask8 __U, __m512d __A, const int __R) +{ + return 
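+/* Editor's note (illustrative, not part of the original header): the
+   getmant builtins pack the two enums above into a single immediate
+   as (sign << 2) | norm.  A hypothetical use, normalising mantissas
+   to [1, 2) while keeping the source sign:
+
+     __m512d m = _mm512_getmant_round_pd (x, _MM_MANT_NORM_1_2,
+					  _MM_MANT_SIGN_src,
+					  _MM_FROUND_CUR_DIRECTION);
+  */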
(__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getmant_round_pd (__m512d __A, _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C, const int __R) +{ + return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A, + (__C << 2) | __B, + _mm512_undefined_pd (), + (__mmask8) -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getmant_round_pd (__m512d __W, __mmask8 __U, __m512d __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C, const int __R) +{ + return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A, + (__C << 2) | __B, + (__v8df) __W, __U, + __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_getmant_round_pd (__mmask8 __U, __m512d __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C, const int __R) +{ + return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A, + (__C << 2) | __B, + (__v8df) + _mm512_setzero_pd (), + __U, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getmant_round_ps (__m512 __A, _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C, const int __R) +{ + return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A, + (__C << 2) | __B, + _mm512_undefined_ps (), + (__mmask16) -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getmant_round_ps (__m512 __W, __mmask16 __U, __m512 __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C, const int __R) +{ + return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A, + (__C << 2) | __B, + (__v16sf) __W, __U, + __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_getmant_round_ps (__mmask16 __U, __m512 __A, + _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C, const int __R) +{ + return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A, + (__C << 2) | __B, + (__v16sf) + _mm512_setzero_ps (), + __U, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getmant_round_sd (__m128d __A, __m128d __B, + _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D, const int __R) +{ + return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A, + (__v2df) __B, + (__D << 2) | __C, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getmant_round_ss (__m128 __A, __m128 __B, + _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D, const int __R) +{ + return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A, + (__v4sf) __B, + (__D << 2) | __C, + __R); +} + +#else +#define _mm512_getmant_round_pd(X, B, C, R) \ + ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \ + (int)(((C)<<2) | (B)), \ + (__v8df)(__m512d)_mm512_undefined_pd(), \ + (__mmask8)-1,\ + (R))) + +#define _mm512_mask_getmant_round_pd(W, U, X, B, C, R) \ + ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \ + (int)(((C)<<2) | (B)), \ + (__v8df)(__m512d)(W), \ + (__mmask8)(U),\ + (R))) + +#define _mm512_maskz_getmant_round_pd(U, X, B, C, R) \ + ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \ + (int)(((C)<<2) | (B)), \ + 
(__v8df)(__m512d)_mm512_setzero_pd(), \ + (__mmask8)(U),\ + (R))) +#define _mm512_getmant_round_ps(X, B, C, R) \ + ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \ + (int)(((C)<<2) | (B)), \ + (__v16sf)(__m512)_mm512_undefined_ps(), \ + (__mmask16)-1,\ + (R))) + +#define _mm512_mask_getmant_round_ps(W, U, X, B, C, R) \ + ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \ + (int)(((C)<<2) | (B)), \ + (__v16sf)(__m512)(W), \ + (__mmask16)(U),\ + (R))) + +#define _mm512_maskz_getmant_round_ps(U, X, B, C, R) \ + ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \ + (int)(((C)<<2) | (B)), \ + (__v16sf)(__m512)_mm512_setzero_ps(), \ + (__mmask16)(U),\ + (R))) +#define _mm_getmant_round_sd(X, Y, C, D, R) \ + ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), \ + (int)(((D)<<2) | (C)), \ + (R))) + +#define _mm_getmant_round_ss(X, Y, C, D, R) \ + ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), \ + (int)(((D)<<2) | (C)), \ + (R))) + +#define _mm_getexp_round_ss(A, B, R) \ + ((__m128)__builtin_ia32_getexpss128_round((__v4sf)(__m128)(A), (__v4sf)(__m128)(B), R)) + +#define _mm_getexp_round_sd(A, B, R) \ + ((__m128d)__builtin_ia32_getexpsd128_round((__v2df)(__m128d)(A), (__v2df)(__m128d)(B), R)) + +#define _mm512_getexp_round_ps(A, R) \ + ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \ + (__v16sf)_mm512_undefined_ps(), (__mmask16)-1, R)) + +#define _mm512_mask_getexp_round_ps(W, U, A, R) \ + ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \ + (__v16sf)(__m512)(W), (__mmask16)(U), R)) + +#define _mm512_maskz_getexp_round_ps(U, A, R) \ + ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \ + (__v16sf)_mm512_setzero_ps(), (__mmask16)(U), R)) + +#define _mm512_getexp_round_pd(A, R) \ + ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A), \ + (__v8df)_mm512_undefined_pd(), (__mmask8)-1, R)) + +#define _mm512_mask_getexp_round_pd(W, U, A, R) \ + ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A), \ + (__v8df)(__m512d)(W), (__mmask8)(U), R)) + +#define _mm512_maskz_getexp_round_pd(U, A, R) \ + ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A), \ + (__v8df)_mm512_setzero_pd(), (__mmask8)(U), R)) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_roundscale_round_ps (__m512 __A, const int __imm, const int __R) +{ + return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A, __imm, + (__v16sf) + _mm512_undefined_ps (), + -1, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_roundscale_round_ps (__m512 __A, __mmask16 __B, __m512 __C, + const int __imm, const int __R) +{ + return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __C, __imm, + (__v16sf) __A, + (__mmask16) __B, __R); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_roundscale_round_ps (__mmask16 __A, __m512 __B, + const int __imm, const int __R) +{ + return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __B, + __imm, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __A, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_roundscale_round_pd (__m512d __A, const int __imm, const int __R) +{ + return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A, __imm, + (__v8df) 
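+/* Editor's note (illustrative, not part of the original header): the
+   roundscale immediate encodes the rounding action in its low four
+   bits (the _MM_FROUND_* values) and a scale M in its high four bits;
+   the result is rounded to a multiple of 2^-M.  A hypothetical use,
+   truncating to multiples of 1/16:
+
+     __m512 q = _mm512_roundscale_round_ps (x, (4 << 4) | 0x03,
+					    _MM_FROUND_CUR_DIRECTION);
+  */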
+ _mm512_undefined_pd (), + -1, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_roundscale_round_pd (__m512d __A, __mmask8 __B, + __m512d __C, const int __imm, const int __R) +{ + return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __C, __imm, + (__v8df) __A, + (__mmask8) __B, __R); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_roundscale_round_pd (__mmask8 __A, __m512d __B, + const int __imm, const int __R) +{ + return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __B, + __imm, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __A, __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roundscale_round_ss (__m128 __A, __m128 __B, const int __imm, const int __R) +{ + return (__m128) __builtin_ia32_rndscaless_round ((__v4sf) __A, + (__v4sf) __B, __imm, __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roundscale_round_sd (__m128d __A, __m128d __B, const int __imm, + const int __R) +{ + return (__m128d) __builtin_ia32_rndscalesd_round ((__v2df) __A, + (__v2df) __B, __imm, __R); +} + +#else +#define _mm512_roundscale_round_ps(A, B, R) \ + ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(A), (int)(B),\ + (__v16sf)_mm512_undefined_ps(), (__mmask16)(-1), R)) +#define _mm512_mask_roundscale_round_ps(A, B, C, D, R) \ + ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(C), \ + (int)(D), \ + (__v16sf)(__m512)(A), \ + (__mmask16)(B), R)) +#define _mm512_maskz_roundscale_round_ps(A, B, C, R) \ + ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(B), \ + (int)(C), \ + (__v16sf)_mm512_setzero_ps(),\ + (__mmask16)(A), R)) +#define _mm512_roundscale_round_pd(A, B, R) \ + ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(A), (int)(B),\ + (__v8df)_mm512_undefined_pd(), (__mmask8)(-1), R)) +#define _mm512_mask_roundscale_round_pd(A, B, C, D, R) \ + ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(C), \ + (int)(D), \ + (__v8df)(__m512d)(A), \ + (__mmask8)(B), R)) +#define _mm512_maskz_roundscale_round_pd(A, B, C, R) \ + ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(B), \ + (int)(C), \ + (__v8df)_mm512_setzero_pd(),\ + (__mmask8)(A), R)) +#define _mm_roundscale_round_ss(A, B, C, R) \ + ((__m128) __builtin_ia32_rndscaless_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), R)) +#define _mm_roundscale_round_sd(A, B, C, R) \ + ((__m128d) __builtin_ia32_rndscalesd_round ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), R)) +#endif + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_floor_ps (__m512 __A) +{ + return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A, + _MM_FROUND_FLOOR, + (__v16sf) __A, -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_floor_pd (__m512d __A) +{ + return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A, + _MM_FROUND_FLOOR, + (__v8df) __A, -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_ceil_ps (__m512 __A) +{ + return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A, + _MM_FROUND_CEIL, + (__v16sf) __A, -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm512_ceil_pd (__m512d __A) +{ + return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A, + _MM_FROUND_CEIL, + (__v8df) __A, -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_floor_ps (__m512 __W, __mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A, + _MM_FROUND_FLOOR, + (__v16sf) __W, __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_floor_pd (__m512d __W, __mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A, + _MM_FROUND_FLOOR, + (__v8df) __W, __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ceil_ps (__m512 __W, __mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A, + _MM_FROUND_CEIL, + (__v16sf) __W, __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_ceil_pd (__m512d __W, __mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A, + _MM_FROUND_CEIL, + (__v8df) __W, __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_alignr_epi32 (__m512i __A, __m512i __B, const int __imm) +{ + return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A, + (__v16si) __B, __imm, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_alignr_epi32 (__m512i __W, __mmask16 __U, __m512i __A, + __m512i __B, const int __imm) +{ + return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A, + (__v16si) __B, __imm, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_alignr_epi32 (__mmask16 __U, __m512i __A, __m512i __B, + const int __imm) +{ + return (__m512i) __builtin_ia32_alignd512_mask ((__v16si) __A, + (__v16si) __B, __imm, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_alignr_epi64 (__m512i __A, __m512i __B, const int __imm) +{ + return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A, + (__v8di) __B, __imm, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_alignr_epi64 (__m512i __W, __mmask8 __U, __m512i __A, + __m512i __B, const int __imm) +{ + return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A, + (__v8di) __B, __imm, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_alignr_epi64 (__mmask8 __U, __m512i __A, __m512i __B, + const int __imm) +{ + return (__m512i) __builtin_ia32_alignq512_mask ((__v8di) __A, + (__v8di) __B, __imm, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} +#else +#define _mm512_alignr_epi32(X, Y, C) \ + ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X), \ + (__v16si)(__m512i)(Y), (int)(C), (__v16si)_mm512_undefined_si512 
(),\ + (__mmask16)-1)) + +#define _mm512_mask_alignr_epi32(W, U, X, Y, C) \ + ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X), \ + (__v16si)(__m512i)(Y), (int)(C), (__v16si)(__m512i)(W), \ + (__mmask16)(U))) + +#define _mm512_maskz_alignr_epi32(U, X, Y, C) \ + ((__m512i)__builtin_ia32_alignd512_mask ((__v16si)(__m512i)(X), \ + (__v16si)(__m512i)(Y), (int)(C), (__v16si)_mm512_setzero_si512 (),\ + (__mmask16)(U))) + +#define _mm512_alignr_epi64(X, Y, C) \ + ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X), \ + (__v8di)(__m512i)(Y), (int)(C), (__v8di)_mm512_undefined_si512 (), \ + (__mmask8)-1)) + +#define _mm512_mask_alignr_epi64(W, U, X, Y, C) \ + ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X), \ + (__v8di)(__m512i)(Y), (int)(C), (__v8di)(__m512i)(W), (__mmask8)(U))) + +#define _mm512_maskz_alignr_epi64(U, X, Y, C) \ + ((__m512i)__builtin_ia32_alignq512_mask ((__v8di)(__m512i)(X), \ + (__v8di)(__m512i)(Y), (int)(C), (__v8di)_mm512_setzero_si512 (),\ + (__mmask8)(U))) +#endif + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmpeq_epi32_mask (__m512i __A, __m512i __B) +{ + return (__mmask16) __builtin_ia32_pcmpeqd512_mask ((__v16si) __A, + (__v16si) __B, + (__mmask16) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmpeq_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B) +{ + return (__mmask16) __builtin_ia32_pcmpeqd512_mask ((__v16si) __A, + (__v16si) __B, __U); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmpeq_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B) +{ + return (__mmask8) __builtin_ia32_pcmpeqq512_mask ((__v8di) __A, + (__v8di) __B, __U); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmpeq_epi64_mask (__m512i __A, __m512i __B) +{ + return (__mmask8) __builtin_ia32_pcmpeqq512_mask ((__v8di) __A, + (__v8di) __B, + (__mmask8) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmpgt_epi32_mask (__m512i __A, __m512i __B) +{ + return (__mmask16) __builtin_ia32_pcmpgtd512_mask ((__v16si) __A, + (__v16si) __B, + (__mmask16) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmpgt_epi32_mask (__mmask16 __U, __m512i __A, __m512i __B) +{ + return (__mmask16) __builtin_ia32_pcmpgtd512_mask ((__v16si) __A, + (__v16si) __B, __U); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmpgt_epi64_mask (__mmask8 __U, __m512i __A, __m512i __B) +{ + return (__mmask8) __builtin_ia32_pcmpgtq512_mask ((__v8di) __A, + (__v8di) __B, __U); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmpgt_epi64_mask (__m512i __A, __m512i __B) +{ + return (__mmask8) __builtin_ia32_pcmpgtq512_mask ((__v8di) __A, + (__v8di) __B, + (__mmask8) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmpge_epi32_mask (__m512i __X, __m512i __Y) +{ + return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X, + (__v16si) __Y, 5, + (__mmask16) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmpge_epu32_mask (__m512i __X, __m512i __Y) +{ + return (__mmask16) 
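+/* Editor's note (illustrative, not part of the original header): the
+   third argument of the cmp[u]d/cmp[u]q builtins is the predicate
+   encoding listed further below as _MM_CMPINT_*: 0 EQ, 1 LT, 2 LE,
+   4 NE, 5 NLT/GE, 6 NLE/GT.  cmpge therefore passes 5, cmple 2,
+   cmplt 1 and cmpneq 4; the unsigned (epu) forms differ only in
+   using the unsigned-compare builtin.  */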
__builtin_ia32_ucmpd512_mask ((__v16si) __X, + (__v16si) __Y, 5, + (__mmask16) -1); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmpge_epi64_mask (__m512i __X, __m512i __Y) +{ + return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X, + (__v8di) __Y, 5, + (__mmask8) -1); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmpge_epu64_mask (__m512i __X, __m512i __Y) +{ + return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X, + (__v8di) __Y, 5, + (__mmask8) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmple_epi32_mask (__m512i __X, __m512i __Y) +{ + return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X, + (__v16si) __Y, 2, + (__mmask16) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmple_epu32_mask (__m512i __X, __m512i __Y) +{ + return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X, + (__v16si) __Y, 2, + (__mmask16) -1); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmple_epi64_mask (__m512i __X, __m512i __Y) +{ + return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X, + (__v8di) __Y, 2, + (__mmask8) -1); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmple_epu64_mask (__m512i __X, __m512i __Y) +{ + return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X, + (__v8di) __Y, 2, + (__mmask8) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmplt_epi32_mask (__m512i __X, __m512i __Y) +{ + return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X, + (__v16si) __Y, 1, + (__mmask16) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmplt_epu32_mask (__m512i __X, __m512i __Y) +{ + return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X, + (__v16si) __Y, 1, + (__mmask16) -1); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmplt_epi64_mask (__m512i __X, __m512i __Y) +{ + return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X, + (__v8di) __Y, 1, + (__mmask8) -1); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmplt_epu64_mask (__m512i __X, __m512i __Y) +{ + return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X, + (__v8di) __Y, 1, + (__mmask8) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmpneq_epi32_mask (__m512i __X, __m512i __Y) +{ + return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X, + (__v16si) __Y, 4, + (__mmask16) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmpneq_epu32_mask (__m512i __X, __m512i __Y) +{ + return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X, + (__v16si) __Y, 4, + (__mmask16) -1); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmpneq_epi64_mask (__m512i __X, __m512i __Y) +{ + return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X, + (__v8di) __Y, 4, + (__mmask8) -1); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm512_cmpneq_epu64_mask (__m512i __X, __m512i __Y) +{ + return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X, + (__v8di) __Y, 4, + (__mmask8) -1); +} + +#define _MM_CMPINT_EQ 0x0 +#define _MM_CMPINT_LT 0x1 +#define _MM_CMPINT_LE 0x2 +#define _MM_CMPINT_UNUSED 0x3 +#define _MM_CMPINT_NE 0x4 +#define _MM_CMPINT_NLT 0x5 +#define _MM_CMPINT_GE 0x5 +#define _MM_CMPINT_NLE 0x6 +#define _MM_CMPINT_GT 0x6 + +#ifdef __OPTIMIZE__ +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmp_epi64_mask (__m512i __X, __m512i __Y, const int __P) +{ + return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X, + (__v8di) __Y, __P, + (__mmask8) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmp_epi32_mask (__m512i __X, __m512i __Y, const int __P) +{ + return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X, + (__v16si) __Y, __P, + (__mmask16) -1); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmp_epu64_mask (__m512i __X, __m512i __Y, const int __P) +{ + return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X, + (__v8di) __Y, __P, + (__mmask8) -1); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmp_epu32_mask (__m512i __X, __m512i __Y, const int __P) +{ + return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X, + (__v16si) __Y, __P, + (__mmask16) -1); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmp_round_pd_mask (__m512d __X, __m512d __Y, const int __P, + const int __R) +{ + return (__mmask8) __builtin_ia32_cmppd512_mask ((__v8df) __X, + (__v8df) __Y, __P, + (__mmask8) -1, __R); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmp_round_ps_mask (__m512 __X, __m512 __Y, const int __P, const int __R) +{ + return (__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf) __X, + (__v16sf) __Y, __P, + (__mmask16) -1, __R); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmp_epi64_mask (__mmask8 __U, __m512i __X, __m512i __Y, + const int __P) +{ + return (__mmask8) __builtin_ia32_cmpq512_mask ((__v8di) __X, + (__v8di) __Y, __P, + (__mmask8) __U); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmp_epi32_mask (__mmask16 __U, __m512i __X, __m512i __Y, + const int __P) +{ + return (__mmask16) __builtin_ia32_cmpd512_mask ((__v16si) __X, + (__v16si) __Y, __P, + (__mmask16) __U); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmp_epu64_mask (__mmask8 __U, __m512i __X, __m512i __Y, + const int __P) +{ + return (__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di) __X, + (__v8di) __Y, __P, + (__mmask8) __U); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmp_epu32_mask (__mmask16 __U, __m512i __X, __m512i __Y, + const int __P) +{ + return (__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si) __X, + (__v16si) __Y, __P, + (__mmask16) __U); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmp_round_pd_mask (__mmask8 __U, __m512d __X, __m512d __Y, + const int __P, const int __R) +{ + return (__mmask8) __builtin_ia32_cmppd512_mask 
((__v8df) __X,
+ (__v8df) __Y, __P,
+ (__mmask8) __U, __R);
+}
+
+extern __inline __mmask16
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_cmp_round_ps_mask (__mmask16 __U, __m512 __X, __m512 __Y,
+ const int __P, const int __R)
+{
+ return (__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf) __X,
+ (__v16sf) __Y, __P,
+ (__mmask16) __U, __R);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmp_round_sd_mask (__m128d __X, __m128d __Y, const int __P, const int __R)
+{
+ return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
+ (__v2df) __Y, __P,
+ (__mmask8) -1, __R);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cmp_round_sd_mask (__mmask8 __M, __m128d __X, __m128d __Y,
+ const int __P, const int __R)
+{
+ return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X,
+ (__v2df) __Y, __P,
+ (__mmask8) __M, __R);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cmp_round_ss_mask (__m128 __X, __m128 __Y, const int __P, const int __R)
+{
+ return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
+ (__v4sf) __Y, __P,
+ (__mmask8) -1, __R);
+}
+
+extern __inline __mmask8
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mask_cmp_round_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y,
+ const int __P, const int __R)
+{
+ return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X,
+ (__v4sf) __Y, __P,
+ (__mmask8) __M, __R);
+}
+
+#else
+#define _mm512_cmp_epi64_mask(X, Y, P) \
+ ((__mmask8) __builtin_ia32_cmpq512_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(P),\
+ (__mmask8)-1))
+
+#define _mm512_cmp_epi32_mask(X, Y, P) \
+ ((__mmask16) __builtin_ia32_cmpd512_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(P),\
+ (__mmask16)-1))
+
+#define _mm512_cmp_epu64_mask(X, Y, P) \
+ ((__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(P),\
+ (__mmask8)-1))
+
+#define _mm512_cmp_epu32_mask(X, Y, P) \
+ ((__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(P),\
+ (__mmask16)-1))
+
+#define _mm512_cmp_round_pd_mask(X, Y, P, R) \
+ ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (int)(P),\
+ (__mmask8)-1, R))
+
+#define _mm512_cmp_round_ps_mask(X, Y, P, R) \
+ ((__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf)(__m512)(X), \
+ (__v16sf)(__m512)(Y), (int)(P),\
+ (__mmask16)-1, R))
+
+#define _mm512_mask_cmp_epi64_mask(M, X, Y, P) \
+ ((__mmask8) __builtin_ia32_cmpq512_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(P),\
+ (__mmask8)M))
+
+#define _mm512_mask_cmp_epi32_mask(M, X, Y, P) \
+ ((__mmask16) __builtin_ia32_cmpd512_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(P),\
+ (__mmask16)M))
+
+#define _mm512_mask_cmp_epu64_mask(M, X, Y, P) \
+ ((__mmask8) __builtin_ia32_ucmpq512_mask ((__v8di)(__m512i)(X), \
+ (__v8di)(__m512i)(Y), (int)(P),\
+ (__mmask8)M))
+
+#define _mm512_mask_cmp_epu32_mask(M, X, Y, P) \
+ ((__mmask16) __builtin_ia32_ucmpd512_mask ((__v16si)(__m512i)(X), \
+ (__v16si)(__m512i)(Y), (int)(P),\
+ (__mmask16)M))
+
+#define _mm512_mask_cmp_round_pd_mask(M, X, Y, P, R) \
+ ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X), \
+ (__v8df)(__m512d)(Y), (int)(P),\
+ (__mmask8)M, R))
+
+#define _mm512_mask_cmp_round_ps_mask(M, X, Y, P, R) \
+ ((__mmask16)
__builtin_ia32_cmpps512_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (int)(P),\ + (__mmask16)M, R)) + +#define _mm_cmp_round_sd_mask(X, Y, P, R) \ + ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (int)(P),\ + (__mmask8)-1, R)) + +#define _mm_mask_cmp_round_sd_mask(M, X, Y, P, R) \ + ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (int)(P),\ + (M), R)) + +#define _mm_cmp_round_ss_mask(X, Y, P, R) \ + ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (int)(P), \ + (__mmask8)-1, R)) + +#define _mm_mask_cmp_round_ss_mask(M, X, Y, P, R) \ + ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (int)(P), \ + (M), R)) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i32gather_ps (__m512i __index, float const *__addr, int __scale) +{ + __m512 v1_old = _mm512_undefined_ps (); + __mmask16 mask = 0xFFFF; + + return (__m512) __builtin_ia32_gathersiv16sf ((__v16sf) v1_old, + __addr, + (__v16si) __index, + mask, __scale); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i32gather_ps (__m512 v1_old, __mmask16 __mask, + __m512i __index, float const *__addr, int __scale) +{ + return (__m512) __builtin_ia32_gathersiv16sf ((__v16sf) v1_old, + __addr, + (__v16si) __index, + __mask, __scale); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i32gather_pd (__m256i __index, double const *__addr, int __scale) +{ + __m512d v1_old = _mm512_undefined_pd (); + __mmask8 mask = 0xFF; + + return (__m512d) __builtin_ia32_gathersiv8df ((__v8df) v1_old, + __addr, + (__v8si) __index, mask, + __scale); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i32gather_pd (__m512d __v1_old, __mmask8 __mask, + __m256i __index, double const *__addr, int __scale) +{ + return (__m512d) __builtin_ia32_gathersiv8df ((__v8df) __v1_old, + __addr, + (__v8si) __index, + __mask, __scale); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i64gather_ps (__m512i __index, float const *__addr, int __scale) +{ + __m256 v1_old = _mm256_undefined_ps (); + __mmask8 mask = 0xFF; + + return (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf) v1_old, + __addr, + (__v8di) __index, mask, + __scale); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i64gather_ps (__m256 __v1_old, __mmask8 __mask, + __m512i __index, float const *__addr, int __scale) +{ + return (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf) __v1_old, + __addr, + (__v8di) __index, + __mask, __scale); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i64gather_pd (__m512i __index, double const *__addr, int __scale) +{ + __m512d v1_old = _mm512_undefined_pd (); + __mmask8 mask = 0xFF; + + return (__m512d) __builtin_ia32_gatherdiv8df ((__v8df) v1_old, + __addr, + (__v8di) __index, mask, + __scale); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i64gather_pd (__m512d __v1_old, __mmask8 __mask, + __m512i __index, double const *__addr, int __scale) +{ + return (__m512d) __builtin_ia32_gatherdiv8df ((__v8df) __v1_old, + __addr, + (__v8di) __index, + __mask, 
__scale); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i32gather_epi32 (__m512i __index, int const *__addr, int __scale) +{ + __m512i v1_old = _mm512_undefined_si512 (); + __mmask16 mask = 0xFFFF; + + return (__m512i) __builtin_ia32_gathersiv16si ((__v16si) v1_old, + __addr, + (__v16si) __index, + mask, __scale); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i32gather_epi32 (__m512i __v1_old, __mmask16 __mask, + __m512i __index, int const *__addr, int __scale) +{ + return (__m512i) __builtin_ia32_gathersiv16si ((__v16si) __v1_old, + __addr, + (__v16si) __index, + __mask, __scale); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i32gather_epi64 (__m256i __index, long long const *__addr, int __scale) +{ + __m512i v1_old = _mm512_undefined_si512 (); + __mmask8 mask = 0xFF; + + return (__m512i) __builtin_ia32_gathersiv8di ((__v8di) v1_old, + __addr, + (__v8si) __index, mask, + __scale); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i32gather_epi64 (__m512i __v1_old, __mmask8 __mask, + __m256i __index, long long const *__addr, + int __scale) +{ + return (__m512i) __builtin_ia32_gathersiv8di ((__v8di) __v1_old, + __addr, + (__v8si) __index, + __mask, __scale); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i64gather_epi32 (__m512i __index, int const *__addr, int __scale) +{ + __m256i v1_old = _mm256_undefined_si256 (); + __mmask8 mask = 0xFF; + + return (__m256i) __builtin_ia32_gatherdiv16si ((__v8si) v1_old, + __addr, + (__v8di) __index, + mask, __scale); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i64gather_epi32 (__m256i __v1_old, __mmask8 __mask, + __m512i __index, int const *__addr, int __scale) +{ + return (__m256i) __builtin_ia32_gatherdiv16si ((__v8si) __v1_old, + __addr, + (__v8di) __index, + __mask, __scale); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i64gather_epi64 (__m512i __index, long long const *__addr, int __scale) +{ + __m512i v1_old = _mm512_undefined_si512 (); + __mmask8 mask = 0xFF; + + return (__m512i) __builtin_ia32_gatherdiv8di ((__v8di) v1_old, + __addr, + (__v8di) __index, mask, + __scale); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i64gather_epi64 (__m512i __v1_old, __mmask8 __mask, + __m512i __index, long long const *__addr, + int __scale) +{ + return (__m512i) __builtin_ia32_gatherdiv8di ((__v8di) __v1_old, + __addr, + (__v8di) __index, + __mask, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i32scatter_ps (float *__addr, __m512i __index, __m512 __v1, int __scale) +{ + __builtin_ia32_scattersiv16sf (__addr, (__mmask16) 0xFFFF, + (__v16si) __index, (__v16sf) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i32scatter_ps (float *__addr, __mmask16 __mask, + __m512i __index, __m512 __v1, int __scale) +{ + __builtin_ia32_scattersiv16sf (__addr, __mask, (__v16si) __index, + (__v16sf) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i32scatter_pd (double 
*__addr, __m256i __index, __m512d __v1, + int __scale) +{ + __builtin_ia32_scattersiv8df (__addr, (__mmask8) 0xFF, + (__v8si) __index, (__v8df) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i32scatter_pd (double *__addr, __mmask8 __mask, + __m256i __index, __m512d __v1, int __scale) +{ + __builtin_ia32_scattersiv8df (__addr, __mask, (__v8si) __index, + (__v8df) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i64scatter_ps (float *__addr, __m512i __index, __m256 __v1, int __scale) +{ + __builtin_ia32_scatterdiv16sf (__addr, (__mmask8) 0xFF, + (__v8di) __index, (__v8sf) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i64scatter_ps (float *__addr, __mmask8 __mask, + __m512i __index, __m256 __v1, int __scale) +{ + __builtin_ia32_scatterdiv16sf (__addr, __mask, (__v8di) __index, + (__v8sf) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i64scatter_pd (double *__addr, __m512i __index, __m512d __v1, + int __scale) +{ + __builtin_ia32_scatterdiv8df (__addr, (__mmask8) 0xFF, + (__v8di) __index, (__v8df) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i64scatter_pd (double *__addr, __mmask8 __mask, + __m512i __index, __m512d __v1, int __scale) +{ + __builtin_ia32_scatterdiv8df (__addr, __mask, (__v8di) __index, + (__v8df) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i32scatter_epi32 (int *__addr, __m512i __index, + __m512i __v1, int __scale) +{ + __builtin_ia32_scattersiv16si (__addr, (__mmask16) 0xFFFF, + (__v16si) __index, (__v16si) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i32scatter_epi32 (int *__addr, __mmask16 __mask, + __m512i __index, __m512i __v1, int __scale) +{ + __builtin_ia32_scattersiv16si (__addr, __mask, (__v16si) __index, + (__v16si) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i32scatter_epi64 (long long *__addr, __m256i __index, + __m512i __v1, int __scale) +{ + __builtin_ia32_scattersiv8di (__addr, (__mmask8) 0xFF, + (__v8si) __index, (__v8di) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i32scatter_epi64 (long long *__addr, __mmask8 __mask, + __m256i __index, __m512i __v1, int __scale) +{ + __builtin_ia32_scattersiv8di (__addr, __mask, (__v8si) __index, + (__v8di) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i64scatter_epi32 (int *__addr, __m512i __index, + __m256i __v1, int __scale) +{ + __builtin_ia32_scatterdiv16si (__addr, (__mmask8) 0xFF, + (__v8di) __index, (__v8si) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i64scatter_epi32 (int *__addr, __mmask8 __mask, + __m512i __index, __m256i __v1, int __scale) +{ + __builtin_ia32_scatterdiv16si (__addr, __mask, (__v8di) __index, + (__v8si) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_i64scatter_epi64 (long long *__addr, __m512i 
__index, + __m512i __v1, int __scale) +{ + __builtin_ia32_scatterdiv8di (__addr, (__mmask8) 0xFF, + (__v8di) __index, (__v8di) __v1, __scale); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_i64scatter_epi64 (long long *__addr, __mmask8 __mask, + __m512i __index, __m512i __v1, int __scale) +{ + __builtin_ia32_scatterdiv8di (__addr, __mask, (__v8di) __index, + (__v8di) __v1, __scale); +} +#else +#define _mm512_i32gather_ps(INDEX, ADDR, SCALE) \ + (__m512) __builtin_ia32_gathersiv16sf ((__v16sf)_mm512_undefined_ps(),\ + (float const *)ADDR, \ + (__v16si)(__m512i)INDEX, \ + (__mmask16)0xFFFF, (int)SCALE) + +#define _mm512_mask_i32gather_ps(V1OLD, MASK, INDEX, ADDR, SCALE) \ + (__m512) __builtin_ia32_gathersiv16sf ((__v16sf)(__m512)V1OLD, \ + (float const *)ADDR, \ + (__v16si)(__m512i)INDEX, \ + (__mmask16)MASK, (int)SCALE) + +#define _mm512_i32gather_pd(INDEX, ADDR, SCALE) \ + (__m512d) __builtin_ia32_gathersiv8df ((__v8df)_mm512_undefined_pd(), \ + (double const *)ADDR, \ + (__v8si)(__m256i)INDEX, \ + (__mmask8)0xFF, (int)SCALE) + +#define _mm512_mask_i32gather_pd(V1OLD, MASK, INDEX, ADDR, SCALE) \ + (__m512d) __builtin_ia32_gathersiv8df ((__v8df)(__m512d)V1OLD, \ + (double const *)ADDR, \ + (__v8si)(__m256i)INDEX, \ + (__mmask8)MASK, (int)SCALE) + +#define _mm512_i64gather_ps(INDEX, ADDR, SCALE) \ + (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf)_mm256_undefined_ps(), \ + (float const *)ADDR, \ + (__v8di)(__m512i)INDEX, \ + (__mmask8)0xFF, (int)SCALE) + +#define _mm512_mask_i64gather_ps(V1OLD, MASK, INDEX, ADDR, SCALE) \ + (__m256) __builtin_ia32_gatherdiv16sf ((__v8sf)(__m256)V1OLD, \ + (float const *)ADDR, \ + (__v8di)(__m512i)INDEX, \ + (__mmask8)MASK, (int)SCALE) + +#define _mm512_i64gather_pd(INDEX, ADDR, SCALE) \ + (__m512d) __builtin_ia32_gatherdiv8df ((__v8df)_mm512_undefined_pd(), \ + (double const *)ADDR, \ + (__v8di)(__m512i)INDEX, \ + (__mmask8)0xFF, (int)SCALE) + +#define _mm512_mask_i64gather_pd(V1OLD, MASK, INDEX, ADDR, SCALE) \ + (__m512d) __builtin_ia32_gatherdiv8df ((__v8df)(__m512d)V1OLD, \ + (double const *)ADDR, \ + (__v8di)(__m512i)INDEX, \ + (__mmask8)MASK, (int)SCALE) + +#define _mm512_i32gather_epi32(INDEX, ADDR, SCALE) \ + (__m512i) __builtin_ia32_gathersiv16si ((__v16si)_mm512_undefined_si512 (), \ + (int const *)ADDR, \ + (__v16si)(__m512i)INDEX, \ + (__mmask16)0xFFFF, (int)SCALE) + +#define _mm512_mask_i32gather_epi32(V1OLD, MASK, INDEX, ADDR, SCALE) \ + (__m512i) __builtin_ia32_gathersiv16si ((__v16si)(__m512i)V1OLD, \ + (int const *)ADDR, \ + (__v16si)(__m512i)INDEX, \ + (__mmask16)MASK, (int)SCALE) + +#define _mm512_i32gather_epi64(INDEX, ADDR, SCALE) \ + (__m512i) __builtin_ia32_gathersiv8di ((__v8di)_mm512_undefined_si512 (), \ + (long long const *)ADDR, \ + (__v8si)(__m256i)INDEX, \ + (__mmask8)0xFF, (int)SCALE) + +#define _mm512_mask_i32gather_epi64(V1OLD, MASK, INDEX, ADDR, SCALE) \ + (__m512i) __builtin_ia32_gathersiv8di ((__v8di)(__m512i)V1OLD, \ + (long long const *)ADDR, \ + (__v8si)(__m256i)INDEX, \ + (__mmask8)MASK, (int)SCALE) + +#define _mm512_i64gather_epi32(INDEX, ADDR, SCALE) \ + (__m256i) __builtin_ia32_gatherdiv16si ((__v8si)_mm256_undefined_si256(), \ + (int const *)ADDR, \ + (__v8di)(__m512i)INDEX, \ + (__mmask8)0xFF, (int)SCALE) + +#define _mm512_mask_i64gather_epi32(V1OLD, MASK, INDEX, ADDR, SCALE) \ + (__m256i) __builtin_ia32_gatherdiv16si ((__v8si)(__m256i)V1OLD, \ + (int const *)ADDR, \ + (__v8di)(__m512i)INDEX, \ + (__mmask8)MASK, (int)SCALE) + +#define 
_mm512_i64gather_epi64(INDEX, ADDR, SCALE) \ + (__m512i) __builtin_ia32_gatherdiv8di ((__v8di)_mm512_undefined_si512 (), \ + (long long const *)ADDR, \ + (__v8di)(__m512i)INDEX, \ + (__mmask8)0xFF, (int)SCALE) + +#define _mm512_mask_i64gather_epi64(V1OLD, MASK, INDEX, ADDR, SCALE) \ + (__m512i) __builtin_ia32_gatherdiv8di ((__v8di)(__m512i)V1OLD, \ + (long long const *)ADDR, \ + (__v8di)(__m512i)INDEX, \ + (__mmask8)MASK, (int)SCALE) + +#define _mm512_i32scatter_ps(ADDR, INDEX, V1, SCALE) \ + __builtin_ia32_scattersiv16sf ((float *)ADDR, (__mmask16)0xFFFF, \ + (__v16si)(__m512i)INDEX, \ + (__v16sf)(__m512)V1, (int)SCALE) + +#define _mm512_mask_i32scatter_ps(ADDR, MASK, INDEX, V1, SCALE) \ + __builtin_ia32_scattersiv16sf ((float *)ADDR, (__mmask16)MASK, \ + (__v16si)(__m512i)INDEX, \ + (__v16sf)(__m512)V1, (int)SCALE) + +#define _mm512_i32scatter_pd(ADDR, INDEX, V1, SCALE) \ + __builtin_ia32_scattersiv8df ((double *)ADDR, (__mmask8)0xFF, \ + (__v8si)(__m256i)INDEX, \ + (__v8df)(__m512d)V1, (int)SCALE) + +#define _mm512_mask_i32scatter_pd(ADDR, MASK, INDEX, V1, SCALE) \ + __builtin_ia32_scattersiv8df ((double *)ADDR, (__mmask8)MASK, \ + (__v8si)(__m256i)INDEX, \ + (__v8df)(__m512d)V1, (int)SCALE) + +#define _mm512_i64scatter_ps(ADDR, INDEX, V1, SCALE) \ + __builtin_ia32_scatterdiv16sf ((float *)ADDR, (__mmask8)0xFF, \ + (__v8di)(__m512i)INDEX, \ + (__v8sf)(__m256)V1, (int)SCALE) + +#define _mm512_mask_i64scatter_ps(ADDR, MASK, INDEX, V1, SCALE) \ + __builtin_ia32_scatterdiv16sf ((float *)ADDR, (__mmask16)MASK, \ + (__v8di)(__m512i)INDEX, \ + (__v8sf)(__m256)V1, (int)SCALE) + +#define _mm512_i64scatter_pd(ADDR, INDEX, V1, SCALE) \ + __builtin_ia32_scatterdiv8df ((double *)ADDR, (__mmask8)0xFF, \ + (__v8di)(__m512i)INDEX, \ + (__v8df)(__m512d)V1, (int)SCALE) + +#define _mm512_mask_i64scatter_pd(ADDR, MASK, INDEX, V1, SCALE) \ + __builtin_ia32_scatterdiv8df ((double *)ADDR, (__mmask8)MASK, \ + (__v8di)(__m512i)INDEX, \ + (__v8df)(__m512d)V1, (int)SCALE) + +#define _mm512_i32scatter_epi32(ADDR, INDEX, V1, SCALE) \ + __builtin_ia32_scattersiv16si ((int *)ADDR, (__mmask16)0xFFFF, \ + (__v16si)(__m512i)INDEX, \ + (__v16si)(__m512i)V1, (int)SCALE) + +#define _mm512_mask_i32scatter_epi32(ADDR, MASK, INDEX, V1, SCALE) \ + __builtin_ia32_scattersiv16si ((int *)ADDR, (__mmask16)MASK, \ + (__v16si)(__m512i)INDEX, \ + (__v16si)(__m512i)V1, (int)SCALE) + +#define _mm512_i32scatter_epi64(ADDR, INDEX, V1, SCALE) \ + __builtin_ia32_scattersiv8di ((long long *)ADDR, (__mmask8)0xFF, \ + (__v8si)(__m256i)INDEX, \ + (__v8di)(__m512i)V1, (int)SCALE) + +#define _mm512_mask_i32scatter_epi64(ADDR, MASK, INDEX, V1, SCALE) \ + __builtin_ia32_scattersiv8di ((long long *)ADDR, (__mmask8)MASK, \ + (__v8si)(__m256i)INDEX, \ + (__v8di)(__m512i)V1, (int)SCALE) + +#define _mm512_i64scatter_epi32(ADDR, INDEX, V1, SCALE) \ + __builtin_ia32_scatterdiv16si ((int *)ADDR, (__mmask8)0xFF, \ + (__v8di)(__m512i)INDEX, \ + (__v8si)(__m256i)V1, (int)SCALE) + +#define _mm512_mask_i64scatter_epi32(ADDR, MASK, INDEX, V1, SCALE) \ + __builtin_ia32_scatterdiv16si ((int *)ADDR, (__mmask8)MASK, \ + (__v8di)(__m512i)INDEX, \ + (__v8si)(__m256i)V1, (int)SCALE) + +#define _mm512_i64scatter_epi64(ADDR, INDEX, V1, SCALE) \ + __builtin_ia32_scatterdiv8di ((long long *)ADDR, (__mmask8)0xFF, \ + (__v8di)(__m512i)INDEX, \ + (__v8di)(__m512i)V1, (int)SCALE) + +#define _mm512_mask_i64scatter_epi64(ADDR, MASK, INDEX, V1, SCALE) \ + __builtin_ia32_scatterdiv8di ((long long *)ADDR, (__mmask8)MASK, \ + (__v8di)(__m512i)INDEX, \ + (__v8di)(__m512i)V1, 
(int)SCALE) +#endif + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_compress_pd (__m512d __W, __mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_compressdf512_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_compress_pd (__mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_compressdf512_mask ((__v8df) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_compressstoreu_pd (void *__P, __mmask8 __U, __m512d __A) +{ + __builtin_ia32_compressstoredf512_mask ((__v8df *) __P, (__v8df) __A, + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_compress_ps (__m512 __W, __mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_compresssf512_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_compress_ps (__mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_compresssf512_mask ((__v16sf) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_compressstoreu_ps (void *__P, __mmask16 __U, __m512 __A) +{ + __builtin_ia32_compressstoresf512_mask ((__v16sf *) __P, (__v16sf) __A, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_compress_epi64 (__m512i __W, __mmask8 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_compressdi512_mask ((__v8di) __A, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_compress_epi64 (__mmask8 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_compressdi512_mask ((__v8di) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_compressstoreu_epi64 (void *__P, __mmask8 __U, __m512i __A) +{ + __builtin_ia32_compressstoredi512_mask ((__v8di *) __P, (__v8di) __A, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_compress_epi32 (__m512i __W, __mmask16 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_compresssi512_mask ((__v16si) __A, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_compress_epi32 (__mmask16 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_compresssi512_mask ((__v16si) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_compressstoreu_epi32 (void *__P, __mmask16 __U, __m512i __A) +{ + __builtin_ia32_compressstoresi512_mask ((__v16si *) __P, (__v16si) __A, + (__mmask16) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_expand_pd (__m512d __W, __mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_expanddf512_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) 
__U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_expand_pd (__mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_expanddf512_maskz ((__v8df) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_expandloadu_pd (__m512d __W, __mmask8 __U, void const *__P) +{ + return (__m512d) __builtin_ia32_expandloaddf512_mask ((const __v8df *) __P, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_expandloadu_pd (__mmask8 __U, void const *__P) +{ + return (__m512d) __builtin_ia32_expandloaddf512_maskz ((const __v8df *) __P, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_expand_ps (__m512 __W, __mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_expandsf512_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_expand_ps (__mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_expandsf512_maskz ((__v16sf) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_expandloadu_ps (__m512 __W, __mmask16 __U, void const *__P) +{ + return (__m512) __builtin_ia32_expandloadsf512_mask ((const __v16sf *) __P, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_expandloadu_ps (__mmask16 __U, void const *__P) +{ + return (__m512) __builtin_ia32_expandloadsf512_maskz ((const __v16sf *) __P, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_expand_epi64 (__m512i __W, __mmask8 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_expanddi512_mask ((__v8di) __A, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_expand_epi64 (__mmask8 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_expanddi512_maskz ((__v8di) __A, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_expandloadu_epi64 (__m512i __W, __mmask8 __U, void const *__P) +{ + return (__m512i) __builtin_ia32_expandloaddi512_mask ((const __v8di *) __P, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_expandloadu_epi64 (__mmask8 __U, void const *__P) +{ + return (__m512i) + __builtin_ia32_expandloaddi512_maskz ((const __v8di *) __P, + (__v8di) + _mm512_setzero_si512 (), + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_expand_epi32 (__m512i __W, __mmask16 __U, __m512i __A) +{ + return (__m512i) __builtin_ia32_expandsi512_mask ((__v16si) __A, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_expand_epi32 (__mmask16 
__U, __m512i __A) +{ + return (__m512i) __builtin_ia32_expandsi512_maskz ((__v16si) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_expandloadu_epi32 (__m512i __W, __mmask16 __U, void const *__P) +{ + return (__m512i) __builtin_ia32_expandloadsi512_mask ((const __v16si *) __P, + (__v16si) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_expandloadu_epi32 (__mmask16 __U, void const *__P) +{ + return (__m512i) __builtin_ia32_expandloadsi512_maskz ((const __v16si *) __P, + (__v16si) + _mm512_setzero_si512 + (), (__mmask16) __U); +} + +/* Mask arithmetic operations */ +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_kand (__mmask16 __A, __mmask16 __B) +{ + return (__mmask16) __builtin_ia32_kandhi ((__mmask16) __A, (__mmask16) __B); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_kandn (__mmask16 __A, __mmask16 __B) +{ + return (__mmask16) __builtin_ia32_kandnhi ((__mmask16) __A, (__mmask16) __B); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_kor (__mmask16 __A, __mmask16 __B) +{ + return (__mmask16) __builtin_ia32_korhi ((__mmask16) __A, (__mmask16) __B); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_kortestz (__mmask16 __A, __mmask16 __B) +{ + return (__mmask16) __builtin_ia32_kortestzhi ((__mmask16) __A, + (__mmask16) __B); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_kortestc (__mmask16 __A, __mmask16 __B) +{ + return (__mmask16) __builtin_ia32_kortestchi ((__mmask16) __A, + (__mmask16) __B); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_kxnor (__mmask16 __A, __mmask16 __B) +{ + return (__mmask16) __builtin_ia32_kxnorhi ((__mmask16) __A, (__mmask16) __B); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_kxor (__mmask16 __A, __mmask16 __B) +{ + return (__mmask16) __builtin_ia32_kxorhi ((__mmask16) __A, (__mmask16) __B); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_knot (__mmask16 __A) +{ + return (__mmask16) __builtin_ia32_knothi ((__mmask16) __A); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_kunpackb (__mmask16 __A, __mmask16 __B) +{ + return (__mmask16) __builtin_ia32_kunpckhi ((__mmask16) __A, (__mmask16) __B); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_inserti32x4 (__mmask16 __B, __m512i __C, __m128i __D, + const int __imm) +{ + return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __C, + (__v4si) __D, + __imm, + (__v16si) + _mm512_setzero_si512 (), + __B); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_insertf32x4 (__mmask16 __B, __m512 __C, __m128 __D, + const int __imm) +{ + return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __C, + (__v4sf) __D, + __imm, + (__v16sf) + _mm512_setzero_ps (), __B); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__))
+_mm512_mask_inserti32x4 (__m512i __A, __mmask16 __B, __m512i __C,
+ __m128i __D, const int __imm)
+{
+ return (__m512i) __builtin_ia32_inserti32x4_mask ((__v16si) __C,
+ (__v4si) __D,
+ __imm,
+ (__v16si) __A,
+ __B);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_insertf32x4 (__m512 __A, __mmask16 __B, __m512 __C,
+ __m128 __D, const int __imm)
+{
+ return (__m512) __builtin_ia32_insertf32x4_mask ((__v16sf) __C,
+ (__v4sf) __D,
+ __imm,
+ (__v16sf) __A, __B);
+}
+#else
+#define _mm512_maskz_insertf32x4(A, X, Y, C) \
+ ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X), \
+ (__v4sf)(__m128) (Y), (int) (C), (__v16sf)_mm512_setzero_ps(), \
+ (__mmask16)(A)))
+
+#define _mm512_maskz_inserti32x4(A, X, Y, C) \
+ ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X), \
+ (__v4si)(__m128i) (Y), (int) (C), (__v16si)_mm512_setzero_si512 (), \
+ (__mmask16)(A)))
+
+#define _mm512_mask_insertf32x4(A, B, X, Y, C) \
+ ((__m512) __builtin_ia32_insertf32x4_mask ((__v16sf)(__m512) (X), \
+ (__v4sf)(__m128) (Y), (int) (C), (__v16sf)(__m512) (A), \
+ (__mmask16)(B)))
+
+#define _mm512_mask_inserti32x4(A, B, X, Y, C) \
+ ((__m512i) __builtin_ia32_inserti32x4_mask ((__v16si)(__m512i) (X), \
+ (__v4si)(__m128i) (Y), (int) (C), (__v16si)(__m512i) (A), \
+ (__mmask16)(B)))
+#endif
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_max_epi64 (__m512i __A, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_undefined_si512 (),
+ (__mmask8) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_max_epi64 (__mmask8 __M, __m512i __A, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ __M);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_max_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_pmaxsq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W, __M);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_min_epi64 (__m512i __A, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_undefined_si512 (),
+ (__mmask8) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_min_epi64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di) __W, __M);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_min_epi64 (__mmask8 __M, __m512i __A, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_pminsq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_setzero_si512 (),
+ __M);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_max_epu64 (__m512i __A, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A,
+ (__v8di) __B,
+ (__v8di)
+ _mm512_undefined_si512 (),
+ (__mmask8) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_max_epu64 (__mmask8 __M, __m512i __A,
__m512i __B) +{ + return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_setzero_si512 (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_max_epu64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pmaxuq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) __W, __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_min_epu64 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_undefined_si512 (), + (__mmask8) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_min_epu64 (__m512i __W, __mmask8 __M, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) __W, __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_min_epu64 (__mmask8 __M, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pminuq512_mask ((__v8di) __A, + (__v8di) __B, + (__v8di) + _mm512_setzero_si512 (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_max_epi32 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_max_epi32 (__mmask16 __M, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_setzero_si512 (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_max_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pmaxsd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) __W, __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_min_epi32 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_min_epi32 (__mmask16 __M, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_setzero_si512 (), + __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_min_epi32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pminsd512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) __W, __M); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_max_epu32 (__m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A, + (__v16si) __B, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_max_epu32 (__mmask16 __M, __m512i __A, __m512i __B) +{ + return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A, + 
(__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __M);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_max_epu32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_pmaxud512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W, __M);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_min_epu32 (__m512i __A, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_undefined_si512 (),
+ (__mmask16) -1);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_min_epu32 (__mmask16 __M, __m512i __A, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si)
+ _mm512_setzero_si512 (),
+ __M);
+}
+
+extern __inline __m512i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_min_epu32 (__m512i __W, __mmask16 __M, __m512i __A, __m512i __B)
+{
+ return (__m512i) __builtin_ia32_pminud512_mask ((__v16si) __A,
+ (__v16si) __B,
+ (__v16si) __W, __M);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_unpacklo_ps (__m512 __A, __m512 __B)
+{
+ return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_undefined_ps (),
+ (__mmask16) -1);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_mask_unpacklo_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B)
+{
+ return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf) __W,
+ (__mmask16) __U);
+}
+
+extern __inline __m512
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm512_maskz_unpacklo_ps (__mmask16 __U, __m512 __A, __m512 __B)
+{
+ return (__m512) __builtin_ia32_unpcklps512_mask ((__v16sf) __A,
+ (__v16sf) __B,
+ (__v16sf)
+ _mm512_setzero_ps (),
+ (__mmask16) __U);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+ return (__m128d) __builtin_ia32_maxsd_round ((__v2df) __A,
+ (__v2df) __B,
+ __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_max_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+ return (__m128) __builtin_ia32_maxss_round ((__v4sf) __A,
+ (__v4sf) __B,
+ __R);
+}
+
+extern __inline __m128d
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_round_sd (__m128d __A, __m128d __B, const int __R)
+{
+ return (__m128d) __builtin_ia32_minsd_round ((__v2df) __A,
+ (__v2df) __B,
+ __R);
+}
+
+extern __inline __m128
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_min_round_ss (__m128 __A, __m128 __B, const int __R)
+{
+ return (__m128) __builtin_ia32_minss_round ((__v4sf) __A,
+ (__v4sf) __B,
+ __R);
+}
+
+#else
+#define _mm_max_round_sd(A, B, C) \
+ (__m128d)__builtin_ia32_maxsd_round(A, B, C)
+
+#define _mm_max_round_ss(A, B, C) \
+ (__m128)__builtin_ia32_maxss_round(A, B, C)
+
+#define _mm_min_round_sd(A, B, C) \
+ (__m128d)__builtin_ia32_minsd_round(A, B, C)
+
+#define _mm_min_round_ss(A, B, C) \
+ (__m128)__builtin_ia32_minss_round(A, B, C)
+#endif
+
+extern __inline __m512d
+__attribute__ ((__gnu_inline__, __always_inline__,
__artificial__)) +_mm512_mask_blend_pd (__mmask8 __U, __m512d __A, __m512d __W) +{ + return (__m512d) __builtin_ia32_blendmpd_512_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_blend_ps (__mmask16 __U, __m512 __A, __m512 __W) +{ + return (__m512) __builtin_ia32_blendmps_512_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_blend_epi64 (__mmask8 __U, __m512i __A, __m512i __W) +{ + return (__m512i) __builtin_ia32_blendmq_512_mask ((__v8di) __A, + (__v8di) __W, + (__mmask8) __U); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_blend_epi32 (__mmask16 __U, __m512i __A, __m512i __W) +{ + return (__m512i) __builtin_ia32_blendmd_512_mask ((__v16si) __A, + (__v16si) __W, + (__mmask16) __U); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R) +{ + return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W, + (__v2df) __A, + (__v2df) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R) +{ + return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W, + (__v4sf) __A, + (__v4sf) __B, + __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R) +{ + return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W, + (__v2df) __A, + -(__v2df) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R) +{ + return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W, + (__v4sf) __A, + -(__v4sf) __B, + __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmadd_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R) +{ + return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W, + -(__v2df) __A, + (__v2df) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmadd_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R) +{ + return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W, + -(__v4sf) __A, + (__v4sf) __B, + __R); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmsub_round_sd (__m128d __W, __m128d __A, __m128d __B, const int __R) +{ + return (__m128d) __builtin_ia32_vfmaddsd3_round ((__v2df) __W, + -(__v2df) __A, + -(__v2df) __B, + __R); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmsub_round_ss (__m128 __W, __m128 __A, __m128 __B, const int __R) +{ + return (__m128) __builtin_ia32_vfmaddss3_round ((__v4sf) __W, + -(__v4sf) __A, + -(__v4sf) __B, + __R); +} +#else +#define _mm_fmadd_round_sd(A, B, C, R) \ + (__m128d)__builtin_ia32_vfmaddsd3_round(A, B, C, R) + +#define _mm_fmadd_round_ss(A, B, C, R) \ + (__m128)__builtin_ia32_vfmaddss3_round(A, B, C, R) + +#define _mm_fmsub_round_sd(A, B, C, R) \ + (__m128d)__builtin_ia32_vfmaddsd3_round(A, 
B, -(C), R) + +#define _mm_fmsub_round_ss(A, B, C, R) \ + (__m128)__builtin_ia32_vfmaddss3_round(A, B, -(C), R) + +#define _mm_fnmadd_round_sd(A, B, C, R) \ + (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), C, R) + +#define _mm_fnmadd_round_ss(A, B, C, R) \ + (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), C, R) + +#define _mm_fnmsub_round_sd(A, B, C, R) \ + (__m128d)__builtin_ia32_vfmaddsd3_round(A, -(B), -(C), R) + +#define _mm_fnmsub_round_ss(A, B, C, R) \ + (__m128)__builtin_ia32_vfmaddss3_round(A, -(B), -(C), R) +#endif + +#ifdef __OPTIMIZE__ +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comi_round_ss (__m128 __A, __m128 __B, const int __P, const int __R) +{ + return __builtin_ia32_vcomiss ((__v4sf) __A, (__v4sf) __B, __P, __R); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comi_round_sd (__m128d __A, __m128d __B, const int __P, const int __R) +{ + return __builtin_ia32_vcomisd ((__v2df) __A, (__v2df) __B, __P, __R); +} +#else +#define _mm_comi_round_ss(A, B, C, D)\ +__builtin_ia32_vcomiss(A, B, C, D) +#define _mm_comi_round_sd(A, B, C, D)\ +__builtin_ia32_vcomisd(A, B, C, D) +#endif + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sqrt_pd (__m512d __A) +{ + return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sqrt_pd (__m512d __W, __mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sqrt_pd (__mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_sqrtpd512_mask ((__v8df) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sqrt_ps (__m512 __A) +{ + return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sqrt_ps (__m512 __W, __mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sqrt_ps (__mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_sqrtps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_add_pd (__m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_add_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U, + 
_MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_add_pd (__mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_addpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_add_ps (__m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_add_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_add_ps (__mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_addps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sub_pd (__m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sub_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sub_pd (__mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_subpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_sub_ps (__m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_sub_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_sub_ps (__mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_subps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mul_pd (__m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mul_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mul_pd (__mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_mulpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mul_ps (__m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_mul_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_mul_ps (__mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_mulps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_div_pd (__m512d __M, __m512d __V) +{ + return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M, + (__v8df) __V, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_div_pd (__m512d __W, __mmask8 __U, __m512d __M, __m512d __V) +{ + return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M, + (__v8df) __V, + (__v8df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_div_pd (__mmask8 __U, __m512d __M, __m512d __V) +{ + return (__m512d) __builtin_ia32_divpd512_mask ((__v8df) __M, + (__v8df) __V, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_div_ps (__m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_div_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_div_ps (__mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_divps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm512_max_pd (__m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_max_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_max_pd (__mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_maxpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_max_ps (__m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_max_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_max_ps (__mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_maxps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_min_pd (__m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_min_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_min_pd (__mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_minpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_min_ps (__m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_min_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_minps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_min_ps (__mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) 
__builtin_ia32_minps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_scalef_pd (__m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_scalef_pd (__m512d __W, __mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_scalef_pd (__mmask8 __U, __m512d __A, __m512d __B) +{ + return (__m512d) __builtin_ia32_scalefpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_scalef_ps (__m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_scalef_ps (__m512 __W, __mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_scalef_ps (__mmask16 __U, __m512 __A, __m512 __B) +{ + return (__m512) __builtin_ia32_scalefps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_scalef_sd (__m128d __A, __m128d __B) +{ + return (__m128d) __builtin_ia32_scalefsd_round ((__v2df) __A, + (__v2df) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_scalef_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_scalefss_round ((__v4sf) __A, + (__v4sf) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmadd_pd (__m512d __A, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask3 ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern 
__inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmadd_ps (__m512 __A, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask3 ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmsub_pd (__m512d __A, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask ((__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U) +{ + return (__m512d) __builtin_ia32_vfmsubpd512_mask3 ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_maskz ((__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmsub_ps (__m512 __A, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask ((__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm512_mask3_fmsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U) +{ + return (__m512) __builtin_ia32_vfmsubps512_mask3 ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddps512_maskz ((__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmaddsub_pd (__m512d __A, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmaddsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmaddsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_mask3 ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmaddsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmaddsub_ps (__m512 __A, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmaddsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmaddsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_mask3 ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmaddsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmsubadd_pd (__m512d __A, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsubadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_mask ((__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmsubadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U) +{ + return (__m512d) __builtin_ia32_vfmsubaddpd512_mask3 ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsubadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddsubpd512_maskz ((__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fmsubadd_ps (__m512 __A, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fmsubadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_mask ((__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fmsubadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U) +{ + return (__m512) __builtin_ia32_vfmsubaddps512_mask3 ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fmsubadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddsubps512_maskz ((__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmadd_pd (__m512d __A, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask (-(__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmadd_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfnmaddpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmadd_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask3 (-(__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmadd_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_maskz (-(__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern 
__inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmadd_ps (__m512 __A, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask (-(__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmadd_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfnmaddps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmadd_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask3 (-(__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmadd_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddps512_maskz (-(__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmsub_pd (__m512d __A, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_mask (-(__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmsub_pd (__m512d __A, __mmask8 __U, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfnmsubpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmsub_pd (__m512d __A, __m512d __B, __m512d __C, __mmask8 __U) +{ + return (__m512d) __builtin_ia32_vfnmsubpd512_mask3 ((__v8df) __A, + (__v8df) __B, + (__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmsub_pd (__mmask8 __U, __m512d __A, __m512d __B, __m512d __C) +{ + return (__m512d) __builtin_ia32_vfmaddpd512_maskz (-(__v8df) __A, + (__v8df) __B, + -(__v8df) __C, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fnmsub_ps (__m512 __A, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddps512_mask (-(__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fnmsub_ps (__m512 __A, __mmask16 __U, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfnmsubps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask3_fnmsub_ps (__m512 __A, __m512 __B, __m512 __C, __mmask16 __U) +{ + return (__m512) __builtin_ia32_vfnmsubps512_mask3 ((__v16sf) __A, + (__v16sf) __B, + (__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fnmsub_ps (__mmask16 __U, __m512 __A, __m512 __B, __m512 __C) +{ + return (__m512) __builtin_ia32_vfmaddps512_maskz (-(__v16sf) __A, + (__v16sf) __B, + -(__v16sf) __C, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvttpd_epi32 (__m512d __A) +{ + return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvttpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A) +{ + return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A, + (__v8si) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvttpd_epi32 (__mmask8 __U, __m512d __A) +{ + return (__m256i) __builtin_ia32_cvttpd2dq512_mask ((__v8df) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvttpd_epu32 (__m512d __A) +{ + return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvttpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A) +{ + return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A, + (__v8si) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvttpd_epu32 (__mmask8 __U, __m512d __A) +{ + return (__m256i) __builtin_ia32_cvttpd2udq512_mask ((__v8df) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtpd_epi32 (__m512d __A) +{ + return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtpd_epi32 (__m256i __W, __mmask8 __U, __m512d __A) +{ + return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A, + (__v8si) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtpd_epi32 (__mmask8 __U, __m512d __A) +{ + return (__m256i) __builtin_ia32_cvtpd2dq512_mask ((__v8df) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtpd_epu32 (__m512d __A) +{ + return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A, + (__v8si) + _mm256_undefined_si256 (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtpd_epu32 (__m256i __W, __mmask8 __U, __m512d __A) +{ + return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A, + (__v8si) __W, + (__mmask8) __U, + 
_MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtpd_epu32 (__mmask8 __U, __m512d __A) +{ + return (__m256i) __builtin_ia32_cvtpd2udq512_mask ((__v8df) __A, + (__v8si) + _mm256_setzero_si256 (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvttps_epi32 (__m512 __A) +{ + return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvttps_epi32 (__m512i __W, __mmask16 __U, __m512 __A) +{ + return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvttps_epi32 (__mmask16 __U, __m512 __A) +{ + return (__m512i) __builtin_ia32_cvttps2dq512_mask ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvttps_epu32 (__m512 __A) +{ + return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvttps_epu32 (__m512i __W, __mmask16 __U, __m512 __A) +{ + return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvttps_epu32 (__mmask16 __U, __m512 __A) +{ + return (__m512i) __builtin_ia32_cvttps2udq512_mask ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtps_epi32 (__m512 __A) +{ + return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtps_epi32 (__m512i __W, __mmask16 __U, __m512 __A) +{ + return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtps_epi32 (__mmask16 __U, __m512 __A) +{ + return (__m512i) __builtin_ia32_cvtps2dq512_mask ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtps_epu32 (__m512 __A) +{ + return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A, + (__v16si) + _mm512_undefined_si512 (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtps_epu32 (__m512i __W, __mmask16 __U, __m512 __A) +{ + return (__m512i) 
__builtin_ia32_cvtps2udq512_mask ((__v16sf) __A, + (__v16si) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtps_epu32 (__mmask16 __U, __m512 __A) +{ + return (__m512i) __builtin_ia32_cvtps2udq512_mask ((__v16sf) __A, + (__v16si) + _mm512_setzero_si512 (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __x86_64__ +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtu64_ss (__m128 __A, unsigned long long __B) +{ + return (__m128) __builtin_ia32_cvtusi2ss64 ((__v4sf) __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtu64_sd (__m128d __A, unsigned long long __B) +{ + return (__m128d) __builtin_ia32_cvtusi2sd64 ((__v2df) __A, __B, + _MM_FROUND_CUR_DIRECTION); +} +#endif + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtu32_ss (__m128 __A, unsigned __B) +{ + return (__m128) __builtin_ia32_cvtusi2ss32 ((__v4sf) __A, __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepi32_ps (__m512i __A) +{ + return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepi32_ps (__m512 __W, __mmask16 __U, __m512i __A) +{ + return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepi32_ps (__mmask16 __U, __m512i __A) +{ + return (__m512) __builtin_ia32_cvtdq2ps512_mask ((__v16si) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtepu32_ps (__m512i __A) +{ + return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtepu32_ps (__m512 __W, __mmask16 __U, __m512i __A) +{ + return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtepu32_ps (__mmask16 __U, __m512i __A) +{ + return (__m512) __builtin_ia32_cvtudq2ps512_mask ((__v16si) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fixupimm_pd (__m512d __A, __m512d __B, __m512i __C, const int __imm) +{ + return (__m512d) __builtin_ia32_fixupimmpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8di) __C, + __imm, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fixupimm_pd (__m512d __A, __mmask8 __U, __m512d __B, + __m512i __C, const int __imm) +{ + return (__m512d) 
__builtin_ia32_fixupimmpd512_mask ((__v8df) __A, + (__v8df) __B, + (__v8di) __C, + __imm, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fixupimm_pd (__mmask8 __U, __m512d __A, __m512d __B, + __m512i __C, const int __imm) +{ + return (__m512d) __builtin_ia32_fixupimmpd512_maskz ((__v8df) __A, + (__v8df) __B, + (__v8di) __C, + __imm, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_fixupimm_ps (__m512 __A, __m512 __B, __m512i __C, const int __imm) +{ + return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16si) __C, + __imm, + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_fixupimm_ps (__m512 __A, __mmask16 __U, __m512 __B, + __m512i __C, const int __imm) +{ + return (__m512) __builtin_ia32_fixupimmps512_mask ((__v16sf) __A, + (__v16sf) __B, + (__v16si) __C, + __imm, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_fixupimm_ps (__mmask16 __U, __m512 __A, __m512 __B, + __m512i __C, const int __imm) +{ + return (__m512) __builtin_ia32_fixupimmps512_maskz ((__v16sf) __A, + (__v16sf) __B, + (__v16si) __C, + __imm, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fixupimm_sd (__m128d __A, __m128d __B, __m128i __C, const int __imm) +{ + return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A, + (__v2df) __B, + (__v2di) __C, __imm, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fixupimm_sd (__m128d __A, __mmask8 __U, __m128d __B, + __m128i __C, const int __imm) +{ + return (__m128d) __builtin_ia32_fixupimmsd_mask ((__v2df) __A, + (__v2df) __B, + (__v2di) __C, __imm, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fixupimm_sd (__mmask8 __U, __m128d __A, __m128d __B, + __m128i __C, const int __imm) +{ + return (__m128d) __builtin_ia32_fixupimmsd_maskz ((__v2df) __A, + (__v2df) __B, + (__v2di) __C, + __imm, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fixupimm_ss (__m128 __A, __m128 __B, __m128i __C, const int __imm) +{ + return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A, + (__v4sf) __B, + (__v4si) __C, __imm, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_fixupimm_ss (__m128 __A, __mmask8 __U, __m128 __B, + __m128i __C, const int __imm) +{ + return (__m128) __builtin_ia32_fixupimmss_mask ((__v4sf) __A, + (__v4sf) __B, + (__v4si) __C, __imm, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_fixupimm_ss (__mmask8 __U, __m128 __A, __m128 __B, + __m128i __C, const int __imm) +{ + return (__m128) __builtin_ia32_fixupimmss_maskz ((__v4sf) __A, + (__v4sf) __B, + (__v4si) __C, __imm, + (__mmask8) __U, 
+ _MM_FROUND_CUR_DIRECTION); +} +#else +#define _mm512_fixupimm_pd(X, Y, Z, C) \ + ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \ + (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_fixupimm_pd(X, U, Y, Z, C) \ + ((__m512d)__builtin_ia32_fixupimmpd512_mask ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \ + (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_maskz_fixupimm_pd(U, X, Y, Z, C) \ + ((__m512d)__builtin_ia32_fixupimmpd512_maskz ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (__v8di)(__m512i)(Z), (int)(C), \ + (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_fixupimm_ps(X, Y, Z, C) \ + ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \ + (__mmask16)(-1), _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_fixupimm_ps(X, U, Y, Z, C) \ + ((__m512)__builtin_ia32_fixupimmps512_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \ + (__mmask16)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_maskz_fixupimm_ps(U, X, Y, Z, C) \ + ((__m512)__builtin_ia32_fixupimmps512_maskz ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (__v16si)(__m512i)(Z), (int)(C), \ + (__mmask16)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm_fixupimm_sd(X, Y, Z, C) \ + ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \ + (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION)) + +#define _mm_mask_fixupimm_sd(X, U, Y, Z, C) \ + ((__m128d)__builtin_ia32_fixupimmsd_mask ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \ + (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm_maskz_fixupimm_sd(U, X, Y, Z, C) \ + ((__m128d)__builtin_ia32_fixupimmsd_maskz ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (__v2di)(__m128i)(Z), (int)(C), \ + (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm_fixupimm_ss(X, Y, Z, C) \ + ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \ + (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION)) + +#define _mm_mask_fixupimm_ss(X, U, Y, Z, C) \ + ((__m128)__builtin_ia32_fixupimmss_mask ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \ + (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm_maskz_fixupimm_ss(U, X, Y, Z, C) \ + ((__m128)__builtin_ia32_fixupimmss_maskz ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (__v4si)(__m128i)(Z), (int)(C), \ + (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) +#endif + +#ifdef __x86_64__ +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtss_u64 (__m128 __A) +{ + return (unsigned long long) __builtin_ia32_vcvtss2usi64 ((__v4sf) + __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttss_u64 (__m128 __A) +{ + return (unsigned long long) __builtin_ia32_vcvttss2usi64 ((__v4sf) + __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttss_i64 (__m128 __A) +{ + return (long long) __builtin_ia32_vcvttss2si64 ((__v4sf) __A, + _MM_FROUND_CUR_DIRECTION); +} +#endif /* __x86_64__ */ + +extern __inline unsigned +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm_cvtss_u32 (__m128 __A) +{ + return (unsigned) __builtin_ia32_vcvtss2usi32 ((__v4sf) __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline unsigned +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttss_u32 (__m128 __A) +{ + return (unsigned) __builtin_ia32_vcvttss2usi32 ((__v4sf) __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttss_i32 (__m128 __A) +{ + return (int) __builtin_ia32_vcvttss2si32 ((__v4sf) __A, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __x86_64__ +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsd_u64 (__m128d __A) +{ + return (unsigned long long) __builtin_ia32_vcvtsd2usi64 ((__v2df) + __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline unsigned long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsd_u64 (__m128d __A) +{ + return (unsigned long long) __builtin_ia32_vcvttsd2usi64 ((__v2df) + __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline long long +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsd_i64 (__m128d __A) +{ + return (long long) __builtin_ia32_vcvttsd2si64 ((__v2df) __A, + _MM_FROUND_CUR_DIRECTION); +} +#endif /* __x86_64__ */ + +extern __inline unsigned +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsd_u32 (__m128d __A) +{ + return (unsigned) __builtin_ia32_vcvtsd2usi32 ((__v2df) __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline unsigned +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsd_u32 (__m128d __A) +{ + return (unsigned) __builtin_ia32_vcvttsd2usi32 ((__v2df) __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline int +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsd_i32 (__m128d __A) +{ + return (int) __builtin_ia32_vcvttsd2si32 ((__v2df) __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtps_pd (__m256 __A) +{ + return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtps_pd (__m512d __W, __mmask8 __U, __m256 __A) +{ + return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A, + (__v8df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtps_pd (__mmask8 __U, __m256 __A) +{ + return (__m512d) __builtin_ia32_cvtps2pd512_mask ((__v8sf) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtph_ps (__m256i __A) +{ + return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtph_ps (__m512 __W, __mmask16 __U, __m256i __A) +{ + return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm512_maskz_cvtph_ps (__mmask16 __U, __m256i __A) +{ + return (__m512) __builtin_ia32_vcvtph2ps512_mask ((__v16hi) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cvtpd_ps (__m512d __A) +{ + return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A, + (__v8sf) + _mm256_undefined_ps (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cvtpd_ps (__m256 __W, __mmask8 __U, __m512d __A) +{ + return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A, + (__v8sf) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_cvtpd_ps (__mmask8 __U, __m512d __A) +{ + return (__m256) __builtin_ia32_cvtpd2ps512_mask ((__v8df) __A, + (__v8sf) + _mm256_setzero_ps (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +#ifdef __OPTIMIZE__ +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getexp_ps (__m512 __A) +{ + return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getexp_ps (__m512 __W, __mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A, + (__v16sf) __W, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_getexp_ps (__mmask16 __U, __m512 __A) +{ + return (__m512) __builtin_ia32_getexpps512_mask ((__v16sf) __A, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getexp_pd (__m512d __A) +{ + return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A, + (__v8df) + _mm512_undefined_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getexp_pd (__m512d __W, __mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A, + (__v8df) __W, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_getexp_pd (__mmask8 __U, __m512d __A) +{ + return (__m512d) __builtin_ia32_getexppd512_mask ((__v8df) __A, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getexp_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_getexpss128_round ((__v4sf) __A, + (__v4sf) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getexp_sd (__m128d __A, __m128d __B) +{ + return (__m128d) __builtin_ia32_getexpsd128_round ((__v2df) __A, + (__v2df) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getmant_pd (__m512d __A, _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM 
__C) +{ + return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A, + (__C << 2) | __B, + _mm512_undefined_pd (), + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getmant_pd (__m512d __W, __mmask8 __U, __m512d __A, + _MM_MANTISSA_NORM_ENUM __B, _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A, + (__C << 2) | __B, + (__v8df) __W, __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_getmant_pd (__mmask8 __U, __m512d __A, + _MM_MANTISSA_NORM_ENUM __B, _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m512d) __builtin_ia32_getmantpd512_mask ((__v8df) __A, + (__C << 2) | __B, + (__v8df) + _mm512_setzero_pd (), + __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_getmant_ps (__m512 __A, _MM_MANTISSA_NORM_ENUM __B, + _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A, + (__C << 2) | __B, + _mm512_undefined_ps (), + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_getmant_ps (__m512 __W, __mmask16 __U, __m512 __A, + _MM_MANTISSA_NORM_ENUM __B, _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A, + (__C << 2) | __B, + (__v16sf) __W, __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_getmant_ps (__mmask16 __U, __m512 __A, + _MM_MANTISSA_NORM_ENUM __B, _MM_MANTISSA_SIGN_ENUM __C) +{ + return (__m512) __builtin_ia32_getmantps512_mask ((__v16sf) __A, + (__C << 2) | __B, + (__v16sf) + _mm512_setzero_ps (), + __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getmant_sd (__m128d __A, __m128d __B, _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D) +{ + return (__m128d) __builtin_ia32_getmantsd_round ((__v2df) __A, + (__v2df) __B, + (__D << 2) | __C, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getmant_ss (__m128 __A, __m128 __B, _MM_MANTISSA_NORM_ENUM __C, + _MM_MANTISSA_SIGN_ENUM __D) +{ + return (__m128) __builtin_ia32_getmantss_round ((__v4sf) __A, + (__v4sf) __B, + (__D << 2) | __C, + _MM_FROUND_CUR_DIRECTION); +} + +#else +#define _mm512_getmant_pd(X, B, C) \ + ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \ + (int)(((C)<<2) | (B)), \ + (__v8df)_mm512_undefined_pd(), \ + (__mmask8)-1,\ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_getmant_pd(W, U, X, B, C) \ + ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \ + (int)(((C)<<2) | (B)), \ + (__v8df)(__m512d)(W), \ + (__mmask8)(U),\ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_maskz_getmant_pd(U, X, B, C) \ + ((__m512d)__builtin_ia32_getmantpd512_mask ((__v8df)(__m512d)(X), \ + (int)(((C)<<2) | (B)), \ + (__v8df)_mm512_setzero_pd(), \ + (__mmask8)(U),\ + _MM_FROUND_CUR_DIRECTION)) +#define _mm512_getmant_ps(X, B, C) \ + ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \ + (int)(((C)<<2) | (B)), \ + (__v16sf)_mm512_undefined_ps(), \ + (__mmask16)-1,\ + _MM_FROUND_CUR_DIRECTION)) + +#define 
_mm512_mask_getmant_ps(W, U, X, B, C) \ + ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \ + (int)(((C)<<2) | (B)), \ + (__v16sf)(__m512)(W), \ + (__mmask16)(U),\ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_maskz_getmant_ps(U, X, B, C) \ + ((__m512)__builtin_ia32_getmantps512_mask ((__v16sf)(__m512)(X), \ + (int)(((C)<<2) | (B)), \ + (__v16sf)_mm512_setzero_ps(), \ + (__mmask16)(U),\ + _MM_FROUND_CUR_DIRECTION)) +#define _mm_getmant_sd(X, Y, C, D) \ + ((__m128d)__builtin_ia32_getmantsd_round ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), \ + (int)(((D)<<2) | (C)), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_getmant_ss(X, Y, C, D) \ + ((__m128)__builtin_ia32_getmantss_round ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), \ + (int)(((D)<<2) | (C)), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_getexp_ss(A, B) \ + ((__m128)__builtin_ia32_getexpss128_mask((__v4sf)(__m128)(A), (__v4sf)(__m128)(B), \ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm_getexp_sd(A, B) \ + ((__m128d)__builtin_ia32_getexpsd128_mask((__v2df)(__m128d)(A), (__v2df)(__m128d)(B),\ + _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_getexp_ps(A) \ + ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \ + (__v16sf)_mm512_undefined_ps(), (__mmask16)-1, _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_getexp_ps(W, U, A) \ + ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \ + (__v16sf)(__m512)(W), (__mmask16)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_maskz_getexp_ps(U, A) \ + ((__m512)__builtin_ia32_getexpps512_mask((__v16sf)(__m512)(A), \ + (__v16sf)_mm512_setzero_ps(), (__mmask16)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_getexp_pd(A) \ + ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A), \ + (__v8df)_mm512_undefined_pd(), (__mmask8)-1, _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_getexp_pd(W, U, A) \ + ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A), \ + (__v8df)(__m512d)(W), (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_maskz_getexp_pd(U, A) \ + ((__m512d)__builtin_ia32_getexppd512_mask((__v8df)(__m512d)(A), \ + (__v8df)_mm512_setzero_pd(), (__mmask8)(U), _MM_FROUND_CUR_DIRECTION)) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_roundscale_ps (__m512 __A, const int __imm) +{ + return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __A, __imm, + (__v16sf) + _mm512_undefined_ps (), + -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_roundscale_ps (__m512 __A, __mmask16 __B, __m512 __C, + const int __imm) +{ + return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __C, __imm, + (__v16sf) __A, + (__mmask16) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_roundscale_ps (__mmask16 __A, __m512 __B, const int __imm) +{ + return (__m512) __builtin_ia32_rndscaleps_mask ((__v16sf) __B, + __imm, + (__v16sf) + _mm512_setzero_ps (), + (__mmask16) __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_roundscale_pd (__m512d __A, const int __imm) +{ + return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __A, __imm, + (__v8df) + _mm512_undefined_pd (), + -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm512_mask_roundscale_pd (__m512d __A, __mmask8 __B, __m512d __C, + const int __imm) +{ + return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __C, __imm, + (__v8df) __A, + (__mmask8) __B, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_maskz_roundscale_pd (__mmask8 __A, __m512d __B, const int __imm) +{ + return (__m512d) __builtin_ia32_rndscalepd_mask ((__v8df) __B, + __imm, + (__v8df) + _mm512_setzero_pd (), + (__mmask8) __A, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roundscale_ss (__m128 __A, __m128 __B, const int __imm) +{ + return (__m128) __builtin_ia32_rndscaless_round ((__v4sf) __A, + (__v4sf) __B, __imm, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roundscale_sd (__m128d __A, __m128d __B, const int __imm) +{ + return (__m128d) __builtin_ia32_rndscalesd_round ((__v2df) __A, + (__v2df) __B, __imm, + _MM_FROUND_CUR_DIRECTION); +} + +#else +#define _mm512_roundscale_ps(A, B) \ + ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(A), (int)(B),\ + (__v16sf)_mm512_undefined_ps(), (__mmask16)(-1), _MM_FROUND_CUR_DIRECTION)) +#define _mm512_mask_roundscale_ps(A, B, C, D) \ + ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(C), \ + (int)(D), \ + (__v16sf)(__m512)(A), \ + (__mmask16)(B), _MM_FROUND_CUR_DIRECTION)) +#define _mm512_maskz_roundscale_ps(A, B, C) \ + ((__m512) __builtin_ia32_rndscaleps_mask ((__v16sf)(__m512)(B), \ + (int)(C), \ + (__v16sf)_mm512_setzero_ps(),\ + (__mmask16)(A), _MM_FROUND_CUR_DIRECTION)) +#define _mm512_roundscale_pd(A, B) \ + ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(A), (int)(B),\ + (__v8df)_mm512_undefined_pd(), (__mmask8)(-1), _MM_FROUND_CUR_DIRECTION)) +#define _mm512_mask_roundscale_pd(A, B, C, D) \ + ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(C), \ + (int)(D), \ + (__v8df)(__m512d)(A), \ + (__mmask8)(B), _MM_FROUND_CUR_DIRECTION)) +#define _mm512_maskz_roundscale_pd(A, B, C) \ + ((__m512d) __builtin_ia32_rndscalepd_mask ((__v8df)(__m512d)(B), \ + (int)(C), \ + (__v8df)_mm512_setzero_pd(),\ + (__mmask8)(A), _MM_FROUND_CUR_DIRECTION)) +#define _mm_roundscale_ss(A, B, C) \ + ((__m128) __builtin_ia32_rndscaless_round ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(C), _MM_FROUND_CUR_DIRECTION)) +#define _mm_roundscale_sd(A, B, C) \ + ((__m128d) __builtin_ia32_rndscalesd_round ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(C), _MM_FROUND_CUR_DIRECTION)) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmp_pd_mask (__m512d __X, __m512d __Y, const int __P) +{ + return (__mmask8) __builtin_ia32_cmppd512_mask ((__v8df) __X, + (__v8df) __Y, __P, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_cmp_ps_mask (__m512 __X, __m512 __Y, const int __P) +{ + return (__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf) __X, + (__v16sf) __Y, __P, + (__mmask16) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmp_ps_mask (__mmask16 __U, __m512 __X, __m512 __Y, const int __P) +{ + return (__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf) __X, 
+ (__v16sf) __Y, __P, + (__mmask16) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_cmp_pd_mask (__mmask8 __U, __m512d __X, __m512d __Y, const int __P) +{ + return (__mmask8) __builtin_ia32_cmppd512_mask ((__v8df) __X, + (__v8df) __Y, __P, + (__mmask8) __U, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmp_sd_mask (__m128d __X, __m128d __Y, const int __P) +{ + return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X, + (__v2df) __Y, __P, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cmp_sd_mask (__mmask8 __M, __m128d __X, __m128d __Y, const int __P) +{ + return (__mmask8) __builtin_ia32_cmpsd_mask ((__v2df) __X, + (__v2df) __Y, __P, + (__mmask8) __M, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmp_ss_mask (__m128 __X, __m128 __Y, const int __P) +{ + return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X, + (__v4sf) __Y, __P, + (__mmask8) -1, + _MM_FROUND_CUR_DIRECTION); +} + +extern __inline __mmask8 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_cmp_ss_mask (__mmask8 __M, __m128 __X, __m128 __Y, const int __P) +{ + return (__mmask8) __builtin_ia32_cmpss_mask ((__v4sf) __X, + (__v4sf) __Y, __P, + (__mmask8) __M, + _MM_FROUND_CUR_DIRECTION); +} + +#else +#define _mm512_cmp_pd_mask(X, Y, P) \ + ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (int)(P),\ + (__mmask8)-1,_MM_FROUND_CUR_DIRECTION)) + +#define _mm512_cmp_ps_mask(X, Y, P) \ + ((__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (int)(P),\ + (__mmask16)-1,_MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_cmp_pd_mask(M, X, Y, P) \ + ((__mmask8) __builtin_ia32_cmppd512_mask ((__v8df)(__m512d)(X), \ + (__v8df)(__m512d)(Y), (int)(P),\ + (__mmask8)M, _MM_FROUND_CUR_DIRECTION)) + +#define _mm512_mask_cmp_ps_mask(M, X, Y, P) \ + ((__mmask16) __builtin_ia32_cmpps512_mask ((__v16sf)(__m512)(X), \ + (__v16sf)(__m512)(Y), (int)(P),\ + (__mmask16)M,_MM_FROUND_CUR_DIRECTION)) + +#define _mm_cmp_sd_mask(X, Y, P) \ + ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (int)(P),\ + (__mmask8)-1,_MM_FROUND_CUR_DIRECTION)) + +#define _mm_mask_cmp_sd_mask(M, X, Y, P) \ + ((__mmask8) __builtin_ia32_cmpsd_mask ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (int)(P),\ + M,_MM_FROUND_CUR_DIRECTION)) + +#define _mm_cmp_ss_mask(X, Y, P) \ + ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (int)(P), \ + (__mmask8)-1,_MM_FROUND_CUR_DIRECTION)) + +#define _mm_mask_cmp_ss_mask(M, X, Y, P) \ + ((__mmask8) __builtin_ia32_cmpss_mask ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (int)(P), \ + M,_MM_FROUND_CUR_DIRECTION)) +#endif + +extern __inline __mmask16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_kmov (__mmask16 __A) +{ + return __builtin_ia32_kmov16 (__A); +} + +#ifdef __DISABLE_AVX512F__ +#undef __DISABLE_AVX512F__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512F__ */ + +#endif /* _AVX512FINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/avx512pfintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/avx512pfintrin.h new file mode 
100644 index 0000000..bc7598e --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/avx512pfintrin.h @@ -0,0 +1,212 @@ +/* Copyright (C) 2013-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." +#endif + +#ifndef _AVX512PFINTRIN_H_INCLUDED +#define _AVX512PFINTRIN_H_INCLUDED + +#ifndef __AVX512PF__ +#pragma GCC push_options +#pragma GCC target("avx512pf") +#define __DISABLE_AVX512PF__ +#endif /* __AVX512PF__ */ + +/* Internal data types for implementing the intrinsics. */ +typedef long long __v8di __attribute__ ((__vector_size__ (64))); +typedef int __v16si __attribute__ ((__vector_size__ (64))); + +/* The Intel API is flexible enough that we must allow aliasing with other + vector types, and their scalar components. */ +typedef long long __m512i __attribute__ ((__vector_size__ (64), __may_alias__)); + +typedef unsigned char __mmask8; +typedef unsigned short __mmask16; + +#ifdef __OPTIMIZE__ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_prefetch_i32gather_pd (__m256i index, __mmask8 mask, + void *addr, int scale, int hint) +{ + __builtin_ia32_gatherpfdpd (mask, (__v8si) index, (long long const *) addr, + scale, hint); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_prefetch_i32gather_ps (__m512i index, __mmask16 mask, + void *addr, int scale, int hint) +{ + __builtin_ia32_gatherpfdps (mask, (__v16si) index, (int const *) addr, + scale, hint); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_prefetch_i64gather_pd (__m512i index, __mmask8 mask, + void *addr, int scale, int hint) +{ + __builtin_ia32_gatherpfqpd (mask, (__v8di) index, (long long const *) addr, + scale, hint); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_prefetch_i64gather_ps (__m512i index, __mmask8 mask, + void *addr, int scale, int hint) +{ + __builtin_ia32_gatherpfqps (mask, (__v8di) index, (int const *) addr, + scale, hint); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_prefetch_i32scatter_pd (void *addr, __m256i index, int scale, + int hint) +{ + __builtin_ia32_scatterpfdpd ((__mmask8) 0xFF, (__v8si) index, + (long long const *)addr, scale, hint); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_prefetch_i32scatter_ps (void *addr, __m512i index, int scale, + int hint) +{ + __builtin_ia32_scatterpfdps ((__mmask16) 
0xFFFF, (__v16si) index, (int const *) addr, + scale, hint); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_prefetch_i32scatter_pd (void *addr, __mmask8 mask, + __m256i index, int scale, int hint) +{ + __builtin_ia32_scatterpfdpd (mask, (__v8si) index, (long long const *) addr, + scale, hint); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_prefetch_i32scatter_ps (void *addr, __mmask16 mask, + __m512i index, int scale, int hint) +{ + __builtin_ia32_scatterpfdps (mask, (__v16si) index, (int const *) addr, + scale, hint); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_prefetch_i64scatter_pd (void *addr, __m512i index, int scale, + int hint) +{ + __builtin_ia32_scatterpfqpd ((__mmask8) 0xFF, (__v8di) index, (long long const *) addr, + scale, hint); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_prefetch_i64scatter_ps (void *addr, __m512i index, int scale, + int hint) +{ + __builtin_ia32_scatterpfqps ((__mmask8) 0xFF, (__v8di) index, (int const *) addr, + scale, hint); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_prefetch_i64scatter_pd (void *addr, __mmask16 mask, + __m512i index, int scale, int hint) +{ + __builtin_ia32_scatterpfqpd (mask, (__v8di) index, (long long const *) addr, + scale, hint); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_prefetch_i64scatter_ps (void *addr, __mmask16 mask, + __m512i index, int scale, int hint) +{ + __builtin_ia32_scatterpfqps (mask, (__v8di) index, (int const *) addr, + scale, hint); +} + +#else +#define _mm512_mask_prefetch_i32gather_pd(INDEX, MASK, ADDR, SCALE, HINT) \ + __builtin_ia32_gatherpfdpd ((__mmask8)MASK, (__v8si)(__m256i)INDEX, \ + (long long const *)ADDR, (int)SCALE, (int)HINT) + +#define _mm512_mask_prefetch_i32gather_ps(INDEX, MASK, ADDR, SCALE, HINT) \ + __builtin_ia32_gatherpfdps ((__mmask16)MASK, (__v16si)(__m512i)INDEX, \ + (int const *)ADDR, (int)SCALE, (int)HINT) + +#define _mm512_mask_prefetch_i64gather_pd(INDEX, MASK, ADDR, SCALE, HINT) \ + __builtin_ia32_gatherpfqpd ((__mmask8)MASK, (__v8di)(__m512i)INDEX, \ + (long long const *)ADDR, (int)SCALE, (int)HINT) + +#define _mm512_mask_prefetch_i64gather_ps(INDEX, MASK, ADDR, SCALE, HINT) \ + __builtin_ia32_gatherpfqps ((__mmask8)MASK, (__v8di)(__m512i)INDEX, \ + (int const *)ADDR, (int)SCALE, (int)HINT) + +#define _mm512_prefetch_i32scatter_pd(ADDR, INDEX, SCALE, HINT) \ + __builtin_ia32_scatterpfdpd ((__mmask8)0xFF, (__v8si)(__m256i)INDEX, \ + (long long const *)ADDR, (int)SCALE, (int)HINT) + +#define _mm512_prefetch_i32scatter_ps(ADDR, INDEX, SCALE, HINT) \ + __builtin_ia32_scatterpfdps ((__mmask16)0xFFFF, (__v16si)(__m512i)INDEX, \ + (int const *)ADDR, (int)SCALE, (int)HINT) + +#define _mm512_mask_prefetch_i32scatter_pd(ADDR, MASK, INDEX, SCALE, HINT) \ + __builtin_ia32_scatterpfdpd ((__mmask8)MASK, (__v8si)(__m256i)INDEX, \ + (long long const *)ADDR, (int)SCALE, (int)HINT) + +#define _mm512_mask_prefetch_i32scatter_ps(ADDR, MASK, INDEX, SCALE, HINT) \ + __builtin_ia32_scatterpfdps ((__mmask16)MASK, (__v16si)(__m512i)INDEX, \ + (int const *)ADDR, (int)SCALE, (int)HINT) + +#define _mm512_prefetch_i64scatter_pd(ADDR, INDEX, SCALE, HINT) \ + __builtin_ia32_scatterpfqpd ((__mmask8)0xFF, (__v8di)(__m512i)INDEX, \ + (long long const 
*)ADDR, (int)SCALE, (int)HINT) + +#define _mm512_prefetch_i64scatter_ps(ADDR, INDEX, SCALE, HINT) \ + __builtin_ia32_scatterpfqps ((__mmask8)0xFF, (__v8di)(__m512i)INDEX, \ + (int const *)ADDR, (int)SCALE, (int)HINT) + +#define _mm512_mask_prefetch_i64scatter_pd(ADDR, MASK, INDEX, SCALE, HINT) \ + __builtin_ia32_scatterpfqpd ((__mmask8)MASK, (__v8di)(__m512i)INDEX, \ + (long long const *)ADDR, (int)SCALE, (int)HINT) + +#define _mm512_mask_prefetch_i64scatter_ps(ADDR, MASK, INDEX, SCALE, HINT) \ + __builtin_ia32_scatterpfqps ((__mmask8)MASK, (__v8di)(__m512i)INDEX, \ + (int const *)ADDR, (int)SCALE, (int)HINT) +#endif + +#ifdef __DISABLE_AVX512PF__ +#undef __DISABLE_AVX512PF__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512PF__ */ + +#endif /* _AVX512PFINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/avxintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/avxintrin.h new file mode 100644 index 0000000..2ea327c --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/avxintrin.h @@ -0,0 +1,1463 @@ +/* Copyright (C) 2008-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +/* Implemented from the specification included in the Intel C++ Compiler + User Guide and Reference, version 11.0. */ + +#ifndef _IMMINTRIN_H_INCLUDED +# error "Never use directly; include instead." +#endif + +#ifndef _AVXINTRIN_H_INCLUDED +#define _AVXINTRIN_H_INCLUDED + +#ifndef __AVX__ +#pragma GCC push_options +#pragma GCC target("avx") +#define __DISABLE_AVX__ +#endif /* __AVX__ */ + +/* Internal data types for implementing the intrinsics. */ +typedef double __v4df __attribute__ ((__vector_size__ (32))); +typedef float __v8sf __attribute__ ((__vector_size__ (32))); +typedef long long __v4di __attribute__ ((__vector_size__ (32))); +typedef int __v8si __attribute__ ((__vector_size__ (32))); +typedef short __v16hi __attribute__ ((__vector_size__ (32))); +typedef char __v32qi __attribute__ ((__vector_size__ (32))); + +/* The Intel API is flexible enough that we must allow aliasing with other + vector types, and their scalar components. */ +typedef float __m256 __attribute__ ((__vector_size__ (32), + __may_alias__)); +typedef long long __m256i __attribute__ ((__vector_size__ (32), + __may_alias__)); +typedef double __m256d __attribute__ ((__vector_size__ (32), + __may_alias__)); + +/* Compare predicates for scalar and packed compare intrinsics. 
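+   Each value encodes the relation being tested, whether unordered operands
+   (NaNs) satisfy it ("O" ordered vs. "U" unordered), and whether quiet NaNs
+   raise an invalid-operation exception ("S" signaling vs. "Q" non-signaling).
+   The predicate is passed as the immediate of the _mm_cmp_* and
+   _mm256_cmp_* intrinsics defined further below.  A minimal illustrative
+   use, not part of the original header (assumes this file is reached via
+   <immintrin.h> with AVX enabled):
+
+     __m256d a  = _mm256_set1_pd (1.0);
+     __m256d b  = _mm256_set1_pd (2.0);
+     __m256d lt = _mm256_cmp_pd (a, b, _CMP_LT_OS);
+
+   Every element of lt is all-one bits because 1.0 < 2.0 holds in each lane;
+   elements for which the predicate is false would be all zeros.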
*/ + +/* Equal (ordered, non-signaling) */ +#define _CMP_EQ_OQ 0x00 +/* Less-than (ordered, signaling) */ +#define _CMP_LT_OS 0x01 +/* Less-than-or-equal (ordered, signaling) */ +#define _CMP_LE_OS 0x02 +/* Unordered (non-signaling) */ +#define _CMP_UNORD_Q 0x03 +/* Not-equal (unordered, non-signaling) */ +#define _CMP_NEQ_UQ 0x04 +/* Not-less-than (unordered, signaling) */ +#define _CMP_NLT_US 0x05 +/* Not-less-than-or-equal (unordered, signaling) */ +#define _CMP_NLE_US 0x06 +/* Ordered (nonsignaling) */ +#define _CMP_ORD_Q 0x07 +/* Equal (unordered, non-signaling) */ +#define _CMP_EQ_UQ 0x08 +/* Not-greater-than-or-equal (unordered, signaling) */ +#define _CMP_NGE_US 0x09 +/* Not-greater-than (unordered, signaling) */ +#define _CMP_NGT_US 0x0a +/* False (ordered, non-signaling) */ +#define _CMP_FALSE_OQ 0x0b +/* Not-equal (ordered, non-signaling) */ +#define _CMP_NEQ_OQ 0x0c +/* Greater-than-or-equal (ordered, signaling) */ +#define _CMP_GE_OS 0x0d +/* Greater-than (ordered, signaling) */ +#define _CMP_GT_OS 0x0e +/* True (unordered, non-signaling) */ +#define _CMP_TRUE_UQ 0x0f +/* Equal (ordered, signaling) */ +#define _CMP_EQ_OS 0x10 +/* Less-than (ordered, non-signaling) */ +#define _CMP_LT_OQ 0x11 +/* Less-than-or-equal (ordered, non-signaling) */ +#define _CMP_LE_OQ 0x12 +/* Unordered (signaling) */ +#define _CMP_UNORD_S 0x13 +/* Not-equal (unordered, signaling) */ +#define _CMP_NEQ_US 0x14 +/* Not-less-than (unordered, non-signaling) */ +#define _CMP_NLT_UQ 0x15 +/* Not-less-than-or-equal (unordered, non-signaling) */ +#define _CMP_NLE_UQ 0x16 +/* Ordered (signaling) */ +#define _CMP_ORD_S 0x17 +/* Equal (unordered, signaling) */ +#define _CMP_EQ_US 0x18 +/* Not-greater-than-or-equal (unordered, non-signaling) */ +#define _CMP_NGE_UQ 0x19 +/* Not-greater-than (unordered, non-signaling) */ +#define _CMP_NGT_UQ 0x1a +/* False (ordered, signaling) */ +#define _CMP_FALSE_OS 0x1b +/* Not-equal (ordered, signaling) */ +#define _CMP_NEQ_OS 0x1c +/* Greater-than-or-equal (ordered, non-signaling) */ +#define _CMP_GE_OQ 0x1d +/* Greater-than (ordered, non-signaling) */ +#define _CMP_GT_OQ 0x1e +/* True (unordered, signaling) */ +#define _CMP_TRUE_US 0x1f + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_add_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_addpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_add_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_addps256 ((__v8sf)__A, (__v8sf)__B); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_addsub_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_addsubpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_addsub_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_addsubps256 ((__v8sf)__A, (__v8sf)__B); +} + + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_and_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_andpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_and_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_andps256 ((__v8sf)__A, (__v8sf)__B); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm256_andnot_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_andnpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_andnot_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_andnps256 ((__v8sf)__A, (__v8sf)__B); +} + +/* Double/single precision floating point blend instructions - select + data from 2 sources using constant/variable mask. */ + +#ifdef __OPTIMIZE__ +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_blend_pd (__m256d __X, __m256d __Y, const int __M) +{ + return (__m256d) __builtin_ia32_blendpd256 ((__v4df)__X, + (__v4df)__Y, + __M); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_blend_ps (__m256 __X, __m256 __Y, const int __M) +{ + return (__m256) __builtin_ia32_blendps256 ((__v8sf)__X, + (__v8sf)__Y, + __M); +} +#else +#define _mm256_blend_pd(X, Y, M) \ + ((__m256d) __builtin_ia32_blendpd256 ((__v4df)(__m256d)(X), \ + (__v4df)(__m256d)(Y), (int)(M))) + +#define _mm256_blend_ps(X, Y, M) \ + ((__m256) __builtin_ia32_blendps256 ((__v8sf)(__m256)(X), \ + (__v8sf)(__m256)(Y), (int)(M))) +#endif + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_blendv_pd (__m256d __X, __m256d __Y, __m256d __M) +{ + return (__m256d) __builtin_ia32_blendvpd256 ((__v4df)__X, + (__v4df)__Y, + (__v4df)__M); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_blendv_ps (__m256 __X, __m256 __Y, __m256 __M) +{ + return (__m256) __builtin_ia32_blendvps256 ((__v8sf)__X, + (__v8sf)__Y, + (__v8sf)__M); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_div_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_divpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_div_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_divps256 ((__v8sf)__A, (__v8sf)__B); +} + +/* Dot product instructions with mask-defined summing and zeroing parts + of result. 
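+   In _mm256_dp_ps below, bits 7:4 of the immediate select which of the four
+   float elements of each 128-bit lane enter the products, and bits 3:0
+   select which result elements receive the sum; the remaining result
+   elements are zeroed.  The 256-bit form handles the two 128-bit lanes
+   independently.  An illustrative sketch, not part of the original header:
+
+     __m256 x = _mm256_set1_ps (1.0f);
+     __m256 y = _mm256_set1_ps (2.0f);
+     __m256 d = _mm256_dp_ps (x, y, 0xF1);
+
+   With mask 0xF1, all four products of each lane are summed (upper nibble
+   0xF) and the sum, 8.0f here, lands in element 0 of each lane (lower
+   nibble 0x1); the other elements of d are zero.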
*/ + +#ifdef __OPTIMIZE__ +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dp_ps (__m256 __X, __m256 __Y, const int __M) +{ + return (__m256) __builtin_ia32_dpps256 ((__v8sf)__X, + (__v8sf)__Y, + __M); +} +#else +#define _mm256_dp_ps(X, Y, M) \ + ((__m256) __builtin_ia32_dpps256 ((__v8sf)(__m256)(X), \ + (__v8sf)(__m256)(Y), (int)(M))) +#endif + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_hadd_pd (__m256d __X, __m256d __Y) +{ + return (__m256d) __builtin_ia32_haddpd256 ((__v4df)__X, (__v4df)__Y); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_hadd_ps (__m256 __X, __m256 __Y) +{ + return (__m256) __builtin_ia32_haddps256 ((__v8sf)__X, (__v8sf)__Y); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_hsub_pd (__m256d __X, __m256d __Y) +{ + return (__m256d) __builtin_ia32_hsubpd256 ((__v4df)__X, (__v4df)__Y); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_hsub_ps (__m256 __X, __m256 __Y) +{ + return (__m256) __builtin_ia32_hsubps256 ((__v8sf)__X, (__v8sf)__Y); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_max_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_maxpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_max_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_maxps256 ((__v8sf)__A, (__v8sf)__B); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_min_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_minpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_min_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_minps256 ((__v8sf)__A, (__v8sf)__B); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mul_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_mulpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mul_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_mulps256 ((__v8sf)__A, (__v8sf)__B); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_or_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_orpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_or_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_orps256 ((__v8sf)__A, (__v8sf)__B); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_shuffle_pd (__m256d __A, __m256d __B, const int __mask) +{ + return (__m256d) __builtin_ia32_shufpd256 ((__v4df)__A, (__v4df)__B, + __mask); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_shuffle_ps (__m256 __A, __m256 __B, const int __mask) +{ + return (__m256) __builtin_ia32_shufps256 ((__v8sf)__A, (__v8sf)__B, + __mask); +} +#else +#define _mm256_shuffle_pd(A, B, N) \ + ((__m256d)__builtin_ia32_shufpd256 ((__v4df)(__m256d)(A), \ + 
(__v4df)(__m256d)(B), (int)(N))) + +#define _mm256_shuffle_ps(A, B, N) \ + ((__m256) __builtin_ia32_shufps256 ((__v8sf)(__m256)(A), \ + (__v8sf)(__m256)(B), (int)(N))) +#endif + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sub_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_subpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sub_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_subps256 ((__v8sf)__A, (__v8sf)__B); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_xor_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_xorpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_xor_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_xorps256 ((__v8sf)__A, (__v8sf)__B); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmp_pd (__m128d __X, __m128d __Y, const int __P) +{ + return (__m128d) __builtin_ia32_cmppd ((__v2df)__X, (__v2df)__Y, __P); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmp_ps (__m128 __X, __m128 __Y, const int __P) +{ + return (__m128) __builtin_ia32_cmpps ((__v4sf)__X, (__v4sf)__Y, __P); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cmp_pd (__m256d __X, __m256d __Y, const int __P) +{ + return (__m256d) __builtin_ia32_cmppd256 ((__v4df)__X, (__v4df)__Y, + __P); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cmp_ps (__m256 __X, __m256 __Y, const int __P) +{ + return (__m256) __builtin_ia32_cmpps256 ((__v8sf)__X, (__v8sf)__Y, + __P); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmp_sd (__m128d __X, __m128d __Y, const int __P) +{ + return (__m128d) __builtin_ia32_cmpsd ((__v2df)__X, (__v2df)__Y, __P); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmp_ss (__m128 __X, __m128 __Y, const int __P) +{ + return (__m128) __builtin_ia32_cmpss ((__v4sf)__X, (__v4sf)__Y, __P); +} +#else +#define _mm_cmp_pd(X, Y, P) \ + ((__m128d) __builtin_ia32_cmppd ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (int)(P))) + +#define _mm_cmp_ps(X, Y, P) \ + ((__m128) __builtin_ia32_cmpps ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (int)(P))) + +#define _mm256_cmp_pd(X, Y, P) \ + ((__m256d) __builtin_ia32_cmppd256 ((__v4df)(__m256d)(X), \ + (__v4df)(__m256d)(Y), (int)(P))) + +#define _mm256_cmp_ps(X, Y, P) \ + ((__m256) __builtin_ia32_cmpps256 ((__v8sf)(__m256)(X), \ + (__v8sf)(__m256)(Y), (int)(P))) + +#define _mm_cmp_sd(X, Y, P) \ + ((__m128d) __builtin_ia32_cmpsd ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (int)(P))) + +#define _mm_cmp_ss(X, Y, P) \ + ((__m128) __builtin_ia32_cmpss ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (int)(P))) +#endif + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepi32_pd (__m128i __A) +{ + return (__m256d)__builtin_ia32_cvtdq2pd256 ((__v4si) __A); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtepi32_ps (__m256i __A) +{ + return (__m256)__builtin_ia32_cvtdq2ps256 
((__v8si) __A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtpd_ps (__m256d __A) +{ + return (__m128)__builtin_ia32_cvtpd2ps256 ((__v4df) __A); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtps_epi32 (__m256 __A) +{ + return (__m256i)__builtin_ia32_cvtps2dq256 ((__v8sf) __A); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtps_pd (__m128 __A) +{ + return (__m256d)__builtin_ia32_cvtps2pd256 ((__v4sf) __A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvttpd_epi32 (__m256d __A) +{ + return (__m128i)__builtin_ia32_cvttpd2dq256 ((__v4df) __A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtpd_epi32 (__m256d __A) +{ + return (__m128i)__builtin_ia32_cvtpd2dq256 ((__v4df) __A); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvttps_epi32 (__m256 __A) +{ + return (__m256i)__builtin_ia32_cvttps2dq256 ((__v8sf) __A); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_extractf128_pd (__m256d __X, const int __N) +{ + return (__m128d) __builtin_ia32_vextractf128_pd256 ((__v4df)__X, __N); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_extractf128_ps (__m256 __X, const int __N) +{ + return (__m128) __builtin_ia32_vextractf128_ps256 ((__v8sf)__X, __N); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_extractf128_si256 (__m256i __X, const int __N) +{ + return (__m128i) __builtin_ia32_vextractf128_si256 ((__v8si)__X, __N); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_extract_epi32 (__m256i __X, int const __N) +{ + __m128i __Y = _mm256_extractf128_si256 (__X, __N >> 2); + return _mm_extract_epi32 (__Y, __N % 4); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_extract_epi16 (__m256i __X, int const __N) +{ + __m128i __Y = _mm256_extractf128_si256 (__X, __N >> 3); + return _mm_extract_epi16 (__Y, __N % 8); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_extract_epi8 (__m256i __X, int const __N) +{ + __m128i __Y = _mm256_extractf128_si256 (__X, __N >> 4); + return _mm_extract_epi8 (__Y, __N % 16); +} + +#ifdef __x86_64__ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_extract_epi64 (__m256i __X, const int __N) +{ + __m128i __Y = _mm256_extractf128_si256 (__X, __N >> 1); + return _mm_extract_epi64 (__Y, __N % 2); +} +#endif +#else +#define _mm256_extractf128_pd(X, N) \ + ((__m128d) __builtin_ia32_vextractf128_pd256 ((__v4df)(__m256d)(X), \ + (int)(N))) + +#define _mm256_extractf128_ps(X, N) \ + ((__m128) __builtin_ia32_vextractf128_ps256 ((__v8sf)(__m256)(X), \ + (int)(N))) + +#define _mm256_extractf128_si256(X, N) \ + ((__m128i) __builtin_ia32_vextractf128_si256 ((__v8si)(__m256i)(X), \ + (int)(N))) + +#define _mm256_extract_epi32(X, N) \ + (__extension__ \ + ({ \ + __m128i __Y = _mm256_extractf128_si256 ((X), (N) >> 2); \ + _mm_extract_epi32 (__Y, (N) % 4); \ + })) + +#define _mm256_extract_epi16(X, N) \ + (__extension__ \ + ({ \ + __m128i __Y = 
_mm256_extractf128_si256 ((X), (N) >> 3); \ + _mm_extract_epi16 (__Y, (N) % 8); \ + })) + +#define _mm256_extract_epi8(X, N) \ + (__extension__ \ + ({ \ + __m128i __Y = _mm256_extractf128_si256 ((X), (N) >> 4); \ + _mm_extract_epi8 (__Y, (N) % 16); \ + })) + +#ifdef __x86_64__ +#define _mm256_extract_epi64(X, N) \ + (__extension__ \ + ({ \ + __m128i __Y = _mm256_extractf128_si256 ((X), (N) >> 1); \ + _mm_extract_epi64 (__Y, (N) % 2); \ + })) +#endif +#endif + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_zeroall (void) +{ + __builtin_ia32_vzeroall (); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_zeroupper (void) +{ + __builtin_ia32_vzeroupper (); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_permutevar_pd (__m128d __A, __m128i __C) +{ + return (__m128d) __builtin_ia32_vpermilvarpd ((__v2df)__A, + (__v2di)__C); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permutevar_pd (__m256d __A, __m256i __C) +{ + return (__m256d) __builtin_ia32_vpermilvarpd256 ((__v4df)__A, + (__v4di)__C); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_permutevar_ps (__m128 __A, __m128i __C) +{ + return (__m128) __builtin_ia32_vpermilvarps ((__v4sf)__A, + (__v4si)__C); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permutevar_ps (__m256 __A, __m256i __C) +{ + return (__m256) __builtin_ia32_vpermilvarps256 ((__v8sf)__A, + (__v8si)__C); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_permute_pd (__m128d __X, const int __C) +{ + return (__m128d) __builtin_ia32_vpermilpd ((__v2df)__X, __C); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permute_pd (__m256d __X, const int __C) +{ + return (__m256d) __builtin_ia32_vpermilpd256 ((__v4df)__X, __C); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_permute_ps (__m128 __X, const int __C) +{ + return (__m128) __builtin_ia32_vpermilps ((__v4sf)__X, __C); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permute_ps (__m256 __X, const int __C) +{ + return (__m256) __builtin_ia32_vpermilps256 ((__v8sf)__X, __C); +} +#else +#define _mm_permute_pd(X, C) \ + ((__m128d) __builtin_ia32_vpermilpd ((__v2df)(__m128d)(X), (int)(C))) + +#define _mm256_permute_pd(X, C) \ + ((__m256d) __builtin_ia32_vpermilpd256 ((__v4df)(__m256d)(X), (int)(C))) + +#define _mm_permute_ps(X, C) \ + ((__m128) __builtin_ia32_vpermilps ((__v4sf)(__m128)(X), (int)(C))) + +#define _mm256_permute_ps(X, C) \ + ((__m256) __builtin_ia32_vpermilps256 ((__v8sf)(__m256)(X), (int)(C))) +#endif + +#ifdef __OPTIMIZE__ +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permute2f128_pd (__m256d __X, __m256d __Y, const int __C) +{ + return (__m256d) __builtin_ia32_vperm2f128_pd256 ((__v4df)__X, + (__v4df)__Y, + __C); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permute2f128_ps (__m256 __X, __m256 __Y, const int __C) +{ + return (__m256) __builtin_ia32_vperm2f128_ps256 ((__v8sf)__X, + (__v8sf)__Y, + __C); +} + +extern __inline __m256i __attribute__((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm256_permute2f128_si256 (__m256i __X, __m256i __Y, const int __C) +{ + return (__m256i) __builtin_ia32_vperm2f128_si256 ((__v8si)__X, + (__v8si)__Y, + __C); +} +#else +#define _mm256_permute2f128_pd(X, Y, C) \ + ((__m256d) __builtin_ia32_vperm2f128_pd256 ((__v4df)(__m256d)(X), \ + (__v4df)(__m256d)(Y), \ + (int)(C))) + +#define _mm256_permute2f128_ps(X, Y, C) \ + ((__m256) __builtin_ia32_vperm2f128_ps256 ((__v8sf)(__m256)(X), \ + (__v8sf)(__m256)(Y), \ + (int)(C))) + +#define _mm256_permute2f128_si256(X, Y, C) \ + ((__m256i) __builtin_ia32_vperm2f128_si256 ((__v8si)(__m256i)(X), \ + (__v8si)(__m256i)(Y), \ + (int)(C))) +#endif + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_broadcast_ss (float const *__X) +{ + return (__m128) __builtin_ia32_vbroadcastss (__X); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_broadcast_sd (double const *__X) +{ + return (__m256d) __builtin_ia32_vbroadcastsd256 (__X); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_broadcast_ss (float const *__X) +{ + return (__m256) __builtin_ia32_vbroadcastss256 (__X); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_broadcast_pd (__m128d const *__X) +{ + return (__m256d) __builtin_ia32_vbroadcastf128_pd256 (__X); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_broadcast_ps (__m128 const *__X) +{ + return (__m256) __builtin_ia32_vbroadcastf128_ps256 (__X); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_insertf128_pd (__m256d __X, __m128d __Y, const int __O) +{ + return (__m256d) __builtin_ia32_vinsertf128_pd256 ((__v4df)__X, + (__v2df)__Y, + __O); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_insertf128_ps (__m256 __X, __m128 __Y, const int __O) +{ + return (__m256) __builtin_ia32_vinsertf128_ps256 ((__v8sf)__X, + (__v4sf)__Y, + __O); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_insertf128_si256 (__m256i __X, __m128i __Y, const int __O) +{ + return (__m256i) __builtin_ia32_vinsertf128_si256 ((__v8si)__X, + (__v4si)__Y, + __O); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_insert_epi32 (__m256i __X, int __D, int const __N) +{ + __m128i __Y = _mm256_extractf128_si256 (__X, __N >> 2); + __Y = _mm_insert_epi32 (__Y, __D, __N % 4); + return _mm256_insertf128_si256 (__X, __Y, __N >> 2); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_insert_epi16 (__m256i __X, int __D, int const __N) +{ + __m128i __Y = _mm256_extractf128_si256 (__X, __N >> 3); + __Y = _mm_insert_epi16 (__Y, __D, __N % 8); + return _mm256_insertf128_si256 (__X, __Y, __N >> 3); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_insert_epi8 (__m256i __X, int __D, int const __N) +{ + __m128i __Y = _mm256_extractf128_si256 (__X, __N >> 4); + __Y = _mm_insert_epi8 (__Y, __D, __N % 16); + return _mm256_insertf128_si256 (__X, __Y, __N >> 4); +} + +#ifdef __x86_64__ +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_insert_epi64 (__m256i __X, long long 
__D, int const __N) +{ + __m128i __Y = _mm256_extractf128_si256 (__X, __N >> 1); + __Y = _mm_insert_epi64 (__Y, __D, __N % 2); + return _mm256_insertf128_si256 (__X, __Y, __N >> 1); +} +#endif +#else +#define _mm256_insertf128_pd(X, Y, O) \ + ((__m256d) __builtin_ia32_vinsertf128_pd256 ((__v4df)(__m256d)(X), \ + (__v2df)(__m128d)(Y), \ + (int)(O))) + +#define _mm256_insertf128_ps(X, Y, O) \ + ((__m256) __builtin_ia32_vinsertf128_ps256 ((__v8sf)(__m256)(X), \ + (__v4sf)(__m128)(Y), \ + (int)(O))) + +#define _mm256_insertf128_si256(X, Y, O) \ + ((__m256i) __builtin_ia32_vinsertf128_si256 ((__v8si)(__m256i)(X), \ + (__v4si)(__m128i)(Y), \ + (int)(O))) + +#define _mm256_insert_epi32(X, D, N) \ + (__extension__ \ + ({ \ + __m128i __Y = _mm256_extractf128_si256 ((X), (N) >> 2); \ + __Y = _mm_insert_epi32 (__Y, (D), (N) % 4); \ + _mm256_insertf128_si256 ((X), __Y, (N) >> 2); \ + })) + +#define _mm256_insert_epi16(X, D, N) \ + (__extension__ \ + ({ \ + __m128i __Y = _mm256_extractf128_si256 ((X), (N) >> 3); \ + __Y = _mm_insert_epi16 (__Y, (D), (N) % 8); \ + _mm256_insertf128_si256 ((X), __Y, (N) >> 3); \ + })) + +#define _mm256_insert_epi8(X, D, N) \ + (__extension__ \ + ({ \ + __m128i __Y = _mm256_extractf128_si256 ((X), (N) >> 4); \ + __Y = _mm_insert_epi8 (__Y, (D), (N) % 16); \ + _mm256_insertf128_si256 ((X), __Y, (N) >> 4); \ + })) + +#ifdef __x86_64__ +#define _mm256_insert_epi64(X, D, N) \ + (__extension__ \ + ({ \ + __m128i __Y = _mm256_extractf128_si256 ((X), (N) >> 1); \ + __Y = _mm_insert_epi64 (__Y, (D), (N) % 2); \ + _mm256_insertf128_si256 ((X), __Y, (N) >> 1); \ + })) +#endif +#endif + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_load_pd (double const *__P) +{ + return *(__m256d *)__P; +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_store_pd (double *__P, __m256d __A) +{ + *(__m256d *)__P = __A; +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_load_ps (float const *__P) +{ + return *(__m256 *)__P; +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_store_ps (float *__P, __m256 __A) +{ + *(__m256 *)__P = __A; +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_loadu_pd (double const *__P) +{ + return (__m256d) __builtin_ia32_loadupd256 (__P); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_storeu_pd (double *__P, __m256d __A) +{ + __builtin_ia32_storeupd256 (__P, (__v4df)__A); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_loadu_ps (float const *__P) +{ + return (__m256) __builtin_ia32_loadups256 (__P); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_storeu_ps (float *__P, __m256 __A) +{ + __builtin_ia32_storeups256 (__P, (__v8sf)__A); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_load_si256 (__m256i const *__P) +{ + return *__P; +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_store_si256 (__m256i *__P, __m256i __A) +{ + *__P = __A; +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_loadu_si256 (__m256i const *__P) +{ + return (__m256i) __builtin_ia32_loaddqu256 ((char const *)__P); +} + +extern 
__inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_storeu_si256 (__m256i *__P, __m256i __A) +{ + __builtin_ia32_storedqu256 ((char *)__P, (__v32qi)__A); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskload_pd (double const *__P, __m128i __M) +{ + return (__m128d) __builtin_ia32_maskloadpd ((const __v2df *)__P, + (__v2di)__M); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskstore_pd (double *__P, __m128i __M, __m128d __A) +{ + __builtin_ia32_maskstorepd ((__v2df *)__P, (__v2di)__M, (__v2df)__A); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskload_pd (double const *__P, __m256i __M) +{ + return (__m256d) __builtin_ia32_maskloadpd256 ((const __v4df *)__P, + (__v4di)__M); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskstore_pd (double *__P, __m256i __M, __m256d __A) +{ + __builtin_ia32_maskstorepd256 ((__v4df *)__P, (__v4di)__M, (__v4df)__A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskload_ps (float const *__P, __m128i __M) +{ + return (__m128) __builtin_ia32_maskloadps ((const __v4sf *)__P, + (__v4si)__M); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskstore_ps (float *__P, __m128i __M, __m128 __A) +{ + __builtin_ia32_maskstoreps ((__v4sf *)__P, (__v4si)__M, (__v4sf)__A); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskload_ps (float const *__P, __m256i __M) +{ + return (__m256) __builtin_ia32_maskloadps256 ((const __v8sf *)__P, + (__v8si)__M); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maskstore_ps (float *__P, __m256i __M, __m256 __A) +{ + __builtin_ia32_maskstoreps256 ((__v8sf *)__P, (__v8si)__M, (__v8sf)__A); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_movehdup_ps (__m256 __X) +{ + return (__m256) __builtin_ia32_movshdup256 ((__v8sf)__X); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_moveldup_ps (__m256 __X) +{ + return (__m256) __builtin_ia32_movsldup256 ((__v8sf)__X); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_movedup_pd (__m256d __X) +{ + return (__m256d) __builtin_ia32_movddup256 ((__v4df)__X); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_lddqu_si256 (__m256i const *__P) +{ + return (__m256i) __builtin_ia32_lddqu256 ((char const *)__P); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_stream_si256 (__m256i *__A, __m256i __B) +{ + __builtin_ia32_movntdq256 ((__v4di *)__A, (__v4di)__B); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_stream_pd (double *__A, __m256d __B) +{ + __builtin_ia32_movntpd256 (__A, (__v4df)__B); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_stream_ps (float *__P, __m256 __A) +{ + __builtin_ia32_movntps256 (__P, (__v8sf)__A); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_rcp_ps (__m256 __A) +{ + return (__m256) 
__builtin_ia32_rcpps256 ((__v8sf)__A); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_rsqrt_ps (__m256 __A) +{ + return (__m256) __builtin_ia32_rsqrtps256 ((__v8sf)__A); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sqrt_pd (__m256d __A) +{ + return (__m256d) __builtin_ia32_sqrtpd256 ((__v4df)__A); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_sqrt_ps (__m256 __A) +{ + return (__m256) __builtin_ia32_sqrtps256 ((__v8sf)__A); +} + +#ifdef __OPTIMIZE__ +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_round_pd (__m256d __V, const int __M) +{ + return (__m256d) __builtin_ia32_roundpd256 ((__v4df)__V, __M); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_round_ps (__m256 __V, const int __M) +{ + return (__m256) __builtin_ia32_roundps256 ((__v8sf)__V, __M); +} +#else +#define _mm256_round_pd(V, M) \ + ((__m256d) __builtin_ia32_roundpd256 ((__v4df)(__m256d)(V), (int)(M))) + +#define _mm256_round_ps(V, M) \ + ((__m256) __builtin_ia32_roundps256 ((__v8sf)(__m256)(V), (int)(M))) +#endif + +#define _mm256_ceil_pd(V) _mm256_round_pd ((V), _MM_FROUND_CEIL) +#define _mm256_floor_pd(V) _mm256_round_pd ((V), _MM_FROUND_FLOOR) +#define _mm256_ceil_ps(V) _mm256_round_ps ((V), _MM_FROUND_CEIL) +#define _mm256_floor_ps(V) _mm256_round_ps ((V), _MM_FROUND_FLOOR) + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_unpackhi_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_unpckhpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_unpacklo_pd (__m256d __A, __m256d __B) +{ + return (__m256d) __builtin_ia32_unpcklpd256 ((__v4df)__A, (__v4df)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_unpackhi_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_unpckhps256 ((__v8sf)__A, (__v8sf)__B); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_unpacklo_ps (__m256 __A, __m256 __B) +{ + return (__m256) __builtin_ia32_unpcklps256 ((__v8sf)__A, (__v8sf)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_testz_pd (__m128d __M, __m128d __V) +{ + return __builtin_ia32_vtestzpd ((__v2df)__M, (__v2df)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_testc_pd (__m128d __M, __m128d __V) +{ + return __builtin_ia32_vtestcpd ((__v2df)__M, (__v2df)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_testnzc_pd (__m128d __M, __m128d __V) +{ + return __builtin_ia32_vtestnzcpd ((__v2df)__M, (__v2df)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_testz_ps (__m128 __M, __m128 __V) +{ + return __builtin_ia32_vtestzps ((__v4sf)__M, (__v4sf)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_testc_ps (__m128 __M, __m128 __V) +{ + return __builtin_ia32_vtestcps ((__v4sf)__M, (__v4sf)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_testnzc_ps (__m128 __M, __m128 __V) +{ + return 
__builtin_ia32_vtestnzcps ((__v4sf)__M, (__v4sf)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_testz_pd (__m256d __M, __m256d __V) +{ + return __builtin_ia32_vtestzpd256 ((__v4df)__M, (__v4df)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_testc_pd (__m256d __M, __m256d __V) +{ + return __builtin_ia32_vtestcpd256 ((__v4df)__M, (__v4df)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_testnzc_pd (__m256d __M, __m256d __V) +{ + return __builtin_ia32_vtestnzcpd256 ((__v4df)__M, (__v4df)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_testz_ps (__m256 __M, __m256 __V) +{ + return __builtin_ia32_vtestzps256 ((__v8sf)__M, (__v8sf)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_testc_ps (__m256 __M, __m256 __V) +{ + return __builtin_ia32_vtestcps256 ((__v8sf)__M, (__v8sf)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_testnzc_ps (__m256 __M, __m256 __V) +{ + return __builtin_ia32_vtestnzcps256 ((__v8sf)__M, (__v8sf)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_testz_si256 (__m256i __M, __m256i __V) +{ + return __builtin_ia32_ptestz256 ((__v4di)__M, (__v4di)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_testc_si256 (__m256i __M, __m256i __V) +{ + return __builtin_ia32_ptestc256 ((__v4di)__M, (__v4di)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_testnzc_si256 (__m256i __M, __m256i __V) +{ + return __builtin_ia32_ptestnzc256 ((__v4di)__M, (__v4di)__V); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_movemask_pd (__m256d __A) +{ + return __builtin_ia32_movmskpd256 ((__v4df)__A); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_movemask_ps (__m256 __A) +{ + return __builtin_ia32_movmskps256 ((__v8sf)__A); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_undefined_pd (void) +{ + __m256d __Y = __Y; + return __Y; +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_undefined_ps (void) +{ + __m256 __Y = __Y; + return __Y; +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_undefined_si256 (void) +{ + __m256i __Y = __Y; + return __Y; +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setzero_pd (void) +{ + return __extension__ (__m256d){ 0.0, 0.0, 0.0, 0.0 }; +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setzero_ps (void) +{ + return __extension__ (__m256){ 0.0, 0.0, 0.0, 0.0, + 0.0, 0.0, 0.0, 0.0 }; +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setzero_si256 (void) +{ + return __extension__ (__m256i)(__v4di){ 0, 0, 0, 0 }; +} + +/* Create the vector [A B C D]. 
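+   The first argument becomes the highest-numbered element, as the
+   initializer { __D, __C, __B, __A } below makes explicit.  Illustrative
+   use, not part of the original header:
+
+     __m256d v = _mm256_set_pd (4.0, 3.0, 2.0, 1.0);
+
+   stores { 1.0, 2.0, 3.0, 4.0 } in elements 0..3; _mm256_setr_pd further
+   below takes the same values in the reversed (memory) order.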
*/ +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set_pd (double __A, double __B, double __C, double __D) +{ + return __extension__ (__m256d){ __D, __C, __B, __A }; +} + +/* Create the vector [A B C D E F G H]. */ +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set_ps (float __A, float __B, float __C, float __D, + float __E, float __F, float __G, float __H) +{ + return __extension__ (__m256){ __H, __G, __F, __E, + __D, __C, __B, __A }; +} + +/* Create the vector [A B C D E F G H]. */ +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set_epi32 (int __A, int __B, int __C, int __D, + int __E, int __F, int __G, int __H) +{ + return __extension__ (__m256i)(__v8si){ __H, __G, __F, __E, + __D, __C, __B, __A }; +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set_epi16 (short __q15, short __q14, short __q13, short __q12, + short __q11, short __q10, short __q09, short __q08, + short __q07, short __q06, short __q05, short __q04, + short __q03, short __q02, short __q01, short __q00) +{ + return __extension__ (__m256i)(__v16hi){ + __q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07, + __q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15 + }; +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set_epi8 (char __q31, char __q30, char __q29, char __q28, + char __q27, char __q26, char __q25, char __q24, + char __q23, char __q22, char __q21, char __q20, + char __q19, char __q18, char __q17, char __q16, + char __q15, char __q14, char __q13, char __q12, + char __q11, char __q10, char __q09, char __q08, + char __q07, char __q06, char __q05, char __q04, + char __q03, char __q02, char __q01, char __q00) +{ + return __extension__ (__m256i)(__v32qi){ + __q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07, + __q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15, + __q16, __q17, __q18, __q19, __q20, __q21, __q22, __q23, + __q24, __q25, __q26, __q27, __q28, __q29, __q30, __q31 + }; +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set_epi64x (long long __A, long long __B, long long __C, + long long __D) +{ + return __extension__ (__m256i)(__v4di){ __D, __C, __B, __A }; +} + +/* Create a vector with all elements equal to A. */ +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set1_pd (double __A) +{ + return __extension__ (__m256d){ __A, __A, __A, __A }; +} + +/* Create a vector with all elements equal to A. */ +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set1_ps (float __A) +{ + return __extension__ (__m256){ __A, __A, __A, __A, + __A, __A, __A, __A }; +} + +/* Create a vector with all elements equal to A. 
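+   As with the _pd and _ps variants above, this broadcasts ("splats") the
+   scalar argument to every element.  Illustrative use, not part of the
+   original header:
+
+     __m256i ones = _mm256_set1_epi32 (1);
+
+   yields the eight-element vector { 1, 1, 1, 1, 1, 1, 1, 1 }.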
*/ +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set1_epi32 (int __A) +{ + return __extension__ (__m256i)(__v8si){ __A, __A, __A, __A, + __A, __A, __A, __A }; +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set1_epi16 (short __A) +{ + return _mm256_set_epi16 (__A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set1_epi8 (char __A) +{ + return _mm256_set_epi8 (__A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set1_epi64x (long long __A) +{ + return __extension__ (__m256i)(__v4di){ __A, __A, __A, __A }; +} + +/* Create vectors of elements in the reversed order from the + _mm256_set_XXX functions. */ + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setr_pd (double __A, double __B, double __C, double __D) +{ + return _mm256_set_pd (__D, __C, __B, __A); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setr_ps (float __A, float __B, float __C, float __D, + float __E, float __F, float __G, float __H) +{ + return _mm256_set_ps (__H, __G, __F, __E, __D, __C, __B, __A); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setr_epi32 (int __A, int __B, int __C, int __D, + int __E, int __F, int __G, int __H) +{ + return _mm256_set_epi32 (__H, __G, __F, __E, __D, __C, __B, __A); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setr_epi16 (short __q15, short __q14, short __q13, short __q12, + short __q11, short __q10, short __q09, short __q08, + short __q07, short __q06, short __q05, short __q04, + short __q03, short __q02, short __q01, short __q00) +{ + return _mm256_set_epi16 (__q00, __q01, __q02, __q03, + __q04, __q05, __q06, __q07, + __q08, __q09, __q10, __q11, + __q12, __q13, __q14, __q15); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setr_epi8 (char __q31, char __q30, char __q29, char __q28, + char __q27, char __q26, char __q25, char __q24, + char __q23, char __q22, char __q21, char __q20, + char __q19, char __q18, char __q17, char __q16, + char __q15, char __q14, char __q13, char __q12, + char __q11, char __q10, char __q09, char __q08, + char __q07, char __q06, char __q05, char __q04, + char __q03, char __q02, char __q01, char __q00) +{ + return _mm256_set_epi8 (__q00, __q01, __q02, __q03, + __q04, __q05, __q06, __q07, + __q08, __q09, __q10, __q11, + __q12, __q13, __q14, __q15, + __q16, __q17, __q18, __q19, + __q20, __q21, __q22, __q23, + __q24, __q25, __q26, __q27, + __q28, __q29, __q30, __q31); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setr_epi64x (long long __A, long long __B, long long __C, + long long __D) +{ + return _mm256_set_epi64x (__D, __C, __B, __A); +} + +/* Casts between various SP, DP, INT vector types. Note that these do no + conversion of values, they just change the type. 
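+   A cast merely reinterprets the bit pattern and is expected to generate no
+   instruction, unlike the _mm256_cvt* intrinsics above, which convert
+   values.  Illustrative contrast, not part of the original header:
+
+     __m256  f    = _mm256_set1_ps (1.0f);
+     __m256i bits = _mm256_castps_si256 (f);
+     __m256i vals = _mm256_cvtps_epi32 (f);
+
+   Every element of bits is 0x3f800000, the IEEE 754 encoding of 1.0f,
+   while every element of vals is the integer 1.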
*/ +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpd_ps (__m256d __A) +{ + return (__m256) __A; +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpd_si256 (__m256d __A) +{ + return (__m256i) __A; +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castps_pd (__m256 __A) +{ + return (__m256d) __A; +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castps_si256(__m256 __A) +{ + return (__m256i) __A; +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castsi256_ps (__m256i __A) +{ + return (__m256) __A; +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castsi256_pd (__m256i __A) +{ + return (__m256d) __A; +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpd256_pd128 (__m256d __A) +{ + return (__m128d) __builtin_ia32_pd_pd256 ((__v4df)__A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castps256_ps128 (__m256 __A) +{ + return (__m128) __builtin_ia32_ps_ps256 ((__v8sf)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castsi256_si128 (__m256i __A) +{ + return (__m128i) __builtin_ia32_si_si256 ((__v8si)__A); +} + +/* When cast is done from a 128 to 256-bit type, the low 128 bits of + the 256-bit result contain source parameter value and the upper 128 + bits of the result are undefined. Those intrinsics shouldn't + generate any extra moves. */ + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpd128_pd256 (__m128d __A) +{ + return (__m256d) __builtin_ia32_pd256_pd ((__v2df)__A); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castps128_ps256 (__m128 __A) +{ + return (__m256) __builtin_ia32_ps256_ps ((__v4sf)__A); +} + +extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castsi128_si256 (__m128i __A) +{ + return (__m256i) __builtin_ia32_si256_si ((__v4si)__A); +} + +#ifdef __DISABLE_AVX__ +#undef __DISABLE_AVX__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX__ */ + +#endif /* _AVXINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/bmi2intrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/bmi2intrin.h new file mode 100644 index 0000000..ff96296 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/bmi2intrin.h @@ -0,0 +1,109 @@ +/* Copyright (C) 2011-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. 
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _X86INTRIN_H_INCLUDED && !defined _IMMINTRIN_H_INCLUDED
+# error "Never use <bmi2intrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _BMI2INTRIN_H_INCLUDED
+#define _BMI2INTRIN_H_INCLUDED
+
+#ifndef __BMI2__
+#pragma GCC push_options
+#pragma GCC target("bmi2")
+#define __DISABLE_BMI2__
+#endif /* __BMI2__ */
+
+extern __inline unsigned int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_bzhi_u32 (unsigned int __X, unsigned int __Y)
+{
+  return __builtin_ia32_bzhi_si (__X, __Y);
+}
+
+extern __inline unsigned int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_pdep_u32 (unsigned int __X, unsigned int __Y)
+{
+  return __builtin_ia32_pdep_si (__X, __Y);
+}
+
+extern __inline unsigned int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_pext_u32 (unsigned int __X, unsigned int __Y)
+{
+  return __builtin_ia32_pext_si (__X, __Y);
+}
+
+#ifdef __x86_64__
+
+extern __inline unsigned long long
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_bzhi_u64 (unsigned long long __X, unsigned long long __Y)
+{
+  return __builtin_ia32_bzhi_di (__X, __Y);
+}
+
+extern __inline unsigned long long
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_pdep_u64 (unsigned long long __X, unsigned long long __Y)
+{
+  return __builtin_ia32_pdep_di (__X, __Y);
+}
+
+extern __inline unsigned long long
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_pext_u64 (unsigned long long __X, unsigned long long __Y)
+{
+  return __builtin_ia32_pext_di (__X, __Y);
+}
+
+extern __inline unsigned long long
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mulx_u64 (unsigned long long __X, unsigned long long __Y,
+           unsigned long long *__P)
+{
+  unsigned __int128 __res = (unsigned __int128) __X * __Y;
+  *__P = (unsigned long long) (__res >> 64);
+  return (unsigned long long) __res;
+}
+
+#else /* !__x86_64__ */
+
+extern __inline unsigned int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mulx_u32 (unsigned int __X, unsigned int __Y, unsigned int *__P)
+{
+  unsigned long long __res = (unsigned long long) __X * __Y;
+  *__P = (unsigned int) (__res >> 32);
+  return (unsigned int) __res;
+}
+
+#endif /* !__x86_64__ */
+
+#ifdef __DISABLE_BMI2__
+#undef __DISABLE_BMI2__
+#pragma GCC pop_options
+#endif /* __DISABLE_BMI2__ */
+
+#endif /* _BMI2INTRIN_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/bmiintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/bmiintrin.h
new file mode 100644
index 0000000..b2d7c60
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/bmiintrin.h
@@ -0,0 +1,184 @@
+/* Copyright (C) 2010-2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _X86INTRIN_H_INCLUDED && !defined _IMMINTRIN_H_INCLUDED
+# error "Never use <bmiintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _BMIINTRIN_H_INCLUDED
+#define _BMIINTRIN_H_INCLUDED
+
+#ifndef __BMI__
+#pragma GCC push_options
+#pragma GCC target("bmi")
+#define __DISABLE_BMI__
+#endif /* __BMI__ */
+
+extern __inline unsigned short __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__tzcnt_u16 (unsigned short __X)
+{
+  return __builtin_ctzs (__X);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__andn_u32 (unsigned int __X, unsigned int __Y)
+{
+  return ~__X & __Y;
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__bextr_u32 (unsigned int __X, unsigned int __Y)
+{
+  return __builtin_ia32_bextr_u32 (__X, __Y);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_bextr_u32 (unsigned int __X, unsigned int __Y, unsigned __Z)
+{
+  return __builtin_ia32_bextr_u32 (__X, ((__Y & 0xff) | ((__Z & 0xff) << 8)));
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__blsi_u32 (unsigned int __X)
+{
+  return __X & -__X;
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_blsi_u32 (unsigned int __X)
+{
+  return __blsi_u32 (__X);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__blsmsk_u32 (unsigned int __X)
+{
+  return __X ^ (__X - 1);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_blsmsk_u32 (unsigned int __X)
+{
+  return __blsmsk_u32 (__X);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__blsr_u32 (unsigned int __X)
+{
+  return __X & (__X - 1);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_blsr_u32 (unsigned int __X)
+{
+  return __blsr_u32 (__X);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__tzcnt_u32 (unsigned int __X)
+{
+  return __builtin_ctz (__X);
+}
+
+extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tzcnt_u32 (unsigned int __X)
+{
+  return __builtin_ctz (__X);
+}
+
+
+#ifdef __x86_64__
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__andn_u64 (unsigned long long __X, unsigned long long __Y)
+{
+  return ~__X & __Y;
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__bextr_u64 (unsigned long long __X, unsigned long long __Y)
+{
+  return __builtin_ia32_bextr_u64 (__X, __Y);
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_bextr_u64 (unsigned long long __X, unsigned int __Y, unsigned int __Z)
+{
+  return __builtin_ia32_bextr_u64 (__X, ((__Y & 0xff) | ((__Z & 0xff) << 8)));
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__blsi_u64 (unsigned long long __X)
+{
+  return __X & -__X;
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_blsi_u64 (unsigned long long __X)
+{
+  return __blsi_u64 (__X);
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__blsmsk_u64 (unsigned long long __X)
+{
+  return __X ^ (__X - 1);
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_blsmsk_u64 (unsigned long long __X)
+{
+  return __blsmsk_u64 (__X);
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__blsr_u64 (unsigned long long __X)
+{
+  return __X & (__X - 1);
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_blsr_u64 (unsigned long long __X)
+{
+  return __blsr_u64 (__X);
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+__tzcnt_u64 (unsigned long long __X)
+{
+  return __builtin_ctzll (__X);
+}
+
+extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_tzcnt_u64 (unsigned long long __X)
+{
+  return __builtin_ctzll (__X);
+}
+
+#endif /* __x86_64__ */
+
+#ifdef __DISABLE_BMI__
+#undef __DISABLE_BMI__
+#pragma GCC pop_options
+#endif /* __DISABLE_BMI__ */
+
+#endif /* _BMIINTRIN_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/bmmintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/bmmintrin.h
new file mode 100644
index 0000000..24cf26e
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/bmmintrin.h
@@ -0,0 +1,29 @@
+/* Copyright (C) 2007-2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _BMMINTRIN_H_INCLUDED
+#define _BMMINTRIN_H_INCLUDED
+
+# error "SSE5 instruction set removed from compiler"
+
+#endif /* _BMMINTRIN_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/cpuid.h b/lib/gcc/x86_64-linux-android/4.9.x/include/cpuid.h
new file mode 100644
index 0000000..8c323ae
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/cpuid.h
@@ -0,0 +1,277 @@
+/*
+ * Copyright (C) 2007-2014 Free Software Foundation, Inc.
+ *
+ * This file is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 3, or (at your option) any
+ * later version.
+ *
+ * This file is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * Under Section 7 of GPL version 3, you are granted additional
+ * permissions described in the GCC Runtime Library Exception, version
+ * 3.1, as published by the Free Software Foundation.
+ *
+ * You should have received a copy of the GNU General Public License and
+ * a copy of the GCC Runtime Library Exception along with this program;
+ * see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+ * <http://www.gnu.org/licenses/>.
+ */
+
+/* %ecx */
+#define bit_SSE3 (1 << 0)
+#define bit_PCLMUL (1 << 1)
+#define bit_LZCNT (1 << 5)
+#define bit_SSSE3 (1 << 9)
+#define bit_FMA (1 << 12)
+#define bit_CMPXCHG16B (1 << 13)
+#define bit_SSE4_1 (1 << 19)
+#define bit_SSE4_2 (1 << 20)
+#define bit_MOVBE (1 << 22)
+#define bit_POPCNT (1 << 23)
+#define bit_AES (1 << 25)
+#define bit_XSAVE (1 << 26)
+#define bit_OSXSAVE (1 << 27)
+#define bit_AVX (1 << 28)
+#define bit_F16C (1 << 29)
+#define bit_RDRND (1 << 30)
+
+/* %edx */
+#define bit_CMPXCHG8B (1 << 8)
+#define bit_CMOV (1 << 15)
+#define bit_MMX (1 << 23)
+#define bit_FXSAVE (1 << 24)
+#define bit_SSE (1 << 25)
+#define bit_SSE2 (1 << 26)
+
+/* Extended Features */
+/* %ecx */
+#define bit_LAHF_LM (1 << 0)
+#define bit_ABM (1 << 5)
+#define bit_SSE4a (1 << 6)
+#define bit_PRFCHW (1 << 8)
+#define bit_XOP (1 << 11)
+#define bit_LWP (1 << 15)
+#define bit_FMA4 (1 << 16)
+#define bit_TBM (1 << 21)
+
+/* %edx */
+#define bit_MMXEXT (1 << 22)
+#define bit_LM (1 << 29)
+#define bit_3DNOWP (1 << 30)
+#define bit_3DNOW (1 << 31)
+
+/* Extended Features (%eax == 7) */
+/* %ebx */
+#define bit_FSGSBASE (1 << 0)
+#define bit_BMI (1 << 3)
+#define bit_HLE (1 << 4)
+#define bit_AVX2 (1 << 5)
+#define bit_BMI2 (1 << 8)
+#define bit_RTM (1 << 11)
+#define bit_AVX512F (1 << 16)
+#define bit_RDSEED (1 << 18)
+#define bit_ADX (1 << 19)
+#define bit_AVX512PF (1 << 26)
+#define bit_AVX512ER (1 << 27)
+#define bit_AVX512CD (1 << 28)
+#define bit_SHA (1 << 29)
+
+/* %ecx */
+#define bit_PREFETCHWT1 (1 << 0)
+
+/* Extended State Enumeration Sub-leaf (%eax == 13, %ecx == 1) */
+#define bit_XSAVEOPT (1 << 0)
+
+/* Signatures for different CPU implementations as returned in uses
+   of cpuid with level 0.
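
   (Illustration, not part of the patch: comparing the level-0 ebx
   value against these constants identifies the vendor, e.g.

     unsigned int sig;
     if (__get_cpuid_max (0, &sig) != 0 && sig == signature_INTEL_ebx)
       handle_genuine_intel ();

   __get_cpuid_max is defined later in this header; 0x756e6547 is
   "Genu", the first four bytes of "GenuineIntel", and
   handle_genuine_intel is a hypothetical placeholder.)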
*/ +#define signature_AMD_ebx 0x68747541 +#define signature_AMD_ecx 0x444d4163 +#define signature_AMD_edx 0x69746e65 + +#define signature_CENTAUR_ebx 0x746e6543 +#define signature_CENTAUR_ecx 0x736c7561 +#define signature_CENTAUR_edx 0x48727561 + +#define signature_CYRIX_ebx 0x69727943 +#define signature_CYRIX_ecx 0x64616574 +#define signature_CYRIX_edx 0x736e4978 + +#define signature_INTEL_ebx 0x756e6547 +#define signature_INTEL_ecx 0x6c65746e +#define signature_INTEL_edx 0x49656e69 + +#define signature_TM1_ebx 0x6e617254 +#define signature_TM1_ecx 0x55504361 +#define signature_TM1_edx 0x74656d73 + +#define signature_TM2_ebx 0x756e6547 +#define signature_TM2_ecx 0x3638784d +#define signature_TM2_edx 0x54656e69 + +#define signature_NSC_ebx 0x646f6547 +#define signature_NSC_ecx 0x43534e20 +#define signature_NSC_edx 0x79622065 + +#define signature_NEXGEN_ebx 0x4778654e +#define signature_NEXGEN_ecx 0x6e657669 +#define signature_NEXGEN_edx 0x72446e65 + +#define signature_RISE_ebx 0x65736952 +#define signature_RISE_ecx 0x65736952 +#define signature_RISE_edx 0x65736952 + +#define signature_SIS_ebx 0x20536953 +#define signature_SIS_ecx 0x20536953 +#define signature_SIS_edx 0x20536953 + +#define signature_UMC_ebx 0x20434d55 +#define signature_UMC_ecx 0x20434d55 +#define signature_UMC_edx 0x20434d55 + +#define signature_VIA_ebx 0x20414956 +#define signature_VIA_ecx 0x20414956 +#define signature_VIA_edx 0x20414956 + +#define signature_VORTEX_ebx 0x74726f56 +#define signature_VORTEX_ecx 0x436f5320 +#define signature_VORTEX_edx 0x36387865 + +#if defined(__i386__) && defined(__PIC__) +/* %ebx may be the PIC register. */ +#if __GNUC__ >= 3 +#define __cpuid(level, a, b, c, d) \ + __asm__ ("xchg{l}\t{%%}ebx, %k1\n\t" \ + "cpuid\n\t" \ + "xchg{l}\t{%%}ebx, %k1\n\t" \ + : "=a" (a), "=&r" (b), "=c" (c), "=d" (d) \ + : "0" (level)) + +#define __cpuid_count(level, count, a, b, c, d) \ + __asm__ ("xchg{l}\t{%%}ebx, %k1\n\t" \ + "cpuid\n\t" \ + "xchg{l}\t{%%}ebx, %k1\n\t" \ + : "=a" (a), "=&r" (b), "=c" (c), "=d" (d) \ + : "0" (level), "2" (count)) +#else +/* Host GCCs older than 3.0 weren't supporting Intel asm syntax + nor alternatives in i386 code. */ +#define __cpuid(level, a, b, c, d) \ + __asm__ ("xchgl\t%%ebx, %k1\n\t" \ + "cpuid\n\t" \ + "xchgl\t%%ebx, %k1\n\t" \ + : "=a" (a), "=&r" (b), "=c" (c), "=d" (d) \ + : "0" (level)) + +#define __cpuid_count(level, count, a, b, c, d) \ + __asm__ ("xchgl\t%%ebx, %k1\n\t" \ + "cpuid\n\t" \ + "xchgl\t%%ebx, %k1\n\t" \ + : "=a" (a), "=&r" (b), "=c" (c), "=d" (d) \ + : "0" (level), "2" (count)) +#endif +#elif defined(__x86_64__) && (defined(__code_model_medium__) || defined(__code_model_large__)) && defined(__PIC__) +/* %rbx may be the PIC register. */ +#define __cpuid(level, a, b, c, d) \ + __asm__ ("xchg{q}\t{%%}rbx, %q1\n\t" \ + "cpuid\n\t" \ + "xchg{q}\t{%%}rbx, %q1\n\t" \ + : "=a" (a), "=&r" (b), "=c" (c), "=d" (d) \ + : "0" (level)) + +#define __cpuid_count(level, count, a, b, c, d) \ + __asm__ ("xchg{q}\t{%%}rbx, %q1\n\t" \ + "cpuid\n\t" \ + "xchg{q}\t{%%}rbx, %q1\n\t" \ + : "=a" (a), "=&r" (b), "=c" (c), "=d" (d) \ + : "0" (level), "2" (count)) +#else +#define __cpuid(level, a, b, c, d) \ + __asm__ ("cpuid\n\t" \ + : "=a" (a), "=b" (b), "=c" (c), "=d" (d) \ + : "0" (level)) + +#define __cpuid_count(level, count, a, b, c, d) \ + __asm__ ("cpuid\n\t" \ + : "=a" (a), "=b" (b), "=c" (c), "=d" (d) \ + : "0" (level), "2" (count)) +#endif + +/* Return highest supported input value for cpuid instruction. 
ext can
+   be either 0x0 or 0x80000000 to return the highest supported value for
+   basic or extended cpuid information.  The function returns 0 if cpuid
+   is not supported, otherwise whatever cpuid returns in the eax
+   register.  If the sig pointer is non-null, then the first four bytes
+   of the signature (as found in the ebx register) are returned in the
+   location pointed to by sig.  */
+
+static __inline unsigned int
+__get_cpuid_max (unsigned int __ext, unsigned int *__sig)
+{
+  unsigned int __eax, __ebx, __ecx, __edx;
+
+#ifndef __x86_64__
+  /* See if we can use cpuid.  On AMD64 we always can.  */
+#if __GNUC__ >= 3
+  __asm__ ("pushf{l|d}\n\t"
+           "pushf{l|d}\n\t"
+           "pop{l}\t%0\n\t"
+           "mov{l}\t{%0, %1|%1, %0}\n\t"
+           "xor{l}\t{%2, %0|%0, %2}\n\t"
+           "push{l}\t%0\n\t"
+           "popf{l|d}\n\t"
+           "pushf{l|d}\n\t"
+           "pop{l}\t%0\n\t"
+           "popf{l|d}\n\t"
+           : "=&r" (__eax), "=&r" (__ebx)
+           : "i" (0x00200000));
+#else
+/* Host GCCs older than 3.0 weren't supporting Intel asm syntax
+   nor alternatives in i386 code.  */
+  __asm__ ("pushfl\n\t"
+           "pushfl\n\t"
+           "popl\t%0\n\t"
+           "movl\t%0, %1\n\t"
+           "xorl\t%2, %0\n\t"
+           "pushl\t%0\n\t"
+           "popfl\n\t"
+           "pushfl\n\t"
+           "popl\t%0\n\t"
+           "popfl\n\t"
+           : "=&r" (__eax), "=&r" (__ebx)
+           : "i" (0x00200000));
+#endif
+
+  if (!((__eax ^ __ebx) & 0x00200000))
+    return 0;
+#endif
+
+  /* Host supports cpuid.  Return highest supported cpuid input value.  */
+  __cpuid (__ext, __eax, __ebx, __ecx, __edx);
+
+  if (__sig)
+    *__sig = __ebx;
+
+  return __eax;
+}
+
+/* Return cpuid data for requested cpuid level, as found in returned
+   eax, ebx, ecx and edx registers.  The function checks if cpuid is
+   supported and returns 1 for valid cpuid information or 0 for
+   unsupported cpuid level.  All pointers are required to be non-null.  */
+
+static __inline int
+__get_cpuid (unsigned int __level,
+             unsigned int *__eax, unsigned int *__ebx,
+             unsigned int *__ecx, unsigned int *__edx)
+{
+  unsigned int __ext = __level & 0x80000000;
+
+  if (__get_cpuid_max (__ext, 0) < __level)
+    return 0;
+
+  __cpuid (__level, *__eax, *__ebx, *__ecx, *__edx);
+  return 1;
+}
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/cross-stdarg.h b/lib/gcc/x86_64-linux-android/4.9.x/include/cross-stdarg.h
new file mode 100644
index 0000000..d16cef8
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/cross-stdarg.h
@@ -0,0 +1,72 @@
+/* Copyright (C) 2002-2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef __CROSS_STDARG_H_INCLUDED
+#define __CROSS_STDARG_H_INCLUDED
+
+/* Make sure that for non x64 targets cross builtins are defined.  */
+#ifndef __x86_64__
+/* Call abi ms_abi.
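
   (Sketch, not part of the patch: with the macros defined below, a
   varargs consumer can be written once for both ABIs; on x86_64 the
   function itself must use the Microsoft calling convention for the
   ms_* flavor.  sum is a hypothetical example function:

     int __attribute__((ms_abi)) sum (int n, ...)
     {
       __builtin_ms_va_list ap;
       int i, s = 0;
       __ms_va_start (ap, n);
       for (i = 0; i < n; i++)
         s += __ms_va_arg (ap, int);
       __ms_va_end (ap);
       return s;
     }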
*/
+#define __builtin_ms_va_list __builtin_va_list
+#define __builtin_ms_va_copy __builtin_va_copy
+#define __builtin_ms_va_start __builtin_va_start
+#define __builtin_ms_va_end __builtin_va_end
+
+/* Call abi sysv_abi.  */
+#define __builtin_sysv_va_list __builtin_va_list
+#define __builtin_sysv_va_copy __builtin_va_copy
+#define __builtin_sysv_va_start __builtin_va_start
+#define __builtin_sysv_va_end __builtin_va_end
+#endif
+
+#define __ms_va_copy(__d,__s) __builtin_ms_va_copy(__d,__s)
+#define __ms_va_start(__v,__l) __builtin_ms_va_start(__v,__l)
+#define __ms_va_arg(__v,__l) __builtin_va_arg(__v,__l)
+#define __ms_va_end(__v) __builtin_ms_va_end(__v)
+
+#define __sysv_va_copy(__d,__s) __builtin_sysv_va_copy(__d,__s)
+#define __sysv_va_start(__v,__l) __builtin_sysv_va_start(__v,__l)
+#define __sysv_va_arg(__v,__l) __builtin_va_arg(__v,__l)
+#define __sysv_va_end(__v) __builtin_sysv_va_end(__v)
+
+#ifndef __GNUC_SYSV_VA_LIST
+#define __GNUC_SYSV_VA_LIST
+  typedef __builtin_sysv_va_list __gnuc_sysv_va_list;
+#endif
+
+#ifndef _SYSV_VA_LIST_DEFINED
+#define _SYSV_VA_LIST_DEFINED
+  typedef __gnuc_sysv_va_list sysv_va_list;
+#endif
+
+#ifndef __GNUC_MS_VA_LIST
+#define __GNUC_MS_VA_LIST
+  typedef __builtin_ms_va_list __gnuc_ms_va_list;
+#endif
+
+#ifndef _MS_VA_LIST_DEFINED
+#define _MS_VA_LIST_DEFINED
+  typedef __gnuc_ms_va_list ms_va_list;
+#endif
+
+#endif /* __CROSS_STDARG_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/emmintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/emmintrin.h
new file mode 100644
index 0000000..a2bdf0e
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/emmintrin.h
@@ -0,0 +1,1541 @@
+/* Copyright (C) 2003-2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 9.0.  */
+
+#ifndef _EMMINTRIN_H_INCLUDED
+#define _EMMINTRIN_H_INCLUDED
+
+/* We need definitions from the SSE header files.  */
+#include <xmmintrin.h>
+
+#ifndef __SSE2__
+#pragma GCC push_options
+#pragma GCC target("sse2")
+#define __DISABLE_SSE2__
+#endif /* __SSE2__ */
+
+/* SSE2 */
+typedef double __v2df __attribute__ ((__vector_size__ (16)));
+typedef long long __v2di __attribute__ ((__vector_size__ (16)));
+typedef int __v4si __attribute__ ((__vector_size__ (16)));
+typedef short __v8hi __attribute__ ((__vector_size__ (16)));
+typedef char __v16qi __attribute__ ((__vector_size__ (16)));
+
+/* The Intel API is flexible enough that we must allow aliasing with other
+   vector types, and their scalar components.
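
   (Note, not part of the patch: the __may_alias__ attribute on the
   types below exempts them from type-based alias analysis, which is
   why code such as

     float f[4] = { 1, 2, 3, 4 };
     __m128i v = _mm_loadu_si128 ((__m128i const *) f);

   is well defined even though the storage is accessed through an
   unrelated vector type.)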
*/ +typedef long long __m128i __attribute__ ((__vector_size__ (16), __may_alias__)); +typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__)); + +/* Create a selector for use with the SHUFPD instruction. */ +#define _MM_SHUFFLE2(fp1,fp0) \ + (((fp1) << 1) | (fp0)) + +/* Create a vector with element 0 as F and the rest zero. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_sd (double __F) +{ + return __extension__ (__m128d){ __F, 0.0 }; +} + +/* Create a vector with both elements equal to F. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_pd (double __F) +{ + return __extension__ (__m128d){ __F, __F }; +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_pd1 (double __F) +{ + return _mm_set1_pd (__F); +} + +/* Create a vector with the lower value X and upper value W. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_pd (double __W, double __X) +{ + return __extension__ (__m128d){ __X, __W }; +} + +/* Create a vector with the lower value W and upper value X. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setr_pd (double __W, double __X) +{ + return __extension__ (__m128d){ __W, __X }; +} + +/* Create an undefined vector. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_undefined_pd (void) +{ + __m128d __Y = __Y; + return __Y; +} + +/* Create a vector of zeros. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setzero_pd (void) +{ + return __extension__ (__m128d){ 0.0, 0.0 }; +} + +/* Sets the low DPFP value of A from the low value of B. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_move_sd (__m128d __A, __m128d __B) +{ + return (__m128d) __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B); +} + +/* Load two DPFP values from P. The address must be 16-byte aligned. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_pd (double const *__P) +{ + return *(__m128d *)__P; +} + +/* Load two DPFP values from P. The address need not be 16-byte aligned. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadu_pd (double const *__P) +{ + return __builtin_ia32_loadupd (__P); +} + +/* Create a vector with all two elements equal to *P. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load1_pd (double const *__P) +{ + return _mm_set1_pd (*__P); +} + +/* Create a vector with element 0 as *P and the rest zero. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_sd (double const *__P) +{ + return _mm_set_sd (*__P); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_pd1 (double const *__P) +{ + return _mm_load1_pd (__P); +} + +/* Load two DPFP values in reverse order. The address must be aligned. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadr_pd (double const *__P) +{ + __m128d __tmp = _mm_load_pd (__P); + return __builtin_ia32_shufpd (__tmp, __tmp, _MM_SHUFFLE2 (0,1)); +} + +/* Store two DPFP values. The address must be 16-byte aligned. 
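
   (Illustration, not part of the patch: _mm_store_pd requires a
   16-byte-aligned pointer; when alignment is not guaranteed, use the
   unaligned form defined just after it:

     double buf[2];
     _mm_storeu_pd (buf, _mm_set_pd (2.0, 1.0));

   leaves buf[0] == 1.0 and buf[1] == 2.0, since _mm_set_pd takes the
   high element first.)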
*/ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_pd (double *__P, __m128d __A) +{ + *(__m128d *)__P = __A; +} + +/* Store two DPFP values. The address need not be 16-byte aligned. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storeu_pd (double *__P, __m128d __A) +{ + __builtin_ia32_storeupd (__P, __A); +} + +/* Stores the lower DPFP value. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_sd (double *__P, __m128d __A) +{ + *__P = __builtin_ia32_vec_ext_v2df (__A, 0); +} + +extern __inline double __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsd_f64 (__m128d __A) +{ + return __builtin_ia32_vec_ext_v2df (__A, 0); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storel_pd (double *__P, __m128d __A) +{ + _mm_store_sd (__P, __A); +} + +/* Stores the upper DPFP value. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storeh_pd (double *__P, __m128d __A) +{ + *__P = __builtin_ia32_vec_ext_v2df (__A, 1); +} + +/* Store the lower DPFP value across two words. + The address must be 16-byte aligned. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store1_pd (double *__P, __m128d __A) +{ + _mm_store_pd (__P, __builtin_ia32_shufpd (__A, __A, _MM_SHUFFLE2 (0,0))); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_pd1 (double *__P, __m128d __A) +{ + _mm_store1_pd (__P, __A); +} + +/* Store two DPFP values in reverse order. The address must be aligned. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storer_pd (double *__P, __m128d __A) +{ + _mm_store_pd (__P, __builtin_ia32_shufpd (__A, __A, _MM_SHUFFLE2 (0,1))); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi128_si32 (__m128i __A) +{ + return __builtin_ia32_vec_ext_v4si ((__v4si)__A, 0); +} + +#ifdef __x86_64__ +/* Intel intrinsic. */ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi128_si64 (__m128i __A) +{ + return __builtin_ia32_vec_ext_v2di ((__v2di)__A, 0); +} + +/* Microsoft intrinsic. 
*/
+extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_cvtsi128_si64x (__m128i __A)
+{
+  return __builtin_ia32_vec_ext_v2di ((__v2di)__A, 0);
+}
+#endif
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_pd (__m128d __A, __m128d __B)
+{
+  return (__m128d)__builtin_ia32_addpd ((__v2df)__A, (__v2df)__B);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_add_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d)__builtin_ia32_addsd ((__v2df)__A, (__v2df)__B);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sub_pd (__m128d __A, __m128d __B)
+{
+  return (__m128d)__builtin_ia32_subpd ((__v2df)__A, (__v2df)__B);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sub_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d)__builtin_ia32_subsd ((__v2df)__A, (__v2df)__B);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mul_pd (__m128d __A, __m128d __B)
+{
+  return (__m128d)__builtin_ia32_mulpd ((__v2df)__A, (__v2df)__B);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mul_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d)__builtin_ia32_mulsd ((__v2df)__A, (__v2df)__B);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_div_pd (__m128d __A, __m128d __B)
+{
+  return (__m128d)__builtin_ia32_divpd ((__v2df)__A, (__v2df)__B);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_div_sd (__m128d __A, __m128d __B)
+{
+  return (__m128d)__builtin_ia32_divsd ((__v2df)__A, (__v2df)__B);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sqrt_pd (__m128d __A)
+{
+  return (__m128d)__builtin_ia32_sqrtpd ((__v2df)__A);
+}
+
+/* Return pair {sqrt (B[0]), A[1]}.
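
   (Illustration, not part of the patch:

     __m128d a = _mm_set_pd (3.0, 2.0);     a == {2.0, 3.0}
     __m128d b = _mm_set_pd (7.0, 16.0);    b == {16.0, 7.0}
     __m128d r = _mm_sqrt_sd (a, b);        r == {4.0, 3.0}

   the low element is sqrt (B[0]) and the high element is carried over
   unchanged from A.)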
*/ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sqrt_sd (__m128d __A, __m128d __B) +{ + __v2df __tmp = __builtin_ia32_movsd ((__v2df)__A, (__v2df)__B); + return (__m128d)__builtin_ia32_sqrtsd ((__v2df)__tmp); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_minpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_sd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_minsd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_maxpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_sd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_maxsd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_and_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_andpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_andnot_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_andnpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_or_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_orpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_xor_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_xorpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpeqpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmplt_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpltpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmple_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmplepd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpgtpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpge_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpgepd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpneq_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpneqpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpnlt_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpnltpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpnle_pd (__m128d __A, __m128d __B) +{ + return 
(__m128d)__builtin_ia32_cmpnlepd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpngt_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpngtpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpnge_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpngepd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpord_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpordpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpunord_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpunordpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_sd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpeqsd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmplt_sd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpltsd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmple_sd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmplesd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_sd (__m128d __A, __m128d __B) +{ + return (__m128d) __builtin_ia32_movsd ((__v2df) __A, + (__v2df) + __builtin_ia32_cmpltsd ((__v2df) __B, + (__v2df) + __A)); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpge_sd (__m128d __A, __m128d __B) +{ + return (__m128d) __builtin_ia32_movsd ((__v2df) __A, + (__v2df) + __builtin_ia32_cmplesd ((__v2df) __B, + (__v2df) + __A)); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpneq_sd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpneqsd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpnlt_sd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpnltsd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpnle_sd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpnlesd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpngt_sd (__m128d __A, __m128d __B) +{ + return (__m128d) __builtin_ia32_movsd ((__v2df) __A, + (__v2df) + __builtin_ia32_cmpnltsd ((__v2df) __B, + (__v2df) + __A)); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpnge_sd (__m128d __A, __m128d __B) +{ + return (__m128d) __builtin_ia32_movsd ((__v2df) __A, + (__v2df) + __builtin_ia32_cmpnlesd ((__v2df) __B, + (__v2df) + __A)); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpord_sd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpordsd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm_cmpunord_sd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_cmpunordsd ((__v2df)__A, (__v2df)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comieq_sd (__m128d __A, __m128d __B) +{ + return __builtin_ia32_comisdeq ((__v2df)__A, (__v2df)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comilt_sd (__m128d __A, __m128d __B) +{ + return __builtin_ia32_comisdlt ((__v2df)__A, (__v2df)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comile_sd (__m128d __A, __m128d __B) +{ + return __builtin_ia32_comisdle ((__v2df)__A, (__v2df)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comigt_sd (__m128d __A, __m128d __B) +{ + return __builtin_ia32_comisdgt ((__v2df)__A, (__v2df)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comige_sd (__m128d __A, __m128d __B) +{ + return __builtin_ia32_comisdge ((__v2df)__A, (__v2df)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comineq_sd (__m128d __A, __m128d __B) +{ + return __builtin_ia32_comisdneq ((__v2df)__A, (__v2df)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomieq_sd (__m128d __A, __m128d __B) +{ + return __builtin_ia32_ucomisdeq ((__v2df)__A, (__v2df)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomilt_sd (__m128d __A, __m128d __B) +{ + return __builtin_ia32_ucomisdlt ((__v2df)__A, (__v2df)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomile_sd (__m128d __A, __m128d __B) +{ + return __builtin_ia32_ucomisdle ((__v2df)__A, (__v2df)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomigt_sd (__m128d __A, __m128d __B) +{ + return __builtin_ia32_ucomisdgt ((__v2df)__A, (__v2df)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomige_sd (__m128d __A, __m128d __B) +{ + return __builtin_ia32_ucomisdge ((__v2df)__A, (__v2df)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomineq_sd (__m128d __A, __m128d __B) +{ + return __builtin_ia32_ucomisdneq ((__v2df)__A, (__v2df)__B); +} + +/* Create a vector of Qi, where i is the element number. 
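
   (Illustration, not part of the patch: as with the 256-bit variants,
   arguments are listed highest element first, so

     __m128i v = _mm_set_epi32 (3, 2, 1, 0);

   puts 0 in element 0, the int that lands first in memory when the
   vector is stored.)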
*/ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_epi64x (long long __q1, long long __q0) +{ + return __extension__ (__m128i)(__v2di){ __q0, __q1 }; +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_epi64 (__m64 __q1, __m64 __q0) +{ + return _mm_set_epi64x ((long long)__q1, (long long)__q0); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_epi32 (int __q3, int __q2, int __q1, int __q0) +{ + return __extension__ (__m128i)(__v4si){ __q0, __q1, __q2, __q3 }; +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_epi16 (short __q7, short __q6, short __q5, short __q4, + short __q3, short __q2, short __q1, short __q0) +{ + return __extension__ (__m128i)(__v8hi){ + __q0, __q1, __q2, __q3, __q4, __q5, __q6, __q7 }; +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_epi8 (char __q15, char __q14, char __q13, char __q12, + char __q11, char __q10, char __q09, char __q08, + char __q07, char __q06, char __q05, char __q04, + char __q03, char __q02, char __q01, char __q00) +{ + return __extension__ (__m128i)(__v16qi){ + __q00, __q01, __q02, __q03, __q04, __q05, __q06, __q07, + __q08, __q09, __q10, __q11, __q12, __q13, __q14, __q15 + }; +} + +/* Set all of the elements of the vector to A. */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_epi64x (long long __A) +{ + return _mm_set_epi64x (__A, __A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_epi64 (__m64 __A) +{ + return _mm_set_epi64 (__A, __A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_epi32 (int __A) +{ + return _mm_set_epi32 (__A, __A, __A, __A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_epi16 (short __A) +{ + return _mm_set_epi16 (__A, __A, __A, __A, __A, __A, __A, __A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_epi8 (char __A) +{ + return _mm_set_epi8 (__A, __A, __A, __A, __A, __A, __A, __A, + __A, __A, __A, __A, __A, __A, __A, __A); +} + +/* Create a vector of Qi, where i is the element number. + The parameter order is reversed from the _mm_set_epi* functions. 
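
   (Illustration, not part of the patch:

     _mm_setr_epi32 (0, 1, 2, 3) == _mm_set_epi32 (3, 2, 1, 0)

   so the _mm_setr_* forms read their arguments in memory order,
   lowest element first.)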
*/ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setr_epi64 (__m64 __q0, __m64 __q1) +{ + return _mm_set_epi64 (__q1, __q0); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setr_epi32 (int __q0, int __q1, int __q2, int __q3) +{ + return _mm_set_epi32 (__q3, __q2, __q1, __q0); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setr_epi16 (short __q0, short __q1, short __q2, short __q3, + short __q4, short __q5, short __q6, short __q7) +{ + return _mm_set_epi16 (__q7, __q6, __q5, __q4, __q3, __q2, __q1, __q0); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setr_epi8 (char __q00, char __q01, char __q02, char __q03, + char __q04, char __q05, char __q06, char __q07, + char __q08, char __q09, char __q10, char __q11, + char __q12, char __q13, char __q14, char __q15) +{ + return _mm_set_epi8 (__q15, __q14, __q13, __q12, __q11, __q10, __q09, __q08, + __q07, __q06, __q05, __q04, __q03, __q02, __q01, __q00); +} + +/* Create a vector with element 0 as *P and the rest zero. */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_si128 (__m128i const *__P) +{ + return *__P; +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadu_si128 (__m128i const *__P) +{ + return (__m128i) __builtin_ia32_loaddqu ((char const *)__P); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadl_epi64 (__m128i const *__P) +{ + return _mm_set_epi64 ((__m64)0LL, *(__m64 *)__P); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_si128 (__m128i *__P, __m128i __B) +{ + *__P = __B; +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storeu_si128 (__m128i *__P, __m128i __B) +{ + __builtin_ia32_storedqu ((char *)__P, (__v16qi)__B); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storel_epi64 (__m128i *__P, __m128i __B) +{ + *(long long *)__P = __builtin_ia32_vec_ext_v2di ((__v2di)__B, 0); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_movepi64_pi64 (__m128i __B) +{ + return (__m64) __builtin_ia32_vec_ext_v2di ((__v2di)__B, 0); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_movpi64_epi64 (__m64 __A) +{ + return _mm_set_epi64 ((__m64)0LL, __A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_move_epi64 (__m128i __A) +{ + return (__m128i)__builtin_ia32_movq128 ((__v2di) __A); +} + +/* Create an undefined vector. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_undefined_si128 (void) +{ + __m128i __Y = __Y; + return __Y; +} + +/* Create a vector of zeros. 
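
   (Note, not part of the patch: _mm_setzero_si128 () is equivalent to
   _mm_set1_epi32 (0); compilers typically emit a single pxor of a
   register with itself rather than loading a constant.)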
*/ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setzero_si128 (void) +{ + return __extension__ (__m128i)(__v4si){ 0, 0, 0, 0 }; +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi32_pd (__m128i __A) +{ + return (__m128d)__builtin_ia32_cvtdq2pd ((__v4si) __A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi32_ps (__m128i __A) +{ + return (__m128)__builtin_ia32_cvtdq2ps ((__v4si) __A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtpd_epi32 (__m128d __A) +{ + return (__m128i)__builtin_ia32_cvtpd2dq ((__v2df) __A); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtpd_pi32 (__m128d __A) +{ + return (__m64)__builtin_ia32_cvtpd2pi ((__v2df) __A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtpd_ps (__m128d __A) +{ + return (__m128)__builtin_ia32_cvtpd2ps ((__v2df) __A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttpd_epi32 (__m128d __A) +{ + return (__m128i)__builtin_ia32_cvttpd2dq ((__v2df) __A); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttpd_pi32 (__m128d __A) +{ + return (__m64)__builtin_ia32_cvttpd2pi ((__v2df) __A); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtpi32_pd (__m64 __A) +{ + return (__m128d)__builtin_ia32_cvtpi2pd ((__v2si) __A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtps_epi32 (__m128 __A) +{ + return (__m128i)__builtin_ia32_cvtps2dq ((__v4sf) __A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttps_epi32 (__m128 __A) +{ + return (__m128i)__builtin_ia32_cvttps2dq ((__v4sf) __A); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtps_pd (__m128 __A) +{ + return (__m128d)__builtin_ia32_cvtps2pd ((__v4sf) __A); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsd_si32 (__m128d __A) +{ + return __builtin_ia32_cvtsd2si ((__v2df) __A); +} + +#ifdef __x86_64__ +/* Intel intrinsic. */ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsd_si64 (__m128d __A) +{ + return __builtin_ia32_cvtsd2si64 ((__v2df) __A); +} + +/* Microsoft intrinsic. */ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsd_si64x (__m128d __A) +{ + return __builtin_ia32_cvtsd2si64 ((__v2df) __A); +} +#endif + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsd_si32 (__m128d __A) +{ + return __builtin_ia32_cvttsd2si ((__v2df) __A); +} + +#ifdef __x86_64__ +/* Intel intrinsic. */ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsd_si64 (__m128d __A) +{ + return __builtin_ia32_cvttsd2si64 ((__v2df) __A); +} + +/* Microsoft intrinsic. 
*/ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttsd_si64x (__m128d __A) +{ + return __builtin_ia32_cvttsd2si64 ((__v2df) __A); +} +#endif + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsd_ss (__m128 __A, __m128d __B) +{ + return (__m128)__builtin_ia32_cvtsd2ss ((__v4sf) __A, (__v2df) __B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi32_sd (__m128d __A, int __B) +{ + return (__m128d)__builtin_ia32_cvtsi2sd ((__v2df) __A, __B); +} + +#ifdef __x86_64__ +/* Intel intrinsic. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi64_sd (__m128d __A, long long __B) +{ + return (__m128d)__builtin_ia32_cvtsi642sd ((__v2df) __A, __B); +} + +/* Microsoft intrinsic. */ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi64x_sd (__m128d __A, long long __B) +{ + return (__m128d)__builtin_ia32_cvtsi642sd ((__v2df) __A, __B); +} +#endif + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtss_sd (__m128d __A, __m128 __B) +{ + return (__m128d)__builtin_ia32_cvtss2sd ((__v2df) __A, (__v4sf)__B); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_shuffle_pd(__m128d __A, __m128d __B, const int __mask) +{ + return (__m128d)__builtin_ia32_shufpd ((__v2df)__A, (__v2df)__B, __mask); +} +#else +#define _mm_shuffle_pd(A, B, N) \ + ((__m128d)__builtin_ia32_shufpd ((__v2df)(__m128d)(A), \ + (__v2df)(__m128d)(B), (int)(N))) +#endif + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpackhi_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_unpckhpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpacklo_pd (__m128d __A, __m128d __B) +{ + return (__m128d)__builtin_ia32_unpcklpd ((__v2df)__A, (__v2df)__B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadh_pd (__m128d __A, double const *__B) +{ + return (__m128d)__builtin_ia32_loadhpd ((__v2df)__A, __B); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadl_pd (__m128d __A, double const *__B) +{ + return (__m128d)__builtin_ia32_loadlpd ((__v2df)__A, __B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_movemask_pd (__m128d __A) +{ + return __builtin_ia32_movmskpd ((__v2df)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_packs_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_packsswb128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_packs_epi32 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_packssdw128 ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_packus_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_packuswb128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpackhi_epi8 (__m128i __A, __m128i __B) +{ + return 
(__m128i)__builtin_ia32_punpckhbw128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpackhi_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_punpckhwd128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpackhi_epi32 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_punpckhdq128 ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpackhi_epi64 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_punpckhqdq128 ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpacklo_epi8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_punpcklbw128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpacklo_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_punpcklwd128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpacklo_epi32 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_punpckldq128 ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpacklo_epi64 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_punpcklqdq128 ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_epi8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_paddb128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_paddw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_epi32 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_paddd128 ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_epi64 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_paddq128 ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_adds_epi8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_paddsb128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_adds_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_paddsw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_adds_epu8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_paddusb128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_adds_epu16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_paddusw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_epi8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psubb128 ((__v16qi)__A, (__v16qi)__B); +} + +extern 
__inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psubw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_epi32 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psubd128 ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_epi64 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psubq128 ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_subs_epi8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psubsb128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_subs_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psubsw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_subs_epu8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psubusb128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_subs_epu16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psubusw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_madd_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pmaddwd128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mulhi_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pmulhw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mullo_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pmullw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mul_su32 (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pmuludq ((__v2si)__A, (__v2si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mul_epu32 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pmuludq128 ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_slli_epi16 (__m128i __A, int __B) +{ + return (__m128i)__builtin_ia32_psllwi128 ((__v8hi)__A, __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_slli_epi32 (__m128i __A, int __B) +{ + return (__m128i)__builtin_ia32_pslldi128 ((__v4si)__A, __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_slli_epi64 (__m128i __A, int __B) +{ + return (__m128i)__builtin_ia32_psllqi128 ((__v2di)__A, __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srai_epi16 (__m128i __A, int __B) +{ + return (__m128i)__builtin_ia32_psrawi128 ((__v8hi)__A, __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srai_epi32 (__m128i __A, int __B) +{ + return (__m128i)__builtin_ia32_psradi128 ((__v4si)__A, __B); 
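+  /* Note: the psra* shifts above replicate the sign bit, whereas the psrl* shifts
+     further below shift in zeros.  The whole-register _mm_slli_si128/_mm_srli_si128
+     forms that follow take their count in bytes, which is why it is scaled by 8
+     before reaching the builtin. */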
+} + +#ifdef __OPTIMIZE__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srli_si128 (__m128i __A, const int __N) +{ + return (__m128i)__builtin_ia32_psrldqi128 (__A, __N * 8); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_slli_si128 (__m128i __A, const int __N) +{ + return (__m128i)__builtin_ia32_pslldqi128 (__A, __N * 8); +} +#else +#define _mm_srli_si128(A, N) \ + ((__m128i)__builtin_ia32_psrldqi128 ((__m128i)(A), (int)(N) * 8)) +#define _mm_slli_si128(A, N) \ + ((__m128i)__builtin_ia32_pslldqi128 ((__m128i)(A), (int)(N) * 8)) +#endif + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srli_epi16 (__m128i __A, int __B) +{ + return (__m128i)__builtin_ia32_psrlwi128 ((__v8hi)__A, __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srli_epi32 (__m128i __A, int __B) +{ + return (__m128i)__builtin_ia32_psrldi128 ((__v4si)__A, __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srli_epi64 (__m128i __A, int __B) +{ + return (__m128i)__builtin_ia32_psrlqi128 ((__v2di)__A, __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sll_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psllw128((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sll_epi32 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pslld128((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sll_epi64 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psllq128((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sra_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psraw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sra_epi32 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psrad128 ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srl_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psrlw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srl_epi32 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psrld128 ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srl_epi64 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psrlq128 ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_and_si128 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pand128 ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_andnot_si128 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pandn128 ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_or_si128 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_por128 
((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_xor_si128 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pxor128 ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_epi8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pcmpeqb128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pcmpeqw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_epi32 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pcmpeqd128 ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmplt_epi8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pcmpgtb128 ((__v16qi)__B, (__v16qi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmplt_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pcmpgtw128 ((__v8hi)__B, (__v8hi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmplt_epi32 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pcmpgtd128 ((__v4si)__B, (__v4si)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_epi8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pcmpgtb128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pcmpgtw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_epi32 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pcmpgtd128 ((__v4si)__A, (__v4si)__B); +} + +#ifdef __OPTIMIZE__ +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_extract_epi16 (__m128i const __A, int const __N) +{ + return (unsigned short) __builtin_ia32_vec_ext_v8hi ((__v8hi)__A, __N); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_insert_epi16 (__m128i const __A, int const __D, int const __N) +{ + return (__m128i) __builtin_ia32_vec_set_v8hi ((__v8hi)__A, __D, __N); +} +#else +#define _mm_extract_epi16(A, N) \ + ((int) (unsigned short) __builtin_ia32_vec_ext_v8hi ((__v8hi)(__m128i)(A), (int)(N))) +#define _mm_insert_epi16(A, D, N) \ + ((__m128i) __builtin_ia32_vec_set_v8hi ((__v8hi)(__m128i)(A), \ + (int)(D), (int)(N))) +#endif + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pmaxsw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_epu8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pmaxub128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_epi16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pminsw128 
((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_epu8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pminub128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_movemask_epi8 (__m128i __A) +{ + return __builtin_ia32_pmovmskb128 ((__v16qi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mulhi_epu16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pmulhuw128 ((__v8hi)__A, (__v8hi)__B); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_shufflehi_epi16 (__m128i __A, const int __mask) +{ + return (__m128i)__builtin_ia32_pshufhw ((__v8hi)__A, __mask); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_shufflelo_epi16 (__m128i __A, const int __mask) +{ + return (__m128i)__builtin_ia32_pshuflw ((__v8hi)__A, __mask); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_shuffle_epi32 (__m128i __A, const int __mask) +{ + return (__m128i)__builtin_ia32_pshufd ((__v4si)__A, __mask); +} +#else +#define _mm_shufflehi_epi16(A, N) \ + ((__m128i)__builtin_ia32_pshufhw ((__v8hi)(__m128i)(A), (int)(N))) +#define _mm_shufflelo_epi16(A, N) \ + ((__m128i)__builtin_ia32_pshuflw ((__v8hi)(__m128i)(A), (int)(N))) +#define _mm_shuffle_epi32(A, N) \ + ((__m128i)__builtin_ia32_pshufd ((__v4si)(__m128i)(A), (int)(N))) +#endif + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskmoveu_si128 (__m128i __A, __m128i __B, char *__C) +{ + __builtin_ia32_maskmovdqu ((__v16qi)__A, (__v16qi)__B, __C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_avg_epu8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pavgb128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_avg_epu16 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_pavgw128 ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sad_epu8 (__m128i __A, __m128i __B) +{ + return (__m128i)__builtin_ia32_psadbw128 ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_stream_si32 (int *__A, int __B) +{ + __builtin_ia32_movnti (__A, __B); +} + +#ifdef __x86_64__ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_stream_si64 (long long int *__A, long long int __B) +{ + __builtin_ia32_movnti64 (__A, __B); +} +#endif + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_stream_si128 (__m128i *__A, __m128i __B) +{ + __builtin_ia32_movntdq ((__v2di *)__A, (__v2di)__B); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_stream_pd (double *__A, __m128d __B) +{ + __builtin_ia32_movntpd (__A, (__v2df)__B); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_clflush (void const *__A) +{ + __builtin_ia32_clflush (__A); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_lfence (void) +{ + 
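+  /* Load fence: loads issued before this point complete before any load that
+     follows the fence. */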
__builtin_ia32_lfence (); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mfence (void) +{ + __builtin_ia32_mfence (); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi32_si128 (int __A) +{ + return _mm_set_epi32 (0, 0, 0, __A); +} + +#ifdef __x86_64__ +/* Intel intrinsic. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi64_si128 (long long __A) +{ + return _mm_set_epi64x (0, __A); +} + +/* Microsoft intrinsic. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi64x_si128 (long long __A) +{ + return _mm_set_epi64x (0, __A); +} +#endif + +/* Casts between various SP, DP, INT vector types. Note that these do no + conversion of values, they just change the type. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castpd_ps(__m128d __A) +{ + return (__m128) __A; +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castpd_si128(__m128d __A) +{ + return (__m128i) __A; +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castps_pd(__m128 __A) +{ + return (__m128d) __A; +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castps_si128(__m128 __A) +{ + return (__m128i) __A; +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castsi128_ps(__m128i __A) +{ + return (__m128) __A; +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castsi128_pd(__m128i __A) +{ + return (__m128d) __A; +} + +#ifdef __DISABLE_SSE2__ +#undef __DISABLE_SSE2__ +#pragma GCC pop_options +#endif /* __DISABLE_SSE2__ */ + +#endif /* _EMMINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/f16cintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/f16cintrin.h new file mode 100644 index 0000000..1181f8b --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/f16cintrin.h @@ -0,0 +1,98 @@ +/* Copyright (C) 2011-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#if !defined _X86INTRIN_H_INCLUDED && !defined _IMMINTRIN_H_INCLUDED +# error "Never use directly; include or instead." 
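+/* A small usage sketch for the emmintrin.h cast intrinsics defined above
+   (illustrative only; __x is a hypothetical __m128 value):
+     __m128i __bits = _mm_castps_si128 (__x);   merely reinterprets the 128 bits,
+     __m128i __ints = _mm_cvtps_epi32 (__x);    converts the four float values to ints. */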
+#endif + +#ifndef _F16CINTRIN_H_INCLUDED +#define _F16CINTRIN_H_INCLUDED + +#ifndef __F16C__ +#pragma GCC push_options +#pragma GCC target("f16c") +#define __DISABLE_F16C__ +#endif /* __F16C__ */ + +extern __inline float __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_cvtsh_ss (unsigned short __S) +{ + __v8hi __H = __extension__ (__v8hi){ (short) __S, 0, 0, 0, 0, 0, 0, 0 }; + __v4sf __A = __builtin_ia32_vcvtph2ps (__H); + return __builtin_ia32_vec_ext_v4sf (__A, 0); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtph_ps (__m128i __A) +{ + return (__m128) __builtin_ia32_vcvtph2ps ((__v8hi) __A); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtph_ps (__m128i __A) +{ + return (__m256) __builtin_ia32_vcvtph2ps256 ((__v8hi) __A); +} + +#ifdef __OPTIMIZE__ +extern __inline unsigned short __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_cvtss_sh (float __F, const int __I) +{ + __v4sf __A = __extension__ (__v4sf){ __F, 0, 0, 0 }; + __v8hi __H = __builtin_ia32_vcvtps2ph (__A, __I); + return (unsigned short) __builtin_ia32_vec_ext_v8hi (__H, 0); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtps_ph (__m128 __A, const int __I) +{ + return (__m128i) __builtin_ia32_vcvtps2ph ((__v4sf) __A, __I); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtps_ph (__m256 __A, const int __I) +{ + return (__m128i) __builtin_ia32_vcvtps2ph256 ((__v8sf) __A, __I); +} +#else +#define _cvtss_sh(__F, __I) \ + (__extension__ \ + ({ \ + __v4sf __A = __extension__ (__v4sf){ __F, 0, 0, 0 }; \ + __v8hi __H = __builtin_ia32_vcvtps2ph (__A, __I); \ + (unsigned short) __builtin_ia32_vec_ext_v8hi (__H, 0); \ + })) + +#define _mm_cvtps_ph(A, I) \ + ((__m128i) __builtin_ia32_vcvtps2ph ((__v4sf)(__m128) A, (int) (I))) + +#define _mm256_cvtps_ph(A, I) \ + ((__m128i) __builtin_ia32_vcvtps2ph256 ((__v8sf)(__m256) A, (int) (I))) +#endif /* __OPTIMIZE */ + +#ifdef __DISABLE_F16C__ +#undef __DISABLE_F16C__ +#pragma GCC pop_options +#endif /* __DISABLE_F16C__ */ + +#endif /* _F16CINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/float.h b/lib/gcc/x86_64-linux-android/4.9.x/include/float.h new file mode 100644 index 0000000..a8e05bf --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/float.h @@ -0,0 +1,277 @@ +/* Copyright (C) 2002-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. 
*/ + +/* + * ISO C Standard: 5.2.4.2.2 Characteristics of floating types + */ + +#ifndef _FLOAT_H___ +#define _FLOAT_H___ + +/* Radix of exponent representation, b. */ +#undef FLT_RADIX +#define FLT_RADIX __FLT_RADIX__ + +/* Number of base-FLT_RADIX digits in the significand, p. */ +#undef FLT_MANT_DIG +#undef DBL_MANT_DIG +#undef LDBL_MANT_DIG +#define FLT_MANT_DIG __FLT_MANT_DIG__ +#define DBL_MANT_DIG __DBL_MANT_DIG__ +#define LDBL_MANT_DIG __LDBL_MANT_DIG__ + +/* Number of decimal digits, q, such that any floating-point number with q + decimal digits can be rounded into a floating-point number with p radix b + digits and back again without change to the q decimal digits, + + p * log10(b) if b is a power of 10 + floor((p - 1) * log10(b)) otherwise +*/ +#undef FLT_DIG +#undef DBL_DIG +#undef LDBL_DIG +#define FLT_DIG __FLT_DIG__ +#define DBL_DIG __DBL_DIG__ +#define LDBL_DIG __LDBL_DIG__ + +/* Minimum int x such that FLT_RADIX**(x-1) is a normalized float, emin */ +#undef FLT_MIN_EXP +#undef DBL_MIN_EXP +#undef LDBL_MIN_EXP +#define FLT_MIN_EXP __FLT_MIN_EXP__ +#define DBL_MIN_EXP __DBL_MIN_EXP__ +#define LDBL_MIN_EXP __LDBL_MIN_EXP__ + +/* Minimum negative integer such that 10 raised to that power is in the + range of normalized floating-point numbers, + + ceil(log10(b) * (emin - 1)) +*/ +#undef FLT_MIN_10_EXP +#undef DBL_MIN_10_EXP +#undef LDBL_MIN_10_EXP +#define FLT_MIN_10_EXP __FLT_MIN_10_EXP__ +#define DBL_MIN_10_EXP __DBL_MIN_10_EXP__ +#define LDBL_MIN_10_EXP __LDBL_MIN_10_EXP__ + +/* Maximum int x such that FLT_RADIX**(x-1) is a representable float, emax. */ +#undef FLT_MAX_EXP +#undef DBL_MAX_EXP +#undef LDBL_MAX_EXP +#define FLT_MAX_EXP __FLT_MAX_EXP__ +#define DBL_MAX_EXP __DBL_MAX_EXP__ +#define LDBL_MAX_EXP __LDBL_MAX_EXP__ + +/* Maximum integer such that 10 raised to that power is in the range of + representable finite floating-point numbers, + + floor(log10((1 - b**-p) * b**emax)) +*/ +#undef FLT_MAX_10_EXP +#undef DBL_MAX_10_EXP +#undef LDBL_MAX_10_EXP +#define FLT_MAX_10_EXP __FLT_MAX_10_EXP__ +#define DBL_MAX_10_EXP __DBL_MAX_10_EXP__ +#define LDBL_MAX_10_EXP __LDBL_MAX_10_EXP__ + +/* Maximum representable finite floating-point number, + + (1 - b**-p) * b**emax +*/ +#undef FLT_MAX +#undef DBL_MAX +#undef LDBL_MAX +#define FLT_MAX __FLT_MAX__ +#define DBL_MAX __DBL_MAX__ +#define LDBL_MAX __LDBL_MAX__ + +/* The difference between 1 and the least value greater than 1 that is + representable in the given floating point type, b**1-p. */ +#undef FLT_EPSILON +#undef DBL_EPSILON +#undef LDBL_EPSILON +#define FLT_EPSILON __FLT_EPSILON__ +#define DBL_EPSILON __DBL_EPSILON__ +#define LDBL_EPSILON __LDBL_EPSILON__ + +/* Minimum normalized positive floating-point number, b**(emin - 1). */ +#undef FLT_MIN +#undef DBL_MIN +#undef LDBL_MIN +#define FLT_MIN __FLT_MIN__ +#define DBL_MIN __DBL_MIN__ +#define LDBL_MIN __LDBL_MIN__ + +/* Addition rounds to 0: zero, 1: nearest, 2: +inf, 3: -inf, -1: unknown. */ +/* ??? This is supposed to change with calls to fesetround in . */ +#undef FLT_ROUNDS +#define FLT_ROUNDS 1 + +#if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L +/* The floating-point expression evaluation method. 
+ -1 indeterminate + 0 evaluate all operations and constants just to the range and + precision of the type + 1 evaluate operations and constants of type float and double + to the range and precision of the double type, evaluate + long double operations and constants to the range and + precision of the long double type + 2 evaluate all operations and constants to the range and + precision of the long double type + + ??? This ought to change with the setting of the fp control word; + the value provided by the compiler assumes the widest setting. */ +#undef FLT_EVAL_METHOD +#define FLT_EVAL_METHOD __FLT_EVAL_METHOD__ + +/* Number of decimal digits, n, such that any floating-point number in the + widest supported floating type with pmax radix b digits can be rounded + to a floating-point number with n decimal digits and back again without + change to the value, + + pmax * log10(b) if b is a power of 10 + ceil(1 + pmax * log10(b)) otherwise +*/ +#undef DECIMAL_DIG +#define DECIMAL_DIG __DECIMAL_DIG__ + +#endif /* C99 */ + +#if defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L +/* Versions of DECIMAL_DIG for each floating-point type. */ +#undef FLT_DECIMAL_DIG +#undef DBL_DECIMAL_DIG +#undef LDBL_DECIMAL_DIG +#define FLT_DECIMAL_DIG __FLT_DECIMAL_DIG__ +#define DBL_DECIMAL_DIG __DBL_DECIMAL_DIG__ +#define LDBL_DECIMAL_DIG __DECIMAL_DIG__ + +/* Whether types support subnormal numbers. */ +#undef FLT_HAS_SUBNORM +#undef DBL_HAS_SUBNORM +#undef LDBL_HAS_SUBNORM +#define FLT_HAS_SUBNORM __FLT_HAS_DENORM__ +#define DBL_HAS_SUBNORM __DBL_HAS_DENORM__ +#define LDBL_HAS_SUBNORM __LDBL_HAS_DENORM__ + +/* Minimum positive values, including subnormals. */ +#undef FLT_TRUE_MIN +#undef DBL_TRUE_MIN +#undef LDBL_TRUE_MIN +#if __FLT_HAS_DENORM__ +#define FLT_TRUE_MIN __FLT_DENORM_MIN__ +#else +#define FLT_TRUE_MIN __FLT_MIN__ +#endif +#if __DBL_HAS_DENORM__ +#define DBL_TRUE_MIN __DBL_DENORM_MIN__ +#else +#define DBL_TRUE_MIN __DBL_MIN__ +#endif +#if __LDBL_HAS_DENORM__ +#define LDBL_TRUE_MIN __LDBL_DENORM_MIN__ +#else +#define LDBL_TRUE_MIN __LDBL_MIN__ +#endif + +#endif /* C11 */ + +#ifdef __STDC_WANT_DEC_FP__ +/* Draft Technical Report 24732, extension for decimal floating-point + arithmetic: Characteristic of decimal floating types . */ + +/* Number of base-FLT_RADIX digits in the significand, p. */ +#undef DEC32_MANT_DIG +#undef DEC64_MANT_DIG +#undef DEC128_MANT_DIG +#define DEC32_MANT_DIG __DEC32_MANT_DIG__ +#define DEC64_MANT_DIG __DEC64_MANT_DIG__ +#define DEC128_MANT_DIG __DEC128_MANT_DIG__ + +/* Minimum exponent. */ +#undef DEC32_MIN_EXP +#undef DEC64_MIN_EXP +#undef DEC128_MIN_EXP +#define DEC32_MIN_EXP __DEC32_MIN_EXP__ +#define DEC64_MIN_EXP __DEC64_MIN_EXP__ +#define DEC128_MIN_EXP __DEC128_MIN_EXP__ + +/* Maximum exponent. */ +#undef DEC32_MAX_EXP +#undef DEC64_MAX_EXP +#undef DEC128_MAX_EXP +#define DEC32_MAX_EXP __DEC32_MAX_EXP__ +#define DEC64_MAX_EXP __DEC64_MAX_EXP__ +#define DEC128_MAX_EXP __DEC128_MAX_EXP__ + +/* Maximum representable finite decimal floating-point number + (there are 6, 15, and 33 9s after the decimal points respectively). */ +#undef DEC32_MAX +#undef DEC64_MAX +#undef DEC128_MAX +#define DEC32_MAX __DEC32_MAX__ +#define DEC64_MAX __DEC64_MAX__ +#define DEC128_MAX __DEC128_MAX__ + +/* The difference between 1 and the least value greater than 1 that is + representable in the given floating point type. 
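+   (For the 7-, 16- and 34-digit decimal formats below these are 1E-6, 1E-15
+   and 1E-33 respectively.)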
*/ +#undef DEC32_EPSILON +#undef DEC64_EPSILON +#undef DEC128_EPSILON +#define DEC32_EPSILON __DEC32_EPSILON__ +#define DEC64_EPSILON __DEC64_EPSILON__ +#define DEC128_EPSILON __DEC128_EPSILON__ + +/* Minimum normalized positive floating-point number. */ +#undef DEC32_MIN +#undef DEC64_MIN +#undef DEC128_MIN +#define DEC32_MIN __DEC32_MIN__ +#define DEC64_MIN __DEC64_MIN__ +#define DEC128_MIN __DEC128_MIN__ + +/* Minimum subnormal positive floating-point number. */ +#undef DEC32_SUBNORMAL_MIN +#undef DEC64_SUBNORMAL_MIN +#undef DEC128_SUBNORMAL_MIN +#define DEC32_SUBNORMAL_MIN __DEC32_SUBNORMAL_MIN__ +#define DEC64_SUBNORMAL_MIN __DEC64_SUBNORMAL_MIN__ +#define DEC128_SUBNORMAL_MIN __DEC128_SUBNORMAL_MIN__ + +/* The floating-point expression evaluation method. + -1 indeterminate + 0 evaluate all operations and constants just to the range and + precision of the type + 1 evaluate operations and constants of type _Decimal32 + and _Decimal64 to the range and precision of the _Decimal64 + type, evaluate _Decimal128 operations and constants to the + range and precision of the _Decimal128 type; + 2 evaluate all operations and constants to the range and + precision of the _Decimal128 type. */ + +#undef DEC_EVAL_METHOD +#define DEC_EVAL_METHOD __DEC_EVAL_METHOD__ + +#endif /* __STDC_WANT_DEC_FP__ */ + +#endif /* _FLOAT_H___ */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/fma4intrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/fma4intrin.h new file mode 100644 index 0000000..e1bdef7 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/fma4intrin.h @@ -0,0 +1,241 @@ +/* Copyright (C) 2007-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _X86INTRIN_H_INCLUDED +# error "Never use directly; include instead." +#endif + +#ifndef _FMA4INTRIN_H_INCLUDED +#define _FMA4INTRIN_H_INCLUDED + +/* We need definitions from the SSE4A, SSE3, SSE2 and SSE header files. */ +#include + +#ifndef __FMA4__ +#pragma GCC push_options +#pragma GCC target("fma4") +#define __DISABLE_FMA4__ +#endif /* __FMA4__ */ + +/* 128b Floating point multiply/add type instructions. 
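+   As defined below: macc computes A*B + C, msub computes A*B - C, nmacc computes
+   -(A*B) + C and nmsub computes -(A*B) - C; maddsub subtracts C in the even-indexed
+   elements and adds it in the odd-indexed ones, and msubadd does the reverse.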
*/ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_macc_ps (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128) __builtin_ia32_vfmaddps ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_macc_pd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d) __builtin_ia32_vfmaddpd ((__v2df)__A, (__v2df)__B, (__v2df)__C); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_macc_ss (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128) __builtin_ia32_vfmaddss ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_macc_sd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d) __builtin_ia32_vfmaddsd ((__v2df)__A, (__v2df)__B, (__v2df)__C); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_msub_ps (__m128 __A, __m128 __B, __m128 __C) + +{ + return (__m128) __builtin_ia32_vfmaddps ((__v4sf)__A, (__v4sf)__B, -(__v4sf)__C); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_msub_pd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d) __builtin_ia32_vfmaddpd ((__v2df)__A, (__v2df)__B, -(__v2df)__C); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_msub_ss (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128) __builtin_ia32_vfmaddss ((__v4sf)__A, (__v4sf)__B, -(__v4sf)__C); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_msub_sd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d) __builtin_ia32_vfmaddsd ((__v2df)__A, (__v2df)__B, -(__v2df)__C); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_nmacc_ps (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128) __builtin_ia32_vfmaddps (-(__v4sf)__A, (__v4sf)__B, (__v4sf)__C); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_nmacc_pd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d) __builtin_ia32_vfmaddpd (-(__v2df)__A, (__v2df)__B, (__v2df)__C); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_nmacc_ss (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128) __builtin_ia32_vfmaddss (-(__v4sf)__A, (__v4sf)__B, (__v4sf)__C); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_nmacc_sd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d) __builtin_ia32_vfmaddsd (-(__v2df)__A, (__v2df)__B, (__v2df)__C); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_nmsub_ps (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128) __builtin_ia32_vfmaddps (-(__v4sf)__A, (__v4sf)__B, -(__v4sf)__C); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_nmsub_pd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d) __builtin_ia32_vfmaddpd (-(__v2df)__A, (__v2df)__B, -(__v2df)__C); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_nmsub_ss (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128) __builtin_ia32_vfmaddss (-(__v4sf)__A, (__v4sf)__B, -(__v4sf)__C); +} + +extern 
__inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_nmsub_sd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d) __builtin_ia32_vfmaddsd (-(__v2df)__A, (__v2df)__B, -(__v2df)__C); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maddsub_ps (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128) __builtin_ia32_vfmaddsubps ((__v4sf)__A, (__v4sf)__B, (__v4sf)__C); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maddsub_pd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d) __builtin_ia32_vfmaddsubpd ((__v2df)__A, (__v2df)__B, (__v2df)__C); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_msubadd_ps (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128) __builtin_ia32_vfmaddsubps ((__v4sf)__A, (__v4sf)__B, -(__v4sf)__C); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_msubadd_pd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d) __builtin_ia32_vfmaddsubpd ((__v2df)__A, (__v2df)__B, -(__v2df)__C); +} + +/* 256b Floating point multiply/add type instructions. */ +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_macc_ps (__m256 __A, __m256 __B, __m256 __C) +{ + return (__m256) __builtin_ia32_vfmaddps256 ((__v8sf)__A, (__v8sf)__B, (__v8sf)__C); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_macc_pd (__m256d __A, __m256d __B, __m256d __C) +{ + return (__m256d) __builtin_ia32_vfmaddpd256 ((__v4df)__A, (__v4df)__B, (__v4df)__C); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_msub_ps (__m256 __A, __m256 __B, __m256 __C) + +{ + return (__m256) __builtin_ia32_vfmaddps256 ((__v8sf)__A, (__v8sf)__B, -(__v8sf)__C); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_msub_pd (__m256d __A, __m256d __B, __m256d __C) +{ + return (__m256d) __builtin_ia32_vfmaddpd256 ((__v4df)__A, (__v4df)__B, -(__v4df)__C); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_nmacc_ps (__m256 __A, __m256 __B, __m256 __C) +{ + return (__m256) __builtin_ia32_vfmaddps256 (-(__v8sf)__A, (__v8sf)__B, (__v8sf)__C); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_nmacc_pd (__m256d __A, __m256d __B, __m256d __C) +{ + return (__m256d) __builtin_ia32_vfmaddpd256 (-(__v4df)__A, (__v4df)__B, (__v4df)__C); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_nmsub_ps (__m256 __A, __m256 __B, __m256 __C) +{ + return (__m256) __builtin_ia32_vfmaddps256 (-(__v8sf)__A, (__v8sf)__B, -(__v8sf)__C); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_nmsub_pd (__m256d __A, __m256d __B, __m256d __C) +{ + return (__m256d) __builtin_ia32_vfmaddpd256 (-(__v4df)__A, (__v4df)__B, -(__v4df)__C); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maddsub_ps (__m256 __A, __m256 __B, __m256 __C) +{ + return (__m256) __builtin_ia32_vfmaddsubps256 ((__v8sf)__A, (__v8sf)__B, (__v8sf)__C); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_maddsub_pd 
(__m256d __A, __m256d __B, __m256d __C) +{ + return (__m256d) __builtin_ia32_vfmaddsubpd256 ((__v4df)__A, (__v4df)__B, (__v4df)__C); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_msubadd_ps (__m256 __A, __m256 __B, __m256 __C) +{ + return (__m256) __builtin_ia32_vfmaddsubps256 ((__v8sf)__A, (__v8sf)__B, -(__v8sf)__C); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_msubadd_pd (__m256d __A, __m256d __B, __m256d __C) +{ + return (__m256d) __builtin_ia32_vfmaddsubpd256 ((__v4df)__A, (__v4df)__B, -(__v4df)__C); +} + +#ifdef __DISABLE_FMA4__ +#undef __DISABLE_FMA4__ +#pragma GCC pop_options +#endif /* __DISABLE_FMA4__ */ + +#endif diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/fmaintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/fmaintrin.h new file mode 100644 index 0000000..bfbb75d --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/fmaintrin.h @@ -0,0 +1,302 @@ +/* Copyright (C) 2011-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _IMMINTRIN_H_INCLUDED +# error "Never use directly; include instead." 
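+/* Usage sketch (illustrative, assuming an FMA-capable target such as -mfma;
+   __a, __b and __c are hypothetical __m128d values):
+     __m128d __r = _mm_fmadd_pd (__a, __b, __c);
+   yields __a*__b + __c with a single rounding step, unlike a separate multiply
+   followed by an add. */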
+#endif + +#ifndef _FMAINTRIN_H_INCLUDED +#define _FMAINTRIN_H_INCLUDED + +#ifndef __FMA__ +#pragma GCC push_options +#pragma GCC target("fma") +#define __DISABLE_FMA__ +#endif /* __FMA__ */ + +extern __inline __m128d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_pd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d)__builtin_ia32_vfmaddpd ((__v2df)__A, (__v2df)__B, + (__v2df)__C); +} + +extern __inline __m256d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmadd_pd (__m256d __A, __m256d __B, __m256d __C) +{ + return (__m256d)__builtin_ia32_vfmaddpd256 ((__v4df)__A, (__v4df)__B, + (__v4df)__C); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_ps (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128)__builtin_ia32_vfmaddps ((__v4sf)__A, (__v4sf)__B, + (__v4sf)__C); +} + +extern __inline __m256 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmadd_ps (__m256 __A, __m256 __B, __m256 __C) +{ + return (__m256)__builtin_ia32_vfmaddps256 ((__v8sf)__A, (__v8sf)__B, + (__v8sf)__C); +} + +extern __inline __m128d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_sd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d) __builtin_ia32_vfmaddsd3 ((__v2df)__A, (__v2df)__B, + (__v2df)__C); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmadd_ss (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128) __builtin_ia32_vfmaddss3 ((__v4sf)__A, (__v4sf)__B, + (__v4sf)__C); +} + +extern __inline __m128d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsub_pd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d)__builtin_ia32_vfmaddpd ((__v2df)__A, (__v2df)__B, + -(__v2df)__C); +} + +extern __inline __m256d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmsub_pd (__m256d __A, __m256d __B, __m256d __C) +{ + return (__m256d)__builtin_ia32_vfmaddpd256 ((__v4df)__A, (__v4df)__B, + -(__v4df)__C); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsub_ps (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128)__builtin_ia32_vfmaddps ((__v4sf)__A, (__v4sf)__B, + -(__v4sf)__C); +} + +extern __inline __m256 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmsub_ps (__m256 __A, __m256 __B, __m256 __C) +{ + return (__m256)__builtin_ia32_vfmaddps256 ((__v8sf)__A, (__v8sf)__B, + -(__v8sf)__C); +} + +extern __inline __m128d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsub_sd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d)__builtin_ia32_vfmaddsd3 ((__v2df)__A, (__v2df)__B, + -(__v2df)__C); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsub_ss (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128)__builtin_ia32_vfmaddss3 ((__v4sf)__A, (__v4sf)__B, + -(__v4sf)__C); +} + +extern __inline __m128d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmadd_pd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d)__builtin_ia32_vfmaddpd (-(__v2df)__A, (__v2df)__B, + (__v2df)__C); +} + +extern __inline __m256d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fnmadd_pd (__m256d __A, __m256d __B, __m256d __C) +{ + return (__m256d)__builtin_ia32_vfmaddpd256 (-(__v4df)__A, (__v4df)__B, + 
(__v4df)__C); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmadd_ps (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128)__builtin_ia32_vfmaddps (-(__v4sf)__A, (__v4sf)__B, + (__v4sf)__C); +} + +extern __inline __m256 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fnmadd_ps (__m256 __A, __m256 __B, __m256 __C) +{ + return (__m256)__builtin_ia32_vfmaddps256 (-(__v8sf)__A, (__v8sf)__B, + (__v8sf)__C); +} + +extern __inline __m128d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmadd_sd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d)__builtin_ia32_vfmaddsd3 ((__v2df)__A, -(__v2df)__B, + (__v2df)__C); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmadd_ss (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128)__builtin_ia32_vfmaddss3 ((__v4sf)__A, -(__v4sf)__B, + (__v4sf)__C); +} + +extern __inline __m128d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmsub_pd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d)__builtin_ia32_vfmaddpd (-(__v2df)__A, (__v2df)__B, + -(__v2df)__C); +} + +extern __inline __m256d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fnmsub_pd (__m256d __A, __m256d __B, __m256d __C) +{ + return (__m256d)__builtin_ia32_vfmaddpd256 (-(__v4df)__A, (__v4df)__B, + -(__v4df)__C); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmsub_ps (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128)__builtin_ia32_vfmaddps (-(__v4sf)__A, (__v4sf)__B, + -(__v4sf)__C); +} + +extern __inline __m256 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fnmsub_ps (__m256 __A, __m256 __B, __m256 __C) +{ + return (__m256)__builtin_ia32_vfmaddps256 (-(__v8sf)__A, (__v8sf)__B, + -(__v8sf)__C); +} + +extern __inline __m128d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmsub_sd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d)__builtin_ia32_vfmaddsd3 ((__v2df)__A, -(__v2df)__B, + -(__v2df)__C); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fnmsub_ss (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128)__builtin_ia32_vfmaddss3 ((__v4sf)__A, -(__v4sf)__B, + -(__v4sf)__C); +} + +extern __inline __m128d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmaddsub_pd (__m128d __A, __m128d __B, __m128d __C) +{ + return (__m128d)__builtin_ia32_vfmaddsubpd ((__v2df)__A, (__v2df)__B, + (__v2df)__C); +} + +extern __inline __m256d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmaddsub_pd (__m256d __A, __m256d __B, __m256d __C) +{ + return (__m256d)__builtin_ia32_vfmaddsubpd256 ((__v4df)__A, + (__v4df)__B, + (__v4df)__C); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmaddsub_ps (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128)__builtin_ia32_vfmaddsubps ((__v4sf)__A, (__v4sf)__B, + (__v4sf)__C); +} + +extern __inline __m256 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmaddsub_ps (__m256 __A, __m256 __B, __m256 __C) +{ + return (__m256)__builtin_ia32_vfmaddsubps256 ((__v8sf)__A, + (__v8sf)__B, + (__v8sf)__C); +} + +extern __inline __m128d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsubadd_pd (__m128d __A, 
__m128d __B, __m128d __C) +{ + return (__m128d)__builtin_ia32_vfmaddsubpd ((__v2df)__A, (__v2df)__B, + -(__v2df)__C); +} + +extern __inline __m256d +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmsubadd_pd (__m256d __A, __m256d __B, __m256d __C) +{ + return (__m256d)__builtin_ia32_vfmaddsubpd256 ((__v4df)__A, + (__v4df)__B, + -(__v4df)__C); +} + +extern __inline __m128 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_fmsubadd_ps (__m128 __A, __m128 __B, __m128 __C) +{ + return (__m128)__builtin_ia32_vfmaddsubps ((__v4sf)__A, (__v4sf)__B, + -(__v4sf)__C); +} + +extern __inline __m256 +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_fmsubadd_ps (__m256 __A, __m256 __B, __m256 __C) +{ + return (__m256)__builtin_ia32_vfmaddsubps256 ((__v8sf)__A, + (__v8sf)__B, + -(__v8sf)__C); +} + +#ifdef __DISABLE_FMA__ +#undef __DISABLE_FMA__ +#pragma GCC pop_options +#endif /* __DISABLE_FMA__ */ + +#endif diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/fxsrintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/fxsrintrin.h new file mode 100644 index 0000000..98e73ee --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/fxsrintrin.h @@ -0,0 +1,73 @@ +/* Copyright (C) 2012-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +/* #if !defined _X86INTRIN_H_INCLUDED && !defined _IMMINTRIN_H_INCLUDED */ +/* # error "Never use directly; include instead." 
*/ +/* #endif */ + +#ifndef _FXSRINTRIN_H_INCLUDED +#define _FXSRINTRIN_H_INCLUDED + +#ifndef __FXSR__ +#pragma GCC push_options +#pragma GCC target("fxsr") +#define __DISABLE_FXSR__ +#endif /* __FXSR__ */ + +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_fxsave (void *__P) +{ + return __builtin_ia32_fxsave (__P); +} + +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_fxrstor (void *__P) +{ + return __builtin_ia32_fxrstor (__P); +} + +#ifdef __x86_64__ +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_fxsave64 (void *__P) +{ + return __builtin_ia32_fxsave64 (__P); +} + +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_fxrstor64 (void *__P) +{ + return __builtin_ia32_fxrstor64 (__P); +} +#endif + +#ifdef __DISABLE_FXSR__ +#undef __DISABLE_FXSR__ +#pragma GCC pop_options +#endif /* __DISABLE_FXSR__ */ + + +#endif /* _FXSRINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/ia32intrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/ia32intrin.h new file mode 100644 index 0000000..614b0fa --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/ia32intrin.h @@ -0,0 +1,299 @@ +/* Copyright (C) 2009-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _X86INTRIN_H_INCLUDED +# error "Never use directly; include instead." +#endif + +/* 32bit bsf */ +extern __inline int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__bsfd (int __X) +{ + return __builtin_ctz (__X); +} + +/* 32bit bsr */ +extern __inline int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__bsrd (int __X) +{ + return __builtin_ia32_bsrsi (__X); +} + +/* 32bit bswap */ +extern __inline int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__bswapd (int __X) +{ + return __builtin_bswap32 (__X); +} + +#ifndef __SSE4_2__ +#pragma GCC push_options +#pragma GCC target("sse4.2") +#define __DISABLE_SSE4_2__ +#endif /* __SSE4_2__ */ + +/* 32bit accumulate CRC32 (polynomial 0x11EDC6F41) value. 
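+   A common CRC-32C pattern over a byte buffer (a sketch; __buf, __len, __i and
+   __c are hypothetical) starts from all ones, accumulates one byte at a time and
+   complements the result at the end:
+     __c = 0xFFFFFFFF;  then  __c = __crc32b (__c, __buf[__i]);  for each __i,
+     and finally  __c = ~__c;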
*/ +extern __inline unsigned int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__crc32b (unsigned int __C, unsigned char __V) +{ + return __builtin_ia32_crc32qi (__C, __V); +} + +extern __inline unsigned int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__crc32w (unsigned int __C, unsigned short __V) +{ + return __builtin_ia32_crc32hi (__C, __V); +} + +extern __inline unsigned int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__crc32d (unsigned int __C, unsigned int __V) +{ + return __builtin_ia32_crc32si (__C, __V); +} + +#ifdef __DISABLE_SSE4_2__ +#undef __DISABLE_SSE4_2__ +#pragma GCC pop_options +#endif /* __DISABLE_SSE4_2__ */ + +/* 32bit popcnt */ +extern __inline int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__popcntd (unsigned int __X) +{ + return __builtin_popcount (__X); +} + +/* rdpmc */ +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__rdpmc (int __S) +{ + return __builtin_ia32_rdpmc (__S); +} + +/* rdtsc */ +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__rdtsc (void) +{ + return __builtin_ia32_rdtsc (); +} + +/* rdtscp */ +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__rdtscp (unsigned int *__A) +{ + return __builtin_ia32_rdtscp (__A); +} + +/* 8bit rol */ +extern __inline unsigned char +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__rolb (unsigned char __X, int __C) +{ + return __builtin_ia32_rolqi (__X, __C); +} + +/* 16bit rol */ +extern __inline unsigned short +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__rolw (unsigned short __X, int __C) +{ + return __builtin_ia32_rolhi (__X, __C); +} + +/* 32bit rol */ +extern __inline unsigned int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__rold (unsigned int __X, int __C) +{ + return (__X << __C) | (__X >> (32 - __C)); +} + +/* 8bit ror */ +extern __inline unsigned char +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__rorb (unsigned char __X, int __C) +{ + return __builtin_ia32_rorqi (__X, __C); +} + +/* 16bit ror */ +extern __inline unsigned short +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__rorw (unsigned short __X, int __C) +{ + return __builtin_ia32_rorhi (__X, __C); +} + +/* 32bit ror */ +extern __inline unsigned int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__rord (unsigned int __X, int __C) +{ + return (__X >> __C) | (__X << (32 - __C)); +} + +/* Pause */ +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__pause (void) +{ + __builtin_ia32_pause (); +} + +#ifdef __x86_64__ +/* 64bit bsf */ +extern __inline int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__bsfq (long long __X) +{ + return __builtin_ctzll (__X); +} + +/* 64bit bsr */ +extern __inline int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__bsrq (long long __X) +{ + return __builtin_ia32_bsrdi (__X); +} + +/* 64bit bswap */ +extern __inline long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__bswapq (long long __X) +{ + return __builtin_bswap64 (__X); +} + +#ifndef __SSE4_2__ +#pragma GCC push_options +#pragma GCC target("sse4.2") +#define __DISABLE_SSE4_2__ +#endif /* __SSE4_2__ */ + +/* 64bit accumulate CRC32 (polynomial 0x11EDC6F41) value. 
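+   Consumes eight data bytes per step; the accumulated CRC itself remains a
+   32-bit value, returned zero-extended in the 64-bit result.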
*/ +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__crc32q (unsigned long long __C, unsigned long long __V) +{ + return __builtin_ia32_crc32di (__C, __V); +} + +#ifdef __DISABLE_SSE4_2__ +#undef __DISABLE_SSE4_2__ +#pragma GCC pop_options +#endif /* __DISABLE_SSE4_2__ */ + +/* 64bit popcnt */ +extern __inline long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__popcntq (unsigned long long __X) +{ + return __builtin_popcountll (__X); +} + +/* 64bit rol */ +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__rolq (unsigned long long __X, int __C) +{ + return (__X << __C) | (__X >> (64 - __C)); +} + +/* 64bit ror */ +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__rorq (unsigned long long __X, int __C) +{ + return (__X >> __C) | (__X << (64 - __C)); +} + +/* Read flags register */ +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__readeflags (void) +{ + return __builtin_ia32_readeflags_u64 (); +} + +/* Write flags register */ +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__writeeflags (unsigned long long X) +{ + __builtin_ia32_writeeflags_u64 (X); +} + +#define _bswap64(a) __bswapq(a) +#define _popcnt64(a) __popcntq(a) +#else + +/* Read flags register */ +extern __inline unsigned int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__readeflags (void) +{ + return __builtin_ia32_readeflags_u32 (); +} + +/* Write flags register */ +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__writeeflags (unsigned int X) +{ + __builtin_ia32_writeeflags_u32 (X); +} + +#endif + +/* On LP64 systems, longs are 64-bit. Use the appropriate rotate + * function. */ +#ifdef __LP64__ +#define _lrotl(a,b) __rolq((a), (b)) +#define _lrotr(a,b) __rorq((a), (b)) +#else +#define _lrotl(a,b) __rold((a), (b)) +#define _lrotr(a,b) __rord((a), (b)) +#endif + +#define _bit_scan_forward(a) __bsfd(a) +#define _bit_scan_reverse(a) __bsrd(a) +#define _bswap(a) __bswapd(a) +#define _popcnt32(a) __popcntd(a) +#define _rdpmc(a) __rdpmc(a) +#define _rdtsc() __rdtsc() +#define _rdtscp(a) __rdtscp(a) +#define _rotwl(a,b) __rolw((a), (b)) +#define _rotwr(a,b) __rorw((a), (b)) +#define _rotl(a,b) __rold((a), (b)) +#define _rotr(a,b) __rord((a), (b)) diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/immintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/immintrin.h new file mode 100644 index 0000000..73b4859 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/immintrin.h @@ -0,0 +1,177 @@ +/* Copyright (C) 2008-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. 
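   (A usage aside for the ia32intrin.h block that ends above: a minimal
   sketch, assuming a 64-bit target, of the rotate and popcount helpers.
   Note that the hardware takes the rotate count modulo the operand width,
   but a count of 0 makes the (64 - __C) shift in the fallback expression
   undefined in plain C, so portable callers keep 0 < count < 64:

     unsigned long long x = 0x8000000000000001ULL;
     unsigned long long r = __rolq (x, 4);    // 0x0000000000000018ULL
     long long bits = __popcntq (x);          // 2
     unsigned long long l = _lrotl (x, 4);    // same as __rolq on LP64
   )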
+ + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _IMMINTRIN_H_INCLUDED +#define _IMMINTRIN_H_INCLUDED + +#include <mmintrin.h> + +#include <xmmintrin.h> + +#include <emmintrin.h> + +#include <pmmintrin.h> + +#include <tmmintrin.h> + +#include <smmintrin.h> + +#include <wmmintrin.h> + +#include <avxintrin.h> + +#include <avx2intrin.h> + +#include <avx512fintrin.h> + +#include <avx512erintrin.h> + +#include <avx512pfintrin.h> + +#include <avx512cdintrin.h> + +#include <shaintrin.h> + +#include <lzcntintrin.h> + +#include <bmiintrin.h> + +#include <bmi2intrin.h> + +#include <fmaintrin.h> + +#include <f16cintrin.h> + +#include <rtmintrin.h> + +#include <xtestintrin.h> + +#ifndef __RDRND__ +#pragma GCC push_options +#pragma GCC target("rdrnd") +#define __DISABLE_RDRND__ +#endif /* __RDRND__ */ +extern __inline int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_rdrand16_step (unsigned short *__P) +{ + return __builtin_ia32_rdrand16_step (__P); +} + +extern __inline int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_rdrand32_step (unsigned int *__P) +{ + return __builtin_ia32_rdrand32_step (__P); +} +#ifdef __DISABLE_RDRND__ +#undef __DISABLE_RDRND__ +#pragma GCC pop_options +#endif /* __DISABLE_RDRND__ */ + +#ifdef __x86_64__ + +#ifndef __FSGSBASE__ +#pragma GCC push_options +#pragma GCC target("fsgsbase") +#define __DISABLE_FSGSBASE__ +#endif /* __FSGSBASE__ */ +extern __inline unsigned int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_readfsbase_u32 (void) +{ + return __builtin_ia32_rdfsbase32 (); +} + +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_readfsbase_u64 (void) +{ + return __builtin_ia32_rdfsbase64 (); +} + +extern __inline unsigned int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_readgsbase_u32 (void) +{ + return __builtin_ia32_rdgsbase32 (); +} + +extern __inline unsigned long long +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_readgsbase_u64 (void) +{ + return __builtin_ia32_rdgsbase64 (); +} + +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_writefsbase_u32 (unsigned int __B) +{ + __builtin_ia32_wrfsbase32 (__B); +} + +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_writefsbase_u64 (unsigned long long __B) +{ + __builtin_ia32_wrfsbase64 (__B); +} + +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_writegsbase_u32 (unsigned int __B) +{ + __builtin_ia32_wrgsbase32 (__B); +} + +extern __inline void +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_writegsbase_u64 (unsigned long long __B) +{ + __builtin_ia32_wrgsbase64 (__B); +} +#ifdef __DISABLE_FSGSBASE__ +#undef __DISABLE_FSGSBASE__ +#pragma GCC pop_options +#endif /* __DISABLE_FSGSBASE__ */ + +#ifndef __RDRND__ +#pragma GCC push_options +#pragma GCC target("rdrnd") +#define __DISABLE_RDRND__ +#endif /* __RDRND__ */ +extern __inline int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_rdrand64_step (unsigned long long *__P) +{ + return __builtin_ia32_rdrand64_step (__P); +} +#ifdef __DISABLE_RDRND__ +#undef __DISABLE_RDRND__ +#pragma GCC pop_options +#endif /* __DISABLE_RDRND__ */ + +#endif /* __x86_64__ */ + +#endif /* _IMMINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/iso646.h b/lib/gcc/x86_64-linux-android/4.9.x/include/iso646.h new file mode 100644 index 0000000..89bc8f4 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/iso646.h @@ -0,0 +1,45 @@ +/* Copyright (C) 1997-2014 Free
Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + +/* + * ISO C Standard: 7.9 Alternative spellings + */ + +#ifndef _ISO646_H +#define _ISO646_H + +#ifndef __cplusplus +#define and && +#define and_eq &= +#define bitand & +#define bitor | +#define compl ~ +#define not ! +#define not_eq != +#define or || +#define or_eq |= +#define xor ^ +#define xor_eq ^= +#endif + +#endif diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/lwpintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/lwpintrin.h new file mode 100644 index 0000000..1cd046a --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/lwpintrin.h @@ -0,0 +1,105 @@ +/* Copyright (C) 2007-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _X86INTRIN_H_INCLUDED +# error "Never use <lwpintrin.h> directly; include <x86intrin.h> instead."
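/* (A usage aside for the immintrin.h RDRND intrinsics above: a minimal,
   hedged sketch -- the generator can legitimately fail transiently, which
   is why the intrinsic returns a status instead of the value; the retry
   loop and its bound are the caller's choice:

     unsigned int r;
     int ok = 0, tries = 10;
     while (tries-- && !(ok = _rdrand32_step (&r)))
       ;                             // 1 = r holds fresh entropy, 0 = retry
   ) */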
+#endif + +#ifndef _LWPINTRIN_H_INCLUDED +#define _LWPINTRIN_H_INCLUDED + +#ifndef __LWP__ +#pragma GCC push_options +#pragma GCC target("lwp") +#define __DISABLE_LWP__ +#endif /* __LWP__ */ + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__llwpcb (void *pcbAddress) +{ + __builtin_ia32_llwpcb (pcbAddress); +} + +extern __inline void * __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__slwpcb (void) +{ + return __builtin_ia32_slwpcb (); +} + +#ifdef __OPTIMIZE__ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__lwpval32 (unsigned int data2, unsigned int data1, unsigned int flags) +{ + __builtin_ia32_lwpval32 (data2, data1, flags); +} + +#ifdef __x86_64__ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__lwpval64 (unsigned long long data2, unsigned int data1, unsigned int flags) +{ + __builtin_ia32_lwpval64 (data2, data1, flags); +} +#endif +#else +#define __lwpval32(D2, D1, F) \ + (__builtin_ia32_lwpval32 ((unsigned int) (D2), (unsigned int) (D1), \ + (unsigned int) (F))) +#ifdef __x86_64__ +#define __lwpval64(D2, D1, F) \ + (__builtin_ia32_lwpval64 ((unsigned long long) (D2), (unsigned int) (D1), \ + (unsigned int) (F))) +#endif +#endif + + +#ifdef __OPTIMIZE__ +extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__lwpins32 (unsigned int data2, unsigned int data1, unsigned int flags) +{ + return __builtin_ia32_lwpins32 (data2, data1, flags); +} + +#ifdef __x86_64__ +extern __inline unsigned char __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__lwpins64 (unsigned long long data2, unsigned int data1, unsigned int flags) +{ + return __builtin_ia32_lwpins64 (data2, data1, flags); +} +#endif +#else +#define __lwpins32(D2, D1, F) \ + (__builtin_ia32_lwpins32 ((unsigned int) (D2), (unsigned int) (D1), \ + (unsigned int) (F))) +#ifdef __x86_64__ +#define __lwpins64(D2, D1, F) \ + (__builtin_ia32_lwpins64 ((unsigned long long) (D2), (unsigned int) (D1), \ + (unsigned int) (F))) +#endif +#endif + +#ifdef __DISABLE_LWP__ +#undef __DISABLE_LWP__ +#pragma GCC pop_options +#endif /* __DISABLE_LWP__ */ + +#endif /* _LWPINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/lzcntintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/lzcntintrin.h new file mode 100644 index 0000000..b680a35 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/lzcntintrin.h @@ -0,0 +1,75 @@ +/* Copyright (C) 2009-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>.
*/ + +#if !defined _X86INTRIN_H_INCLUDED && !defined _IMMINTRIN_H_INCLUDED +# error "Never use <lzcntintrin.h> directly; include <x86intrin.h> instead." +#endif + + +#ifndef _LZCNTINTRIN_H_INCLUDED +#define _LZCNTINTRIN_H_INCLUDED + +#ifndef __LZCNT__ +#pragma GCC push_options +#pragma GCC target("lzcnt") +#define __DISABLE_LZCNT__ +#endif /* __LZCNT__ */ + +extern __inline unsigned short __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__lzcnt16 (unsigned short __X) +{ + return __builtin_clzs (__X); +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__lzcnt32 (unsigned int __X) +{ + return __builtin_clz (__X); +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_lzcnt_u32 (unsigned int __X) +{ + return __builtin_clz (__X); +} + +#ifdef __x86_64__ +extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__lzcnt64 (unsigned long long __X) +{ + return __builtin_clzll (__X); +} + +extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_lzcnt_u64 (unsigned long long __X) +{ + return __builtin_clzll (__X); +} +#endif + +#ifdef __DISABLE_LZCNT__ +#undef __DISABLE_LZCNT__ +#pragma GCC pop_options +#endif /* __DISABLE_LZCNT__ */ + +#endif /* _LZCNTINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/mm3dnow.h b/lib/gcc/x86_64-linux-android/4.9.x/include/mm3dnow.h new file mode 100644 index 0000000..bf847f9 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/mm3dnow.h @@ -0,0 +1,218 @@ +/* Copyright (C) 2004-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +/* Implemented from the mm3dnow.h (of supposedly AMD origin) included with + MSVC 7.1.
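   (A usage aside for the lzcntintrin.h block above: a minimal sketch,
   assuming a CPU with LZCNT. Note that these wrappers expand to
   __builtin_clz*, whose result is undefined for an input of 0 unless the
   lzcnt instruction is actually emitted, so guard the zero case when in
   doubt:

     unsigned int v = 0x00000008U;
     unsigned int lz = _lzcnt_u32 (v);         // 28 leading zero bits
     unsigned int w = v ? _lzcnt_u32 (v) : 32; // explicit zero guard
   )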
*/ + +#ifndef _MM3DNOW_H_INCLUDED +#define _MM3DNOW_H_INCLUDED + +#include <mmintrin.h> +#include <prfchwintrin.h> + +#ifndef __3dNOW__ +#pragma GCC push_options +#pragma GCC target("3dnow") +#define __DISABLE_3dNOW__ +#endif /* __3dNOW__ */ + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_femms (void) +{ + __builtin_ia32_femms(); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pavgusb (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pavgusb ((__v8qi)__A, (__v8qi)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pf2id (__m64 __A) +{ + return (__m64)__builtin_ia32_pf2id ((__v2sf)__A); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfacc (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfacc ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfadd (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfadd ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfcmpeq (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfcmpeq ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfcmpge (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfcmpge ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfcmpgt (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfcmpgt ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfmax (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfmax ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfmin (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfmin ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfmul (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfmul ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfrcp (__m64 __A) +{ + return (__m64)__builtin_ia32_pfrcp ((__v2sf)__A); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfrcpit1 (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfrcpit1 ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfrcpit2 (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfrcpit2 ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfrsqrt (__m64 __A) +{ + return (__m64)__builtin_ia32_pfrsqrt ((__v2sf)__A); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfrsqit1 (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfrsqit1 ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfsub (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfsub ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_m_pfsubr (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfsubr ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pi2fd (__m64 __A) +{ + return (__m64)__builtin_ia32_pi2fd ((__v2si)__A); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pmulhrw (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pmulhrw ((__v4hi)__A, (__v4hi)__B); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_prefetch (void *__P) +{ + __builtin_prefetch (__P, 0, 3 /* _MM_HINT_T0 */); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_from_float (float __A) +{ + return __extension__ (__m64)(__v2sf){ __A, 0.0f }; +} + +extern __inline float __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_to_float (__m64 __A) +{ + union { __v2sf v; float a[2]; } __tmp; + __tmp.v = (__v2sf)__A; + return __tmp.a[0]; +} + +#ifdef __3dNOW_A__ + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pf2iw (__m64 __A) +{ + return (__m64)__builtin_ia32_pf2iw ((__v2sf)__A); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfnacc (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfnacc ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pfpnacc (__m64 __A, __m64 __B) +{ + return (__m64)__builtin_ia32_pfpnacc ((__v2sf)__A, (__v2sf)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pi2fw (__m64 __A) +{ + return (__m64)__builtin_ia32_pi2fw ((__v2si)__A); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pswapd (__m64 __A) +{ + return (__m64)__builtin_ia32_pswapdsf ((__v2sf)__A); +} + +#endif /* __3dNOW_A__ */ + +#ifdef __DISABLE_3dNOW__ +#undef __DISABLE_3dNOW__ +#pragma GCC pop_options +#endif /* __DISABLE_3dNOW__ */ + +#endif /* _MM3DNOW_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/mm_malloc.h b/lib/gcc/x86_64-linux-android/4.9.x/include/mm_malloc.h new file mode 100644 index 0000000..4506dbc --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/mm_malloc.h @@ -0,0 +1,67 @@ +/* Copyright (C) 2004-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>.
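   (A usage aside for the mm3dnow.h block above: a minimal sketch of the
   3DNow! packed-float helpers, assuming an old AMD CPU with 3DNow! -- the
   ISA is long obsolete -- and noting that _m_femms should be issued before
   touching x87 floating point again, because the __m64 registers alias the
   FPU stack:

     __m64 a = _m_pi2fd (_mm_set_pi32 (3, 1));  // {1.0f, 3.0f}
     __m64 b = _m_pi2fd (_mm_set_pi32 (4, 2));  // {2.0f, 4.0f}
     __m64 s = _m_pfadd (a, b);                 // {3.0f, 7.0f}
     int lo = _mm_cvtsi64_si32 (_m_pf2id (s));  // 3
     _m_femms ();                               // leave MMX/3DNow! state
   )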
*/ + +#ifndef _MM_MALLOC_H_INCLUDED +#define _MM_MALLOC_H_INCLUDED + +#include <stdlib.h> + +#if !defined(__ANDROID__) || defined(HAVE_POSIX_MEMALIGN) +/* We can't depend on <stdlib.h> since the prototype of posix_memalign + may not be visible. */ +#ifndef __cplusplus +extern int posix_memalign (void **, size_t, size_t); +#else +#ifdef __GLIBC__ +extern "C" int posix_memalign (void **, size_t, size_t) throw (); +#else +extern "C" int posix_memalign (void **, size_t, size_t); +#endif /* __GLIBC__ */ +#endif /* __cplusplus */ +#endif /* !(__ANDROID__) || (HAVE_POSIX_MEMALIGN) */ + +static __inline void * +_mm_malloc (size_t size, size_t alignment) +{ + void *ptr; + if (alignment == 1) + return malloc (size); + if (alignment == 2 || (sizeof (void *) == 8 && alignment == 4)) + alignment = sizeof (void *); +#if !defined(__ANDROID__) || defined(HAVE_POSIX_MEMALIGN) + if (posix_memalign (&ptr, alignment, size) == 0) + return ptr; + else + return NULL; +#else + return memalign(alignment, size); +#endif +} + +static __inline void +_mm_free (void * ptr) +{ + free (ptr); +} + +#endif /* _MM_MALLOC_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/mmintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/mmintrin.h new file mode 100644 index 0000000..b351200 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/mmintrin.h @@ -0,0 +1,942 @@ +/* Copyright (C) 2002-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +/* Implemented from the specification included in the Intel C++ Compiler + User Guide and Reference, version 9.0. */ + +#ifndef _MMINTRIN_H_INCLUDED +#define _MMINTRIN_H_INCLUDED + +#ifndef __MMX__ +#pragma GCC push_options +#pragma GCC target("mmx") +#define __DISABLE_MMX__ +#endif /* __MMX__ */ + +/* The Intel API is flexible enough that we must allow aliasing with other + vector types, and their scalar components. */ +typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__)); + +/* Internal data types for implementing the intrinsics. */ +typedef int __v2si __attribute__ ((__vector_size__ (8))); +typedef short __v4hi __attribute__ ((__vector_size__ (8))); +typedef char __v8qi __attribute__ ((__vector_size__ (8))); +typedef long long __v1di __attribute__ ((__vector_size__ (8))); +typedef float __v2sf __attribute__ ((__vector_size__ (8))); + +/* Empty the multimedia state. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_empty (void) +{ + __builtin_ia32_emms (); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_empty (void) +{ + _mm_empty (); +} + +/* Convert I to a __m64 object.
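   (A usage aside for the mm_malloc.h block above: a minimal sketch --
   the alignment argument is expected to be a power of two, and _mm_free
   is the matching release call for a pointer obtained from _mm_malloc:

     float *buf = (float *) _mm_malloc (256 * sizeof (float), 16);
     if (buf)
       {
         buf[0] = 1.0f;            // 16-byte aligned storage
         _mm_free (buf);           // pair with _mm_malloc
       }
   )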
The integer is zero-extended to 64-bits. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi32_si64 (int __i) +{ + return (__m64) __builtin_ia32_vec_init_v2si (__i, 0); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_from_int (int __i) +{ + return _mm_cvtsi32_si64 (__i); +} + +#ifdef __x86_64__ +/* Convert I to a __m64 object. */ + +/* Intel intrinsic. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_from_int64 (long long __i) +{ + return (__m64) __i; +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi64_m64 (long long __i) +{ + return (__m64) __i; +} + +/* Microsoft intrinsic. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi64x_si64 (long long __i) +{ + return (__m64) __i; +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_pi64x (long long __i) +{ + return (__m64) __i; +} +#endif + +/* Convert the lower 32 bits of the __m64 object into an integer. */ +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi64_si32 (__m64 __i) +{ + return __builtin_ia32_vec_ext_v2si ((__v2si)__i, 0); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_to_int (__m64 __i) +{ + return _mm_cvtsi64_si32 (__i); +} + +#ifdef __x86_64__ +/* Convert the __m64 object to a 64bit integer. */ + +/* Intel intrinsic. */ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_to_int64 (__m64 __i) +{ + return (long long)__i; +} + +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtm64_si64 (__m64 __i) +{ + return (long long)__i; +} + +/* Microsoft intrinsic. */ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi64_si64x (__m64 __i) +{ + return (long long)__i; +} +#endif + +/* Pack the four 16-bit values from M1 into the lower four 8-bit values of + the result, and the four 16-bit values from M2 into the upper four 8-bit + values of the result, all with signed saturation. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_packs_pi16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_packsswb ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_packsswb (__m64 __m1, __m64 __m2) +{ + return _mm_packs_pi16 (__m1, __m2); +} + +/* Pack the two 32-bit values from M1 in to the lower two 16-bit values of + the result, and the two 32-bit values from M2 into the upper two 16-bit + values of the result, all with signed saturation. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_packs_pi32 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_packssdw ((__v2si)__m1, (__v2si)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_packssdw (__m64 __m1, __m64 __m2) +{ + return _mm_packs_pi32 (__m1, __m2); +} + +/* Pack the four 16-bit values from M1 into the lower four 8-bit values of + the result, and the four 16-bit values from M2 into the upper four 8-bit + values of the result, all with unsigned saturation. 
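   (A usage aside: a minimal sketch of the signed packing above -- values
   outside the target range saturate rather than wrap:

     __m64 w = _mm_set_pi16 (300, -200, 127, -128);
     __m64 b = _mm_packs_pi16 (w, w);
     // each 16-bit lane saturates to signed 8-bit: 127, -128, 127, -128
   )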
*/ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_packs_pu16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_packuswb ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_packuswb (__m64 __m1, __m64 __m2) +{ + return _mm_packs_pu16 (__m1, __m2); +} + +/* Interleave the four 8-bit values from the high half of M1 with the four + 8-bit values from the high half of M2. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpackhi_pi8 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_punpckhbw ((__v8qi)__m1, (__v8qi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_punpckhbw (__m64 __m1, __m64 __m2) +{ + return _mm_unpackhi_pi8 (__m1, __m2); +} + +/* Interleave the two 16-bit values from the high half of M1 with the two + 16-bit values from the high half of M2. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpackhi_pi16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_punpckhwd ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_punpckhwd (__m64 __m1, __m64 __m2) +{ + return _mm_unpackhi_pi16 (__m1, __m2); +} + +/* Interleave the 32-bit value from the high half of M1 with the 32-bit + value from the high half of M2. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpackhi_pi32 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_punpckhdq ((__v2si)__m1, (__v2si)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_punpckhdq (__m64 __m1, __m64 __m2) +{ + return _mm_unpackhi_pi32 (__m1, __m2); +} + +/* Interleave the four 8-bit values from the low half of M1 with the four + 8-bit values from the low half of M2. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpacklo_pi8 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_punpcklbw ((__v8qi)__m1, (__v8qi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_punpcklbw (__m64 __m1, __m64 __m2) +{ + return _mm_unpacklo_pi8 (__m1, __m2); +} + +/* Interleave the two 16-bit values from the low half of M1 with the two + 16-bit values from the low half of M2. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpacklo_pi16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_punpcklwd ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_punpcklwd (__m64 __m1, __m64 __m2) +{ + return _mm_unpacklo_pi16 (__m1, __m2); +} + +/* Interleave the 32-bit value from the low half of M1 with the 32-bit + value from the low half of M2. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpacklo_pi32 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_punpckldq ((__v2si)__m1, (__v2si)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_punpckldq (__m64 __m1, __m64 __m2) +{ + return _mm_unpacklo_pi32 (__m1, __m2); +} + +/* Add the 8-bit values in M1 to the 8-bit values in M2. 
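   (A usage aside: a minimal sketch of the interleaves above -- unpacklo
   merges the low halves of its operands, which is the usual first step for
   widening 8-bit data to 16 bits:

     __m64 zero = _mm_setzero_si64 ();
     __m64 bytes = _mm_set_pi8 (8, 7, 6, 5, 4, 3, 2, 1);
     __m64 lo16 = _mm_unpacklo_pi8 (bytes, zero); // {1, 2, 3, 4} as 16-bit
   )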
*/ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_pi8 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_paddb ((__v8qi)__m1, (__v8qi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_paddb (__m64 __m1, __m64 __m2) +{ + return _mm_add_pi8 (__m1, __m2); +} + +/* Add the 16-bit values in M1 to the 16-bit values in M2. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_pi16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_paddw ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_paddw (__m64 __m1, __m64 __m2) +{ + return _mm_add_pi16 (__m1, __m2); +} + +/* Add the 32-bit values in M1 to the 32-bit values in M2. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_pi32 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_paddd ((__v2si)__m1, (__v2si)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_paddd (__m64 __m1, __m64 __m2) +{ + return _mm_add_pi32 (__m1, __m2); +} + +/* Add the 64-bit values in M1 to the 64-bit values in M2. */ +#ifndef __SSE2__ +#pragma GCC push_options +#pragma GCC target("sse2") +#define __DISABLE_SSE2__ +#endif /* __SSE2__ */ + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_si64 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_paddq ((__v1di)__m1, (__v1di)__m2); +} +#ifdef __DISABLE_SSE2__ +#undef __DISABLE_SSE2__ +#pragma GCC pop_options +#endif /* __DISABLE_SSE2__ */ + +/* Add the 8-bit values in M1 to the 8-bit values in M2 using signed + saturated arithmetic. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_adds_pi8 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_paddsb ((__v8qi)__m1, (__v8qi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_paddsb (__m64 __m1, __m64 __m2) +{ + return _mm_adds_pi8 (__m1, __m2); +} + +/* Add the 16-bit values in M1 to the 16-bit values in M2 using signed + saturated arithmetic. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_adds_pi16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_paddsw ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_paddsw (__m64 __m1, __m64 __m2) +{ + return _mm_adds_pi16 (__m1, __m2); +} + +/* Add the 8-bit values in M1 to the 8-bit values in M2 using unsigned + saturated arithmetic. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_adds_pu8 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_paddusb ((__v8qi)__m1, (__v8qi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_paddusb (__m64 __m1, __m64 __m2) +{ + return _mm_adds_pu8 (__m1, __m2); +} + +/* Add the 16-bit values in M1 to the 16-bit values in M2 using unsigned + saturated arithmetic. 
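   (A usage aside: a minimal sketch of the saturating adds above -- unsigned
   lanes clamp at 255 instead of wrapping, the usual building block for
   8-bit pixel math:

     __m64 a = _mm_set1_pi8 ((char) 200);
     __m64 b = _mm_set1_pi8 ((char) 100);
     __m64 s = _mm_adds_pu8 (a, b);   // every lane is 255, not 44
   )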
*/ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_adds_pu16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_paddusw ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_paddusw (__m64 __m1, __m64 __m2) +{ + return _mm_adds_pu16 (__m1, __m2); +} + +/* Subtract the 8-bit values in M2 from the 8-bit values in M1. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_pi8 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_psubb ((__v8qi)__m1, (__v8qi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psubb (__m64 __m1, __m64 __m2) +{ + return _mm_sub_pi8 (__m1, __m2); +} + +/* Subtract the 16-bit values in M2 from the 16-bit values in M1. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_pi16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_psubw ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psubw (__m64 __m1, __m64 __m2) +{ + return _mm_sub_pi16 (__m1, __m2); +} + +/* Subtract the 32-bit values in M2 from the 32-bit values in M1. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_pi32 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_psubd ((__v2si)__m1, (__v2si)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psubd (__m64 __m1, __m64 __m2) +{ + return _mm_sub_pi32 (__m1, __m2); +} + +/* Add the 64-bit values in M1 to the 64-bit values in M2. */ +#ifndef __SSE2__ +#pragma GCC push_options +#pragma GCC target("sse2") +#define __DISABLE_SSE2__ +#endif /* __SSE2__ */ + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_si64 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_psubq ((__v1di)__m1, (__v1di)__m2); +} +#ifdef __DISABLE_SSE2__ +#undef __DISABLE_SSE2__ +#pragma GCC pop_options +#endif /* __DISABLE_SSE2__ */ + +/* Subtract the 8-bit values in M2 from the 8-bit values in M1 using signed + saturating arithmetic. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_subs_pi8 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_psubsb ((__v8qi)__m1, (__v8qi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psubsb (__m64 __m1, __m64 __m2) +{ + return _mm_subs_pi8 (__m1, __m2); +} + +/* Subtract the 16-bit values in M2 from the 16-bit values in M1 using + signed saturating arithmetic. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_subs_pi16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_psubsw ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psubsw (__m64 __m1, __m64 __m2) +{ + return _mm_subs_pi16 (__m1, __m2); +} + +/* Subtract the 8-bit values in M2 from the 8-bit values in M1 using + unsigned saturating arithmetic. 
*/ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_subs_pu8 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_psubusb ((__v8qi)__m1, (__v8qi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psubusb (__m64 __m1, __m64 __m2) +{ + return _mm_subs_pu8 (__m1, __m2); +} + +/* Subtract the 16-bit values in M2 from the 16-bit values in M1 using + unsigned saturating arithmetic. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_subs_pu16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_psubusw ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psubusw (__m64 __m1, __m64 __m2) +{ + return _mm_subs_pu16 (__m1, __m2); +} + +/* Multiply four 16-bit values in M1 by four 16-bit values in M2 producing + four 32-bit intermediate results, which are then summed by pairs to + produce two 32-bit results. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_madd_pi16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_pmaddwd ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pmaddwd (__m64 __m1, __m64 __m2) +{ + return _mm_madd_pi16 (__m1, __m2); +} + +/* Multiply four signed 16-bit values in M1 by four signed 16-bit values in + M2 and produce the high 16 bits of the 32-bit results. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mulhi_pi16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_pmulhw ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pmulhw (__m64 __m1, __m64 __m2) +{ + return _mm_mulhi_pi16 (__m1, __m2); +} + +/* Multiply four 16-bit values in M1 by four 16-bit values in M2 and produce + the low 16 bits of the results. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mullo_pi16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_pmullw ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pmullw (__m64 __m1, __m64 __m2) +{ + return _mm_mullo_pi16 (__m1, __m2); +} + +/* Shift four 16-bit values in M left by COUNT. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sll_pi16 (__m64 __m, __m64 __count) +{ + return (__m64) __builtin_ia32_psllw ((__v4hi)__m, (__v4hi)__count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psllw (__m64 __m, __m64 __count) +{ + return _mm_sll_pi16 (__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_slli_pi16 (__m64 __m, int __count) +{ + return (__m64) __builtin_ia32_psllwi ((__v4hi)__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psllwi (__m64 __m, int __count) +{ + return _mm_slli_pi16 (__m, __count); +} + +/* Shift two 32-bit values in M left by COUNT. 
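   (A usage aside: a minimal sketch of _mm_madd_pi16 above, the classic
   2-wide dot-product step:

     __m64 a = _mm_set_pi16 (4, 3, 2, 1);
     __m64 b = _mm_set_pi16 (8, 7, 6, 5);
     __m64 d = _mm_madd_pi16 (a, b);
     // 32-bit lanes: {1*5 + 2*6, 3*7 + 4*8} = {17, 53}
   )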
*/ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sll_pi32 (__m64 __m, __m64 __count) +{ + return (__m64) __builtin_ia32_pslld ((__v2si)__m, (__v2si)__count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pslld (__m64 __m, __m64 __count) +{ + return _mm_sll_pi32 (__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_slli_pi32 (__m64 __m, int __count) +{ + return (__m64) __builtin_ia32_pslldi ((__v2si)__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pslldi (__m64 __m, int __count) +{ + return _mm_slli_pi32 (__m, __count); +} + +/* Shift the 64-bit value in M left by COUNT. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sll_si64 (__m64 __m, __m64 __count) +{ + return (__m64) __builtin_ia32_psllq ((__v1di)__m, (__v1di)__count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psllq (__m64 __m, __m64 __count) +{ + return _mm_sll_si64 (__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_slli_si64 (__m64 __m, int __count) +{ + return (__m64) __builtin_ia32_psllqi ((__v1di)__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psllqi (__m64 __m, int __count) +{ + return _mm_slli_si64 (__m, __count); +} + +/* Shift four 16-bit values in M right by COUNT; shift in the sign bit. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sra_pi16 (__m64 __m, __m64 __count) +{ + return (__m64) __builtin_ia32_psraw ((__v4hi)__m, (__v4hi)__count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psraw (__m64 __m, __m64 __count) +{ + return _mm_sra_pi16 (__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srai_pi16 (__m64 __m, int __count) +{ + return (__m64) __builtin_ia32_psrawi ((__v4hi)__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psrawi (__m64 __m, int __count) +{ + return _mm_srai_pi16 (__m, __count); +} + +/* Shift two 32-bit values in M right by COUNT; shift in the sign bit. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sra_pi32 (__m64 __m, __m64 __count) +{ + return (__m64) __builtin_ia32_psrad ((__v2si)__m, (__v2si)__count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psrad (__m64 __m, __m64 __count) +{ + return _mm_sra_pi32 (__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srai_pi32 (__m64 __m, int __count) +{ + return (__m64) __builtin_ia32_psradi ((__v2si)__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psradi (__m64 __m, int __count) +{ + return _mm_srai_pi32 (__m, __count); +} + +/* Shift four 16-bit values in M right by COUNT; shift in zeros. 
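   (A usage aside on the difference between the arithmetic shifts above and
   the logical shifts below, in a minimal sketch -- sra replicates the sign
   bit, srl feeds in zeros:

     __m64 v = _mm_set1_pi16 ((short) 0xFF00);   // -256 in each lane
     __m64 a = _mm_srai_pi16 (v, 4);             // 0xFFF0 = -16
     __m64 l = _mm_srli_pi16 (v, 4);             // 0x0FF0 = 4080
   )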
*/ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srl_pi16 (__m64 __m, __m64 __count) +{ + return (__m64) __builtin_ia32_psrlw ((__v4hi)__m, (__v4hi)__count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psrlw (__m64 __m, __m64 __count) +{ + return _mm_srl_pi16 (__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srli_pi16 (__m64 __m, int __count) +{ + return (__m64) __builtin_ia32_psrlwi ((__v4hi)__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psrlwi (__m64 __m, int __count) +{ + return _mm_srli_pi16 (__m, __count); +} + +/* Shift two 32-bit values in M right by COUNT; shift in zeros. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srl_pi32 (__m64 __m, __m64 __count) +{ + return (__m64) __builtin_ia32_psrld ((__v2si)__m, (__v2si)__count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psrld (__m64 __m, __m64 __count) +{ + return _mm_srl_pi32 (__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srli_pi32 (__m64 __m, int __count) +{ + return (__m64) __builtin_ia32_psrldi ((__v2si)__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psrldi (__m64 __m, int __count) +{ + return _mm_srli_pi32 (__m, __count); +} + +/* Shift the 64-bit value in M right by COUNT; shift in zeros. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srl_si64 (__m64 __m, __m64 __count) +{ + return (__m64) __builtin_ia32_psrlq ((__v1di)__m, (__v1di)__count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psrlq (__m64 __m, __m64 __count) +{ + return _mm_srl_si64 (__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_srli_si64 (__m64 __m, int __count) +{ + return (__m64) __builtin_ia32_psrlqi ((__v1di)__m, __count); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psrlqi (__m64 __m, int __count) +{ + return _mm_srli_si64 (__m, __count); +} + +/* Bit-wise AND the 64-bit values in M1 and M2. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_and_si64 (__m64 __m1, __m64 __m2) +{ + return __builtin_ia32_pand (__m1, __m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pand (__m64 __m1, __m64 __m2) +{ + return _mm_and_si64 (__m1, __m2); +} + +/* Bit-wise complement the 64-bit value in M1 and bit-wise AND it with the + 64-bit value in M2. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_andnot_si64 (__m64 __m1, __m64 __m2) +{ + return __builtin_ia32_pandn (__m1, __m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pandn (__m64 __m1, __m64 __m2) +{ + return _mm_andnot_si64 (__m1, __m2); +} + +/* Bit-wise inclusive OR the 64-bit values in M1 and M2.
*/ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_or_si64 (__m64 __m1, __m64 __m2) +{ + return __builtin_ia32_por (__m1, __m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_por (__m64 __m1, __m64 __m2) +{ + return _mm_or_si64 (__m1, __m2); +} + +/* Bit-wise exclusive OR the 64-bit values in M1 and M2. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_xor_si64 (__m64 __m1, __m64 __m2) +{ + return __builtin_ia32_pxor (__m1, __m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pxor (__m64 __m1, __m64 __m2) +{ + return _mm_xor_si64 (__m1, __m2); +} + +/* Compare eight 8-bit values. The result of the comparison is 0xFF if the + test is true and zero if false. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_pi8 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_pcmpeqb ((__v8qi)__m1, (__v8qi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pcmpeqb (__m64 __m1, __m64 __m2) +{ + return _mm_cmpeq_pi8 (__m1, __m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_pi8 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_pcmpgtb ((__v8qi)__m1, (__v8qi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pcmpgtb (__m64 __m1, __m64 __m2) +{ + return _mm_cmpgt_pi8 (__m1, __m2); +} + +/* Compare four 16-bit values. The result of the comparison is 0xFFFF if + the test is true and zero if false. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_pi16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_pcmpeqw ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pcmpeqw (__m64 __m1, __m64 __m2) +{ + return _mm_cmpeq_pi16 (__m1, __m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_pi16 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_pcmpgtw ((__v4hi)__m1, (__v4hi)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pcmpgtw (__m64 __m1, __m64 __m2) +{ + return _mm_cmpgt_pi16 (__m1, __m2); +} + +/* Compare two 32-bit values. The result of the comparison is 0xFFFFFFFF if + the test is true and zero if false. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_pi32 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_pcmpeqd ((__v2si)__m1, (__v2si)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pcmpeqd (__m64 __m1, __m64 __m2) +{ + return _mm_cmpeq_pi32 (__m1, __m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_pi32 (__m64 __m1, __m64 __m2) +{ + return (__m64) __builtin_ia32_pcmpgtd ((__v2si)__m1, (__v2si)__m2); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pcmpgtd (__m64 __m1, __m64 __m2) +{ + return _mm_cmpgt_pi32 (__m1, __m2); +} + +/* Creates a 64-bit zero. 
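   (A usage aside: the compares above return all-ones lane masks, which
   combine with pand/pandn/por into a branchless select; a minimal sketch of
   max(a, b) on signed 16-bit lanes, with hypothetical inputs:

     __m64 a = _mm_set_pi16 (3, -7, 9, 0);
     __m64 b = _mm_set_pi16 (5, -2, 1, 4);
     __m64 m = _mm_cmpgt_pi16 (a, b);              // mask: a > b
     __m64 mx = _mm_or_si64 (_mm_and_si64 (m, a),  // take a where mask set
                             _mm_andnot_si64 (m, b)); // else take b
   )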
*/ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setzero_si64 (void) +{ + return (__m64)0LL; +} + +/* Creates a vector of two 32-bit values; I0 is least significant. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_pi32 (int __i1, int __i0) +{ + return (__m64) __builtin_ia32_vec_init_v2si (__i0, __i1); +} + +/* Creates a vector of four 16-bit values; W0 is least significant. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_pi16 (short __w3, short __w2, short __w1, short __w0) +{ + return (__m64) __builtin_ia32_vec_init_v4hi (__w0, __w1, __w2, __w3); +} + +/* Creates a vector of eight 8-bit values; B0 is least significant. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_pi8 (char __b7, char __b6, char __b5, char __b4, + char __b3, char __b2, char __b1, char __b0) +{ + return (__m64) __builtin_ia32_vec_init_v8qi (__b0, __b1, __b2, __b3, + __b4, __b5, __b6, __b7); +} + +/* Similar, but with the arguments in reverse order. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setr_pi32 (int __i0, int __i1) +{ + return _mm_set_pi32 (__i1, __i0); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setr_pi16 (short __w0, short __w1, short __w2, short __w3) +{ + return _mm_set_pi16 (__w3, __w2, __w1, __w0); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setr_pi8 (char __b0, char __b1, char __b2, char __b3, + char __b4, char __b5, char __b6, char __b7) +{ + return _mm_set_pi8 (__b7, __b6, __b5, __b4, __b3, __b2, __b1, __b0); +} + +/* Creates a vector of two 32-bit values, both elements containing I. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_pi32 (int __i) +{ + return _mm_set_pi32 (__i, __i); +} + +/* Creates a vector of four 16-bit values, all elements containing W. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_pi16 (short __w) +{ + return _mm_set_pi16 (__w, __w, __w, __w); +} + +/* Creates a vector of eight 8-bit values, all elements containing B. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_pi8 (char __b) +{ + return _mm_set_pi8 (__b, __b, __b, __b, __b, __b, __b, __b); +} +#ifdef __DISABLE_MMX__ +#undef __DISABLE_MMX__ +#pragma GCC pop_options +#endif /* __DISABLE_MMX__ */ + +#endif /* _MMINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/nmmintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/nmmintrin.h new file mode 100644 index 0000000..9fc7107 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/nmmintrin.h @@ -0,0 +1,33 @@ +/* Copyright (C) 2007-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
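   (A usage aside for the _mm_set* constructors that end mmintrin.h above:
   a minimal sketch of the argument-order trap -- _mm_set_* lists the
   highest lane first, _mm_setr_* the lowest first:

     __m64 x = _mm_set_pi16 (4, 3, 2, 1);    // lane 0 holds 1
     __m64 y = _mm_setr_pi16 (4, 3, 2, 1);   // lane 0 holds 4
     int lo = _mm_cvtsi64_si32 (x) & 0xFFFF; // 1
   )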
+ + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +/* Implemented from the specification included in the Intel C++ Compiler + User Guide and Reference, version 10.0. */ + +#ifndef _NMMINTRIN_H_INCLUDED +#define _NMMINTRIN_H_INCLUDED + +/* We just include SSE4.1 header file. */ +#include <smmintrin.h> + +#endif /* _NMMINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/omp.h b/lib/gcc/x86_64-linux-android/4.9.x/include/omp.h new file mode 100644 index 0000000..b1824b5 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/omp.h @@ -0,0 +1,127 @@ +/* Copyright (C) 2005-2014 Free Software Foundation, Inc. + Contributed by Richard Henderson <rth@redhat.com>. + + This file is part of the GNU OpenMP Library (libgomp). + + Libgomp is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY + WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS + FOR A PARTICULAR PURPOSE. See the GNU General Public License for + more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#ifndef _OMP_H +#define _OMP_H 1 + +#ifndef _LIBGOMP_OMP_LOCK_DEFINED +#define _LIBGOMP_OMP_LOCK_DEFINED 1 +/* These two structures get edited by the libgomp build process to + reflect the shape of the two types. Their internals are private + to the library.
*/ + +typedef struct +{ + unsigned char _x[4] + __attribute__((__aligned__(4))); +} omp_lock_t; + +typedef struct +{ + unsigned char _x[12] + __attribute__((__aligned__(4))); +} omp_nest_lock_t; +#endif + +typedef enum omp_sched_t +{ + omp_sched_static = 1, + omp_sched_dynamic = 2, + omp_sched_guided = 3, + omp_sched_auto = 4 +} omp_sched_t; + +typedef enum omp_proc_bind_t +{ + omp_proc_bind_false = 0, + omp_proc_bind_true = 1, + omp_proc_bind_master = 2, + omp_proc_bind_close = 3, + omp_proc_bind_spread = 4 +} omp_proc_bind_t; + +#ifdef __cplusplus +extern "C" { +# define __GOMP_NOTHROW throw () +#else +# define __GOMP_NOTHROW __attribute__((__nothrow__)) +#endif + +extern void omp_set_num_threads (int) __GOMP_NOTHROW; +extern int omp_get_num_threads (void) __GOMP_NOTHROW; +extern int omp_get_max_threads (void) __GOMP_NOTHROW; +extern int omp_get_thread_num (void) __GOMP_NOTHROW; +extern int omp_get_num_procs (void) __GOMP_NOTHROW; + +extern int omp_in_parallel (void) __GOMP_NOTHROW; + +extern void omp_set_dynamic (int) __GOMP_NOTHROW; +extern int omp_get_dynamic (void) __GOMP_NOTHROW; + +extern void omp_set_nested (int) __GOMP_NOTHROW; +extern int omp_get_nested (void) __GOMP_NOTHROW; + +extern void omp_init_lock (omp_lock_t *) __GOMP_NOTHROW; +extern void omp_destroy_lock (omp_lock_t *) __GOMP_NOTHROW; +extern void omp_set_lock (omp_lock_t *) __GOMP_NOTHROW; +extern void omp_unset_lock (omp_lock_t *) __GOMP_NOTHROW; +extern int omp_test_lock (omp_lock_t *) __GOMP_NOTHROW; + +extern void omp_init_nest_lock (omp_nest_lock_t *) __GOMP_NOTHROW; +extern void omp_destroy_nest_lock (omp_nest_lock_t *) __GOMP_NOTHROW; +extern void omp_set_nest_lock (omp_nest_lock_t *) __GOMP_NOTHROW; +extern void omp_unset_nest_lock (omp_nest_lock_t *) __GOMP_NOTHROW; +extern int omp_test_nest_lock (omp_nest_lock_t *) __GOMP_NOTHROW; + +extern double omp_get_wtime (void) __GOMP_NOTHROW; +extern double omp_get_wtick (void) __GOMP_NOTHROW; + +extern void omp_set_schedule (omp_sched_t, int) __GOMP_NOTHROW; +extern void omp_get_schedule (omp_sched_t *, int *) __GOMP_NOTHROW; +extern int omp_get_thread_limit (void) __GOMP_NOTHROW; +extern void omp_set_max_active_levels (int) __GOMP_NOTHROW; +extern int omp_get_max_active_levels (void) __GOMP_NOTHROW; +extern int omp_get_level (void) __GOMP_NOTHROW; +extern int omp_get_ancestor_thread_num (int) __GOMP_NOTHROW; +extern int omp_get_team_size (int) __GOMP_NOTHROW; +extern int omp_get_active_level (void) __GOMP_NOTHROW; + +extern int omp_in_final (void) __GOMP_NOTHROW; + +extern int omp_get_cancellation (void) __GOMP_NOTHROW; +extern omp_proc_bind_t omp_get_proc_bind (void) __GOMP_NOTHROW; + +extern void omp_set_default_device (int) __GOMP_NOTHROW; +extern int omp_get_default_device (void) __GOMP_NOTHROW; +extern int omp_get_num_devices (void) __GOMP_NOTHROW; +extern int omp_get_num_teams (void) __GOMP_NOTHROW; +extern int omp_get_team_num (void) __GOMP_NOTHROW; + +extern int omp_is_initial_device (void) __GOMP_NOTHROW; + +#ifdef __cplusplus +} +#endif + +#endif /* _OMP_H */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/pmmintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/pmmintrin.h new file mode 100644 index 0000000..6a79500 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/pmmintrin.h @@ -0,0 +1,132 @@ +/* Copyright (C) 2003-2014 Free Software Foundation, Inc. + + This file is part of GCC. 
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 9.0.  */
+
+#ifndef _PMMINTRIN_H_INCLUDED
+#define _PMMINTRIN_H_INCLUDED
+
+/* We need definitions from the SSE2 and SSE header files.  */
+#include <emmintrin.h>
+
+#ifndef __SSE3__
+#pragma GCC push_options
+#pragma GCC target("sse3")
+#define __DISABLE_SSE3__
+#endif /* __SSE3__ */
+
+/* Additional bits in the MXCSR.  */
+#define _MM_DENORMALS_ZERO_MASK 0x0040
+#define _MM_DENORMALS_ZERO_ON 0x0040
+#define _MM_DENORMALS_ZERO_OFF 0x0000
+
+#define _MM_SET_DENORMALS_ZERO_MODE(mode) \
+  _mm_setcsr ((_mm_getcsr () & ~_MM_DENORMALS_ZERO_MASK) | (mode))
+#define _MM_GET_DENORMALS_ZERO_MODE() \
+  (_mm_getcsr() & _MM_DENORMALS_ZERO_MASK)
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_addsub_ps (__m128 __X, __m128 __Y)
+{
+  return (__m128) __builtin_ia32_addsubps ((__v4sf)__X, (__v4sf)__Y);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_hadd_ps (__m128 __X, __m128 __Y)
+{
+  return (__m128) __builtin_ia32_haddps ((__v4sf)__X, (__v4sf)__Y);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_hsub_ps (__m128 __X, __m128 __Y)
+{
+  return (__m128) __builtin_ia32_hsubps ((__v4sf)__X, (__v4sf)__Y);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_movehdup_ps (__m128 __X)
+{
+  return (__m128) __builtin_ia32_movshdup ((__v4sf)__X);
+}
+
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_moveldup_ps (__m128 __X)
+{
+  return (__m128) __builtin_ia32_movsldup ((__v4sf)__X);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_addsub_pd (__m128d __X, __m128d __Y)
+{
+  return (__m128d) __builtin_ia32_addsubpd ((__v2df)__X, (__v2df)__Y);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_hadd_pd (__m128d __X, __m128d __Y)
+{
+  return (__m128d) __builtin_ia32_haddpd ((__v2df)__X, (__v2df)__Y);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_hsub_pd (__m128d __X, __m128d __Y)
+{
+  return (__m128d) __builtin_ia32_hsubpd ((__v2df)__X, (__v2df)__Y);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_loaddup_pd (double const *__P)
+{
+  return _mm_load1_pd (__P);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_movedup_pd (__m128d __X)
+{
+  return _mm_shuffle_pd (__X, __X, _MM_SHUFFLE2 (0,0));
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_lddqu_si128 (__m128i const *__P)
+{
+  return (__m128i) __builtin_ia32_lddqu ((char const *)__P);
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_monitor (void const * __P, unsigned int __E, unsigned int __H)
+{
+  __builtin_ia32_monitor (__P, __E, __H);
+}
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_mwait (unsigned int __E, unsigned int __H)
+{
+  __builtin_ia32_mwait (__E, __H);
+}
+
+#ifdef __DISABLE_SSE3__
+#undef __DISABLE_SSE3__
+#pragma GCC pop_options
+#endif /* __DISABLE_SSE3__ */
+
+#endif /* _PMMINTRIN_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/popcntintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/popcntintrin.h
new file mode 100644
index 0000000..41845d8
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/popcntintrin.h
@@ -0,0 +1,53 @@
+/* Copyright (C) 2009-2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _POPCNTINTRIN_H_INCLUDED
+#define _POPCNTINTRIN_H_INCLUDED
+
+#ifndef __POPCNT__
+#pragma GCC push_options
+#pragma GCC target("popcnt")
+#define __DISABLE_POPCNT__
+#endif /* __POPCNT__ */
+
+/* Calculate a number of bits set to 1.  */
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_popcnt_u32 (unsigned int __X)
+{
+  return __builtin_popcount (__X);
+}
+
+#ifdef __x86_64__
+extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_popcnt_u64 (unsigned long long __X)
+{
+  return __builtin_popcountll (__X);
+}
+#endif
+
+#ifdef __DISABLE_POPCNT__
+#undef __DISABLE_POPCNT__
+#pragma GCC pop_options
+#endif /* __DISABLE_POPCNT__ */
+
+#endif /* _POPCNTINTRIN_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/prfchwintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/prfchwintrin.h
new file mode 100644
index 0000000..b2f5772
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/prfchwintrin.h
@@ -0,0 +1,37 @@
+/* Copyright (C) 2012-2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _X86INTRIN_H_INCLUDED && !defined _MM3DNOW_H_INCLUDED
+# error "Never use <prfchwintrin.h> directly; include <x86intrin.h> or <mm3dnow.h> instead."
+#endif
+
+#ifndef _PRFCHWINTRIN_H_INCLUDED
+#define _PRFCHWINTRIN_H_INCLUDED
+
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_m_prefetchw (void *__P)
+{
+  __builtin_prefetch (__P, 1, 3 /* _MM_HINT_T0 */);
+}
+
+#endif /* _PRFCHWINTRIN_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/quadmath.h b/lib/gcc/x86_64-linux-android/4.9.x/include/quadmath.h
new file mode 100644
index 0000000..aa9ef51
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/quadmath.h
@@ -0,0 +1,200 @@
+/* GCC Quad-Precision Math Library
+   Copyright (C) 2010, 2011 Free Software Foundation, Inc.
+   Written by Francois-Xavier Coudert <fxcoudert@gcc.gnu.org>
+
+This file is part of the libquadmath library.
+Libquadmath is free software; you can redistribute it and/or
+modify it under the terms of the GNU Library General Public
+License as published by the Free Software Foundation; either
+version 2 of the License, or (at your option) any later version.
+
+Libquadmath is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+Library General Public License for more details.
+
+You should have received a copy of the GNU Library General Public
+License along with libquadmath; see the file COPYING.LIB.  If
+not, write to the Free Software Foundation, Inc., 51 Franklin Street - Fifth Floor,
+Boston, MA 02110-1301, USA.  */
+
+#ifndef QUADMATH_H
+#define QUADMATH_H
+
+#include <stdlib.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Define the complex type corresponding to __float128
+   ("_Complex __float128" is not allowed) */
+typedef _Complex float __attribute__((mode(TC))) __complex128;
+
+#ifdef __cplusplus
+# define __quadmath_throw throw ()
+# define __quadmath_nth(fct) fct throw ()
+#else
+# define __quadmath_throw __attribute__((__nothrow__))
+# define __quadmath_nth(fct) __attribute__((__nothrow__)) fct
+#endif
+
+/* Prototypes for real functions */
+extern __float128 acosq (__float128) __quadmath_throw;
+extern __float128 acoshq (__float128) __quadmath_throw;
+extern __float128 asinq (__float128) __quadmath_throw;
+extern __float128 asinhq (__float128) __quadmath_throw;
+extern __float128 atanq (__float128) __quadmath_throw;
+extern __float128 atanhq (__float128) __quadmath_throw;
+extern __float128 atan2q (__float128, __float128) __quadmath_throw;
+extern __float128 cbrtq (__float128) __quadmath_throw;
+extern __float128 ceilq (__float128) __quadmath_throw;
+extern __float128 copysignq (__float128, __float128) __quadmath_throw;
+extern __float128 coshq (__float128) __quadmath_throw;
+extern __float128 cosq (__float128) __quadmath_throw;
+extern __float128 erfq (__float128) __quadmath_throw;
+extern __float128 erfcq (__float128) __quadmath_throw;
+extern __float128 expq (__float128) __quadmath_throw;
+extern __float128 expm1q (__float128) __quadmath_throw;
+extern __float128 fabsq (__float128) __quadmath_throw;
+extern __float128 fdimq (__float128, __float128) __quadmath_throw;
+extern int finiteq (__float128) __quadmath_throw;
+extern __float128 floorq (__float128) __quadmath_throw;
+extern __float128 fmaq (__float128, __float128, __float128) __quadmath_throw;
+extern __float128 fmaxq (__float128, __float128) __quadmath_throw;
+extern __float128 fminq (__float128, __float128) __quadmath_throw;
+extern __float128 fmodq (__float128, __float128) __quadmath_throw;
+extern __float128 frexpq (__float128, int *) __quadmath_throw;
+extern __float128 hypotq (__float128, __float128) __quadmath_throw;
+extern int isinfq (__float128) __quadmath_throw;
+extern int ilogbq (__float128) __quadmath_throw;
+extern int isnanq (__float128) __quadmath_throw;
+extern __float128 j0q (__float128) __quadmath_throw;
+extern __float128 j1q (__float128) __quadmath_throw;
+extern __float128 jnq (int, __float128) __quadmath_throw;
+extern __float128 ldexpq (__float128, int) __quadmath_throw;
+extern __float128 lgammaq (__float128) __quadmath_throw;
+extern long long int llrintq (__float128) __quadmath_throw;
+extern long long int llroundq (__float128) __quadmath_throw;
+extern __float128 logq (__float128) __quadmath_throw;
+extern __float128 log10q (__float128) __quadmath_throw;
+extern __float128 log2q (__float128) __quadmath_throw;
+extern __float128 log1pq (__float128) __quadmath_throw;
+extern long int lrintq (__float128) __quadmath_throw;
+extern long int lroundq (__float128) __quadmath_throw;
+extern __float128 modfq (__float128, __float128 *) __quadmath_throw;
+extern __float128 nanq (const char *) __quadmath_throw;
+extern __float128 nearbyintq (__float128) __quadmath_throw;
+extern __float128 nextafterq (__float128, __float128) __quadmath_throw;
+extern __float128 powq (__float128, __float128) __quadmath_throw;
+extern __float128 remainderq (__float128, __float128) __quadmath_throw;
+extern __float128 remquoq (__float128, __float128, int *) __quadmath_throw;
+extern __float128 rintq (__float128) __quadmath_throw;
+extern
__float128 roundq (__float128) __quadmath_throw; +extern __float128 scalblnq (__float128, long int) __quadmath_throw; +extern __float128 scalbnq (__float128, int) __quadmath_throw; +extern int signbitq (__float128) __quadmath_throw; +extern void sincosq (__float128, __float128 *, __float128 *) __quadmath_throw; +extern __float128 sinhq (__float128) __quadmath_throw; +extern __float128 sinq (__float128) __quadmath_throw; +extern __float128 sqrtq (__float128) __quadmath_throw; +extern __float128 tanq (__float128) __quadmath_throw; +extern __float128 tanhq (__float128) __quadmath_throw; +extern __float128 tgammaq (__float128) __quadmath_throw; +extern __float128 truncq (__float128) __quadmath_throw; +extern __float128 y0q (__float128) __quadmath_throw; +extern __float128 y1q (__float128) __quadmath_throw; +extern __float128 ynq (int, __float128) __quadmath_throw; + + +/* Prototypes for complex functions */ +extern __float128 cabsq (__complex128) __quadmath_throw; +extern __float128 cargq (__complex128) __quadmath_throw; +extern __float128 cimagq (__complex128) __quadmath_throw; +extern __float128 crealq (__complex128) __quadmath_throw; +extern __complex128 cacosq (__complex128) __quadmath_throw; +extern __complex128 cacoshq (__complex128) __quadmath_throw; +extern __complex128 casinq (__complex128) __quadmath_throw; +extern __complex128 casinhq (__complex128) __quadmath_throw; +extern __complex128 catanq (__complex128) __quadmath_throw; +extern __complex128 catanhq (__complex128) __quadmath_throw; +extern __complex128 ccosq (__complex128) __quadmath_throw; +extern __complex128 ccoshq (__complex128) __quadmath_throw; +extern __complex128 cexpq (__complex128) __quadmath_throw; +extern __complex128 cexpiq (__float128) __quadmath_throw; +extern __complex128 clogq (__complex128) __quadmath_throw; +extern __complex128 clog10q (__complex128) __quadmath_throw; +extern __complex128 conjq (__complex128) __quadmath_throw; +extern __complex128 cpowq (__complex128, __complex128) __quadmath_throw; +extern __complex128 cprojq (__complex128) __quadmath_throw; +extern __complex128 csinq (__complex128) __quadmath_throw; +extern __complex128 csinhq (__complex128) __quadmath_throw; +extern __complex128 csqrtq (__complex128) __quadmath_throw; +extern __complex128 ctanq (__complex128) __quadmath_throw; +extern __complex128 ctanhq (__complex128) __quadmath_throw; + + +/* Prototypes for string <-> __float128 conversion functions */ +extern __float128 strtoflt128 (const char *, char **) __quadmath_throw; +extern int quadmath_snprintf (char *str, size_t size, + const char *format, ...) 
__quadmath_throw; + + +/* Macros */ +#define FLT128_MAX 1.18973149535723176508575932662800702e4932Q +#define FLT128_MIN 3.36210314311209350626267781732175260e-4932Q +#define FLT128_EPSILON 1.92592994438723585305597794258492732e-34Q +#define FLT128_DENORM_MIN 6.475175119438025110924438958227646552e-4966Q +#define FLT128_MANT_DIG 113 +#define FLT128_MIN_EXP (-16381) +#define FLT128_MAX_EXP 16384 +#define FLT128_DIG 33 +#define FLT128_MIN_10_EXP (-4931) +#define FLT128_MAX_10_EXP 4932 + + +#define HUGE_VALQ __builtin_huge_valq() +/* The following alternative is valid, but brings the warning: + (floating constant exceeds range of ‘__float128’) */ +/* #define HUGE_VALQ (__extension__ 0x1.0p32767Q) */ + +#define M_Eq 2.7182818284590452353602874713526625Q /* e */ +#define M_LOG2Eq 1.4426950408889634073599246810018921Q /* log_2 e */ +#define M_LOG10Eq 0.4342944819032518276511289189166051Q /* log_10 e */ +#define M_LN2q 0.6931471805599453094172321214581766Q /* log_e 2 */ +#define M_LN10q 2.3025850929940456840179914546843642Q /* log_e 10 */ +#define M_PIq 3.1415926535897932384626433832795029Q /* pi */ +#define M_PI_2q 1.5707963267948966192313216916397514Q /* pi/2 */ +#define M_PI_4q 0.7853981633974483096156608458198757Q /* pi/4 */ +#define M_1_PIq 0.3183098861837906715377675267450287Q /* 1/pi */ +#define M_2_PIq 0.6366197723675813430755350534900574Q /* 2/pi */ +#define M_2_SQRTPIq 1.1283791670955125738961589031215452Q /* 2/sqrt(pi) */ +#define M_SQRT2q 1.4142135623730950488016887242096981Q /* sqrt(2) */ +#define M_SQRT1_2q 0.7071067811865475244008443621048490Q /* 1/sqrt(2) */ + +#define __quadmath_extern_inline \ + extern inline __attribute__ ((__gnu_inline__)) + +__quadmath_extern_inline __float128 +__quadmath_nth (cimagq (__complex128 __z)) +{ + return __imag__ __z; +} + +__quadmath_extern_inline __float128 +__quadmath_nth (crealq (__complex128 __z)) +{ + return __real__ __z; +} + +__quadmath_extern_inline __complex128 +__quadmath_nth (conjq (__complex128 __z)) +{ + return __extension__ ~__z; +} + +#ifdef __cplusplus +} +#endif + +#endif diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/quadmath_weak.h b/lib/gcc/x86_64-linux-android/4.9.x/include/quadmath_weak.h new file mode 100644 index 0000000..986079a --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/quadmath_weak.h @@ -0,0 +1,137 @@ +/* GCC Quad-Precision Math Library + Copyright (C) 2010, 2011 Free Software Foundation, Inc. + Written by Tobias Burnus + +This file is part of the libquadmath library. +Libquadmath is free software; you can redistribute it and/or +modify it under the terms of the GNU Library General Public +License as published by the Free Software Foundation; either +version 2 of the License, or (at your option) any later version. + +Libquadmath is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +Library General Public License for more details. + +You should have received a copy of the GNU Library General Public +License along with libquadmath; see the file COPYING.LIB. If +not, write to the Free Software Foundation, Inc., 51 Franklin Street - Fifth Floor, +Boston, MA 02110-1301, USA. 
+
+#ifndef QUADMATH_WEAK_H
+#define QUADMATH_WEAK_H
+
+#include "quadmath.h"
+
+#if SUPPORTS_WEAK
+# define __qmath2(name,name2,type) \
+  static __typeof(type) name __attribute__ ((__weakref__(#name2)));
+# define __qmath_(name) __qmath_ ## name
+#else
+# define __qmath2(name,name2,type)
+# define __qmath_(name) name
+#endif
+
+/* __qmath_foo is a weak reference to symbol foo.  */
+#define __qmath3(name) __qmath2(__qmath_ ## name,name,name)
+
+/* Prototypes for real functions.  */
+__qmath3 (acosq)
+__qmath3 (acoshq)
+__qmath3 (asinq)
+__qmath3 (asinhq)
+__qmath3 (atanq)
+__qmath3 (atanhq)
+__qmath3 (atan2q)
+__qmath3 (cbrtq)
+__qmath3 (ceilq)
+__qmath3 (copysignq)
+__qmath3 (coshq)
+__qmath3 (cosq)
+__qmath3 (erfq)
+__qmath3 (erfcq)
+__qmath3 (expq)
+__qmath3 (expm1q)
+__qmath3 (fabsq)
+__qmath3 (fdimq)
+__qmath3 (finiteq)
+__qmath3 (floorq)
+__qmath3 (fmaq)
+__qmath3 (fmaxq)
+__qmath3 (fminq)
+__qmath3 (fmodq)
+__qmath3 (frexpq)
+__qmath3 (hypotq)
+__qmath3 (ilogbq)
+__qmath3 (isinfq)
+__qmath3 (isnanq)
+__qmath3 (j0q)
+__qmath3 (j1q)
+__qmath3 (jnq)
+__qmath3 (ldexpq)
+__qmath3 (lgammaq)
+__qmath3 (llrintq)
+__qmath3 (llroundq)
+__qmath3 (logq)
+__qmath3 (log10q)
+__qmath3 (log1pq)
+__qmath3 (log2q)
+__qmath3 (lrintq)
+__qmath3 (lroundq)
+__qmath3 (modfq)
+__qmath3 (nanq)
+__qmath3 (nearbyintq)
+__qmath3 (nextafterq)
+__qmath3 (powq)
+__qmath3 (remainderq)
+__qmath3 (remquoq)
+__qmath3 (rintq)
+__qmath3 (roundq)
+__qmath3 (scalblnq)
+__qmath3 (scalbnq)
+__qmath3 (signbitq)
+__qmath3 (sincosq)
+__qmath3 (sinhq)
+__qmath3 (sinq)
+__qmath3 (sqrtq)
+__qmath3 (tanq)
+__qmath3 (tanhq)
+__qmath3 (tgammaq)
+__qmath3 (truncq)
+__qmath3 (y0q)
+__qmath3 (y1q)
+__qmath3 (ynq)
+
+
+/* Prototypes for complex functions.  */
+__qmath3 (cabsq)
+__qmath3 (cargq)
+__qmath3 (cimagq)
+__qmath3 (crealq)
+__qmath3 (cacosq)
+__qmath3 (cacoshq)
+__qmath3 (casinq)
+__qmath3 (casinhq)
+__qmath3 (catanq)
+__qmath3 (catanhq)
+__qmath3 (ccosq)
+__qmath3 (ccoshq)
+__qmath3 (cexpq)
+__qmath3 (cexpiq)
+__qmath3 (clogq)
+__qmath3 (clog10q)
+__qmath3 (conjq)
+__qmath3 (cpowq)
+__qmath3 (cprojq)
+__qmath3 (csinq)
+__qmath3 (csinhq)
+__qmath3 (csqrtq)
+__qmath3 (ctanq)
+__qmath3 (ctanhq)
+
+
+/* Prototypes for string <-> flt128 conversion functions.  */
+__qmath3 (strtoflt128)
+__qmath3 (quadmath_snprintf)
+
+#endif
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/rdseedintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/rdseedintrin.h
new file mode 100644
index 0000000..0ab18e5
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/rdseedintrin.h
@@ -0,0 +1,66 @@
+/* Copyright (C) 2012-2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#if !defined _X86INTRIN_H_INCLUDED
+# error "Never use <rdseedintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _RDSEEDINTRIN_H_INCLUDED
+#define _RDSEEDINTRIN_H_INCLUDED
+
+#ifndef __RDSEED__
+#pragma GCC push_options
+#pragma GCC target("rdseed")
+#define __DISABLE_RDSEED__
+#endif /* __RDSEED__ */
+
+
+extern __inline int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_rdseed16_step (unsigned short *p)
+{
+  return __builtin_ia32_rdseed_hi_step (p);
+}
+
+extern __inline int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_rdseed32_step (unsigned int *p)
+{
+  return __builtin_ia32_rdseed_si_step (p);
+}
+
+#ifdef __x86_64__
+extern __inline int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_rdseed64_step (unsigned long long *p)
+{
+  return __builtin_ia32_rdseed_di_step (p);
+}
+#endif
+
+#ifdef __DISABLE_RDSEED__
+#undef __DISABLE_RDSEED__
+#pragma GCC pop_options
+#endif /* __DISABLE_RDSEED__ */
+
+#endif /* _RDSEEDINTRIN_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/rtmintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/rtmintrin.h
new file mode 100644
index 0000000..ac40d22
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/rtmintrin.h
@@ -0,0 +1,84 @@
+/* Copyright (C) 2012-2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+# error "Never use <rtmintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _RTMINTRIN_H_INCLUDED
+#define _RTMINTRIN_H_INCLUDED
+
+#ifndef __RTM__
+#pragma GCC push_options
+#pragma GCC target("rtm")
+#define __DISABLE_RTM__
+#endif /* __RTM__ */
+
+#define _XBEGIN_STARTED (~0u)
+#define _XABORT_EXPLICIT (1 << 0)
+#define _XABORT_RETRY (1 << 1)
+#define _XABORT_CONFLICT (1 << 2)
+#define _XABORT_CAPACITY (1 << 3)
+#define _XABORT_DEBUG (1 << 4)
+#define _XABORT_NESTED (1 << 5)
+#define _XABORT_CODE(x) (((x) >> 24) & 0xFF)
+
+/* Start an RTM code region.  Return _XBEGIN_STARTED on success and the
+   abort condition otherwise.  */
+extern __inline unsigned int
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_xbegin (void)
+{
+  return __builtin_ia32_xbegin ();
+}
+
+/* Specify the end of an RTM code region.  If it corresponds to the
+   outermost transaction, then attempts the transaction commit.  If the
+   commit fails, then control is transferred to the outermost transaction
+   fallback handler.  */
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_xend (void)
+{
+  __builtin_ia32_xend ();
+}
+
+/* Force an RTM abort condition.  The control is transferred to the
+   outermost transaction fallback handler with the abort condition IMM.  */
+#ifdef __OPTIMIZE__
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_xabort (const unsigned int imm)
+{
+  __builtin_ia32_xabort (imm);
+}
+#else
+#define _xabort(N)  __builtin_ia32_xabort (N)
+#endif /* __OPTIMIZE__ */
+
+#ifdef __DISABLE_RTM__
+#undef __DISABLE_RTM__
+#pragma GCC pop_options
+#endif /* __DISABLE_RTM__ */
+
+#endif /* _RTMINTRIN_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/shaintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/shaintrin.h
new file mode 100644
index 0000000..d8a3da3
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/shaintrin.h
@@ -0,0 +1,98 @@
+/* Copyright (C) 2013-2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+#error "Never use <shaintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _SHAINTRIN_H_INCLUDED
+#define _SHAINTRIN_H_INCLUDED
+
+#ifndef __SHA__
+#pragma GCC push_options
+#pragma GCC target("sha")
+#define __DISABLE_SHA__
+#endif /* __SHA__ */
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sha1msg1_epu32 (__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_sha1msg1 ((__v4si) __A, (__v4si) __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sha1msg2_epu32 (__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_sha1msg2 ((__v4si) __A, (__v4si) __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sha1nexte_epu32 (__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_sha1nexte ((__v4si) __A, (__v4si) __B);
+}
+
+#ifdef __OPTIMIZE__
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sha1rnds4_epu32 (__m128i __A, __m128i __B, const int __I)
+{
+  return (__m128i) __builtin_ia32_sha1rnds4 ((__v4si) __A, (__v4si) __B, __I);
+}
+#else
+#define _mm_sha1rnds4_epu32(A, B, I) \
+  ((__m128i) __builtin_ia32_sha1rnds4 ((__v4si)(__m128i)A, \
+                                       (__v4si)(__m128i)B, (int)I))
+#endif
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sha256msg1_epu32 (__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_sha256msg1 ((__v4si) __A, (__v4si) __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sha256msg2_epu32 (__m128i __A, __m128i __B)
+{
+  return (__m128i) __builtin_ia32_sha256msg2 ((__v4si) __A, (__v4si) __B);
+}
+
+extern __inline __m128i
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sha256rnds2_epu32 (__m128i __A, __m128i __B, __m128i __C)
+{
+  return (__m128i) __builtin_ia32_sha256rnds2 ((__v4si) __A, (__v4si) __B,
+                                               (__v4si) __C);
+}
+
+#ifdef __DISABLE_SHA__
+#undef __DISABLE_SHA__
+#pragma GCC pop_options
+#endif /* __DISABLE_SHA__ */
+
+#endif /* _SHAINTRIN_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/smmintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/smmintrin.h
new file mode 100644
index 0000000..886ace4
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/smmintrin.h
@@ -0,0 +1,862 @@
+/* Copyright (C) 2007-2014 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Implemented from the specification included in the Intel C++ Compiler
+   User Guide and Reference, version 10.0.  */
+
+#ifndef _SMMINTRIN_H_INCLUDED
+#define _SMMINTRIN_H_INCLUDED
+
+/* We need definitions from the SSSE3, SSE3, SSE2 and SSE header
+   files.  */
+#include <tmmintrin.h>
+
+#ifndef __SSE4_1__
+#pragma GCC push_options
+#pragma GCC target("sse4.1")
+#define __DISABLE_SSE4_1__
+#endif /* __SSE4_1__ */
+
+/* Rounding mode macros. */
+#define _MM_FROUND_TO_NEAREST_INT 0x00
+#define _MM_FROUND_TO_NEG_INF 0x01
+#define _MM_FROUND_TO_POS_INF 0x02
+#define _MM_FROUND_TO_ZERO 0x03
+#define _MM_FROUND_CUR_DIRECTION 0x04
+
+#define _MM_FROUND_RAISE_EXC 0x00
+#define _MM_FROUND_NO_EXC 0x08
+
+#define _MM_FROUND_NINT \
+  (_MM_FROUND_TO_NEAREST_INT | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_FLOOR \
+  (_MM_FROUND_TO_NEG_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_CEIL \
+  (_MM_FROUND_TO_POS_INF | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_TRUNC \
+  (_MM_FROUND_TO_ZERO | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_RINT \
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_RAISE_EXC)
+#define _MM_FROUND_NEARBYINT \
+  (_MM_FROUND_CUR_DIRECTION | _MM_FROUND_NO_EXC)
+
+/* Test Instruction */
+/* Packed integer 128-bit bitwise comparison. Return 1 if
+   (__V & __M) == 0.  */
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_testz_si128 (__m128i __M, __m128i __V)
+{
+  return __builtin_ia32_ptestz128 ((__v2di)__M, (__v2di)__V);
+}
+
+/* Packed integer 128-bit bitwise comparison. Return 1 if
+   (__V & ~__M) == 0.  */
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_testc_si128 (__m128i __M, __m128i __V)
+{
+  return __builtin_ia32_ptestc128 ((__v2di)__M, (__v2di)__V);
+}
+
+/* Packed integer 128-bit bitwise comparison. Return 1 if
+   (__V & __M) != 0 && (__V & ~__M) != 0.  */
+extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_testnzc_si128 (__m128i __M, __m128i __V)
+{
+  return __builtin_ia32_ptestnzc128 ((__v2di)__M, (__v2di)__V);
+}
+
+/* Macros for packed integer 128-bit comparison intrinsics.  */
+#define _mm_test_all_zeros(M, V) _mm_testz_si128 ((M), (V))
+
+#define _mm_test_all_ones(V) \
+  _mm_testc_si128 ((V), _mm_cmpeq_epi32 ((V), (V)))
+
+#define _mm_test_mix_ones_zeros(M, V) _mm_testnzc_si128 ((M), (V))
+
+/* Packed/scalar double precision floating point rounding. */
+
+#ifdef __OPTIMIZE__
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_pd (__m128d __V, const int __M)
+{
+  return (__m128d) __builtin_ia32_roundpd ((__v2df)__V, __M);
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_round_sd(__m128d __D, __m128d __V, const int __M)
+{
+  return (__m128d) __builtin_ia32_roundsd ((__v2df)__D,
+                                           (__v2df)__V,
+                                           __M);
+}
+#else
+#define _mm_round_pd(V, M) \
+  ((__m128d) __builtin_ia32_roundpd ((__v2df)(__m128d)(V), (int)(M)))
+
+#define _mm_round_sd(D, V, M) \
+  ((__m128d) __builtin_ia32_roundsd ((__v2df)(__m128d)(D), \
+                                     (__v2df)(__m128d)(V), (int)(M)))
+#endif
+
+/* Packed/scalar single precision floating point rounding.
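+
+   For illustration (editor's sketch, not part of the original header):
+   _mm_round_ps (__X, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC) floors
+   each lane without raising the inexact exception, while the
+   _MM_FROUND_FLOOR macro above selects the exception-raising variant:
+
+     __m128 __v = _mm_set_ps (2.5f, -2.5f, 1.5f, -1.5f);
+     __m128 __f = _mm_round_ps (__v, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);
+
+   giving, in the same argument order, 2.0, -3.0, 1.0, -2.0.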
*/ + +#ifdef __OPTIMIZE__ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_round_ps (__m128 __V, const int __M) +{ + return (__m128) __builtin_ia32_roundps ((__v4sf)__V, __M); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_round_ss (__m128 __D, __m128 __V, const int __M) +{ + return (__m128) __builtin_ia32_roundss ((__v4sf)__D, + (__v4sf)__V, + __M); +} +#else +#define _mm_round_ps(V, M) \ + ((__m128) __builtin_ia32_roundps ((__v4sf)(__m128)(V), (int)(M))) + +#define _mm_round_ss(D, V, M) \ + ((__m128) __builtin_ia32_roundss ((__v4sf)(__m128)(D), \ + (__v4sf)(__m128)(V), (int)(M))) +#endif + +/* Macros for ceil/floor intrinsics. */ +#define _mm_ceil_pd(V) _mm_round_pd ((V), _MM_FROUND_CEIL) +#define _mm_ceil_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_CEIL) + +#define _mm_floor_pd(V) _mm_round_pd((V), _MM_FROUND_FLOOR) +#define _mm_floor_sd(D, V) _mm_round_sd ((D), (V), _MM_FROUND_FLOOR) + +#define _mm_ceil_ps(V) _mm_round_ps ((V), _MM_FROUND_CEIL) +#define _mm_ceil_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_CEIL) + +#define _mm_floor_ps(V) _mm_round_ps ((V), _MM_FROUND_FLOOR) +#define _mm_floor_ss(D, V) _mm_round_ss ((D), (V), _MM_FROUND_FLOOR) + +/* SSE4.1 */ + +/* Integer blend instructions - select data from 2 sources using + constant/variable mask. */ + +#ifdef __OPTIMIZE__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blend_epi16 (__m128i __X, __m128i __Y, const int __M) +{ + return (__m128i) __builtin_ia32_pblendw128 ((__v8hi)__X, + (__v8hi)__Y, + __M); +} +#else +#define _mm_blend_epi16(X, Y, M) \ + ((__m128i) __builtin_ia32_pblendw128 ((__v8hi)(__m128i)(X), \ + (__v8hi)(__m128i)(Y), (int)(M))) +#endif + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blendv_epi8 (__m128i __X, __m128i __Y, __m128i __M) +{ + return (__m128i) __builtin_ia32_pblendvb128 ((__v16qi)__X, + (__v16qi)__Y, + (__v16qi)__M); +} + +/* Single precision floating point blend instructions - select data + from 2 sources using constant/variable mask. */ + +#ifdef __OPTIMIZE__ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blend_ps (__m128 __X, __m128 __Y, const int __M) +{ + return (__m128) __builtin_ia32_blendps ((__v4sf)__X, + (__v4sf)__Y, + __M); +} +#else +#define _mm_blend_ps(X, Y, M) \ + ((__m128) __builtin_ia32_blendps ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (int)(M))) +#endif + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blendv_ps (__m128 __X, __m128 __Y, __m128 __M) +{ + return (__m128) __builtin_ia32_blendvps ((__v4sf)__X, + (__v4sf)__Y, + (__v4sf)__M); +} + +/* Double precision floating point blend instructions - select data + from 2 sources using constant/variable mask. 
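+
+   As an illustrative note (not in the original header): bit i of the
+   constant mask selects lane i of the result, 1 taking it from __Y and
+   0 from __X, so for the two-lane double case
+
+     __m128d __r = _mm_blend_pd (__x, __y, 0x2);
+
+   yields { __x[0], __y[1] } (__x and __y are hypothetical operands).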
*/ + +#ifdef __OPTIMIZE__ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blend_pd (__m128d __X, __m128d __Y, const int __M) +{ + return (__m128d) __builtin_ia32_blendpd ((__v2df)__X, + (__v2df)__Y, + __M); +} +#else +#define _mm_blend_pd(X, Y, M) \ + ((__m128d) __builtin_ia32_blendpd ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (int)(M))) +#endif + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_blendv_pd (__m128d __X, __m128d __Y, __m128d __M) +{ + return (__m128d) __builtin_ia32_blendvpd ((__v2df)__X, + (__v2df)__Y, + (__v2df)__M); +} + +/* Dot product instructions with mask-defined summing and zeroing parts + of result. */ + +#ifdef __OPTIMIZE__ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dp_ps (__m128 __X, __m128 __Y, const int __M) +{ + return (__m128) __builtin_ia32_dpps ((__v4sf)__X, + (__v4sf)__Y, + __M); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dp_pd (__m128d __X, __m128d __Y, const int __M) +{ + return (__m128d) __builtin_ia32_dppd ((__v2df)__X, + (__v2df)__Y, + __M); +} +#else +#define _mm_dp_ps(X, Y, M) \ + ((__m128) __builtin_ia32_dpps ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), (int)(M))) + +#define _mm_dp_pd(X, Y, M) \ + ((__m128d) __builtin_ia32_dppd ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), (int)(M))) +#endif + +/* Packed integer 64-bit comparison, zeroing or filling with ones + corresponding parts of result. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_epi64 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pcmpeqq ((__v2di)__X, (__v2di)__Y); +} + +/* Min/max packed integer instructions. 
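+
+   A small usage sketch (editor's illustration; __x is a hypothetical
+   operand): the signed-byte pair can clamp all 16 lanes to a range in
+   two steps, e.g.
+
+     __m128i __clamped =
+       _mm_min_epi8 (_mm_max_epi8 (__x, _mm_set1_epi8 (-16)),
+                     _mm_set1_epi8 (16));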
*/ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_epi8 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pminsb128 ((__v16qi)__X, (__v16qi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_epi8 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pmaxsb128 ((__v16qi)__X, (__v16qi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_epu16 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pminuw128 ((__v8hi)__X, (__v8hi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_epu16 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pmaxuw128 ((__v8hi)__X, (__v8hi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pminsd128 ((__v4si)__X, (__v4si)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pmaxsd128 ((__v4si)__X, (__v4si)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_epu32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pminud128 ((__v4si)__X, (__v4si)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_epu32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pmaxud128 ((__v4si)__X, (__v4si)__Y); +} + +/* Packed integer 32-bit multiplication with truncation of upper + halves of results. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mullo_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pmulld128 ((__v4si)__X, (__v4si)__Y); +} + +/* Packed integer 32-bit multiplication of 2 pairs of operands + with two 64-bit results. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mul_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pmuldq128 ((__v4si)__X, (__v4si)__Y); +} + +/* Insert single precision float into packed single precision array + element selected by index N. The bits [7-6] of N define S + index, the bits [5-4] define D index, and bits [3-0] define + zeroing mask for D. */ + +#ifdef __OPTIMIZE__ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_insert_ps (__m128 __D, __m128 __S, const int __N) +{ + return (__m128) __builtin_ia32_insertps128 ((__v4sf)__D, + (__v4sf)__S, + __N); +} +#else +#define _mm_insert_ps(D, S, N) \ + ((__m128) __builtin_ia32_insertps128 ((__v4sf)(__m128)(D), \ + (__v4sf)(__m128)(S), (int)(N))) +#endif + +/* Helper macro to create the N value for _mm_insert_ps. */ +#define _MM_MK_INSERTPS_NDX(S, D, M) (((S) << 6) | ((D) << 4) | (M)) + +/* Extract binary representation of single precision float from packed + single precision array element of X selected by index N. 
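+
+   For example (editor's sketch): the returned int holds the raw
+   IEEE-754 encoding, so recovering the float value takes a byte copy
+   (__v is a hypothetical vector):
+
+     int __bits = _mm_extract_ps (__v, 2);
+     float __f;
+     __builtin_memcpy (&__f, &__bits, sizeof __f);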
*/ + +#ifdef __OPTIMIZE__ +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_extract_ps (__m128 __X, const int __N) +{ + union { int i; float f; } __tmp; + __tmp.f = __builtin_ia32_vec_ext_v4sf ((__v4sf)__X, __N); + return __tmp.i; +} +#else +#define _mm_extract_ps(X, N) \ + (__extension__ \ + ({ \ + union { int i; float f; } __tmp; \ + __tmp.f = __builtin_ia32_vec_ext_v4sf ((__v4sf)(__m128)(X), (int)(N)); \ + __tmp.i; \ + })) +#endif + +/* Extract binary representation of single precision float into + D from packed single precision array element of S selected + by index N. */ +#define _MM_EXTRACT_FLOAT(D, S, N) \ + { (D) = __builtin_ia32_vec_ext_v4sf ((__v4sf)(S), (N)); } + +/* Extract specified single precision float element into the lower + part of __m128. */ +#define _MM_PICK_OUT_PS(X, N) \ + _mm_insert_ps (_mm_setzero_ps (), (X), \ + _MM_MK_INSERTPS_NDX ((N), 0, 0x0e)) + +/* Insert integer, S, into packed integer array element of D + selected by index N. */ + +#ifdef __OPTIMIZE__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_insert_epi8 (__m128i __D, int __S, const int __N) +{ + return (__m128i) __builtin_ia32_vec_set_v16qi ((__v16qi)__D, + __S, __N); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_insert_epi32 (__m128i __D, int __S, const int __N) +{ + return (__m128i) __builtin_ia32_vec_set_v4si ((__v4si)__D, + __S, __N); +} + +#ifdef __x86_64__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_insert_epi64 (__m128i __D, long long __S, const int __N) +{ + return (__m128i) __builtin_ia32_vec_set_v2di ((__v2di)__D, + __S, __N); +} +#endif +#else +#define _mm_insert_epi8(D, S, N) \ + ((__m128i) __builtin_ia32_vec_set_v16qi ((__v16qi)(__m128i)(D), \ + (int)(S), (int)(N))) + +#define _mm_insert_epi32(D, S, N) \ + ((__m128i) __builtin_ia32_vec_set_v4si ((__v4si)(__m128i)(D), \ + (int)(S), (int)(N))) + +#ifdef __x86_64__ +#define _mm_insert_epi64(D, S, N) \ + ((__m128i) __builtin_ia32_vec_set_v2di ((__v2di)(__m128i)(D), \ + (long long)(S), (int)(N))) +#endif +#endif + +/* Extract integer from packed integer array element of X selected by + index N. */ + +#ifdef __OPTIMIZE__ +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_extract_epi8 (__m128i __X, const int __N) +{ + return (unsigned char) __builtin_ia32_vec_ext_v16qi ((__v16qi)__X, __N); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_extract_epi32 (__m128i __X, const int __N) +{ + return __builtin_ia32_vec_ext_v4si ((__v4si)__X, __N); +} + +#ifdef __x86_64__ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_extract_epi64 (__m128i __X, const int __N) +{ + return __builtin_ia32_vec_ext_v2di ((__v2di)__X, __N); +} +#endif +#else +#define _mm_extract_epi8(X, N) \ + ((int) (unsigned char) __builtin_ia32_vec_ext_v16qi ((__v16qi)(__m128i)(X), (int)(N))) +#define _mm_extract_epi32(X, N) \ + ((int) __builtin_ia32_vec_ext_v4si ((__v4si)(__m128i)(X), (int)(N))) + +#ifdef __x86_64__ +#define _mm_extract_epi64(X, N) \ + ((long long) __builtin_ia32_vec_ext_v2di ((__v2di)(__m128i)(X), (int)(N))) +#endif +#endif + +/* Return horizontal packed word minimum and its index in bits [15:0] + and bits [18:16] respectively. 
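+
+   A usage sketch (illustrative, with hypothetical __x): both fields can
+   be pulled out with the SSE2 word-extract intrinsic:
+
+     __m128i __r = _mm_minpos_epu16 (__x);
+     unsigned short __min = (unsigned short) _mm_extract_epi16 (__r, 0);
+     int __idx = _mm_extract_epi16 (__r, 1) & 0x7;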
*/ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_minpos_epu16 (__m128i __X) +{ + return (__m128i) __builtin_ia32_phminposuw128 ((__v8hi)__X); +} + +/* Packed integer sign-extension. */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi8_epi32 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pmovsxbd128 ((__v16qi)__X); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi16_epi32 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pmovsxwd128 ((__v8hi)__X); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi8_epi64 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pmovsxbq128 ((__v16qi)__X); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi32_epi64 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pmovsxdq128 ((__v4si)__X); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi16_epi64 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pmovsxwq128 ((__v8hi)__X); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepi8_epi16 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pmovsxbw128 ((__v16qi)__X); +} + +/* Packed integer zero-extension. */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu8_epi32 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pmovzxbd128 ((__v16qi)__X); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu16_epi32 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pmovzxwd128 ((__v8hi)__X); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu8_epi64 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pmovzxbq128 ((__v16qi)__X); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu32_epi64 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pmovzxdq128 ((__v4si)__X); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu16_epi64 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pmovzxwq128 ((__v8hi)__X); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtepu8_epi16 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pmovzxbw128 ((__v16qi)__X); +} + +/* Pack 8 double words from 2 operands into 8 words of result with + unsigned saturation. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_packus_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_packusdw128 ((__v4si)__X, (__v4si)__Y); +} + +/* Sum absolute 8-bit integer difference of adjacent groups of 4 + byte integers in the first 2 operands. Starting offsets within + operands are determined by the 3rd mask operand. 
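+
+   Editor's note, for concreteness: with mask __M, bits [1:0] pick the
+   4-byte group of the second operand and bit [2] picks the starting
+   4-byte block of the first; result word j (j = 0..7) is then the sum
+   of |__X[off1 + j + k] - __Y[off2 + k]| over k = 0..3.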
*/ + +#ifdef __OPTIMIZE__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mpsadbw_epu8 (__m128i __X, __m128i __Y, const int __M) +{ + return (__m128i) __builtin_ia32_mpsadbw128 ((__v16qi)__X, + (__v16qi)__Y, __M); +} +#else +#define _mm_mpsadbw_epu8(X, Y, M) \ + ((__m128i) __builtin_ia32_mpsadbw128 ((__v16qi)(__m128i)(X), \ + (__v16qi)(__m128i)(Y), (int)(M))) +#endif + +/* Load double quadword using non-temporal aligned hint. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_stream_load_si128 (__m128i *__X) +{ + return (__m128i) __builtin_ia32_movntdqa ((__v2di *) __X); +} + +#ifndef __SSE4_2__ +#pragma GCC push_options +#pragma GCC target("sse4.2") +#define __DISABLE_SSE4_2__ +#endif /* __SSE4_2__ */ + +/* These macros specify the source data format. */ +#define _SIDD_UBYTE_OPS 0x00 +#define _SIDD_UWORD_OPS 0x01 +#define _SIDD_SBYTE_OPS 0x02 +#define _SIDD_SWORD_OPS 0x03 + +/* These macros specify the comparison operation. */ +#define _SIDD_CMP_EQUAL_ANY 0x00 +#define _SIDD_CMP_RANGES 0x04 +#define _SIDD_CMP_EQUAL_EACH 0x08 +#define _SIDD_CMP_EQUAL_ORDERED 0x0c + +/* These macros specify the polarity. */ +#define _SIDD_POSITIVE_POLARITY 0x00 +#define _SIDD_NEGATIVE_POLARITY 0x10 +#define _SIDD_MASKED_POSITIVE_POLARITY 0x20 +#define _SIDD_MASKED_NEGATIVE_POLARITY 0x30 + +/* These macros specify the output selection in _mm_cmpXstri (). */ +#define _SIDD_LEAST_SIGNIFICANT 0x00 +#define _SIDD_MOST_SIGNIFICANT 0x40 + +/* These macros specify the output selection in _mm_cmpXstrm (). */ +#define _SIDD_BIT_MASK 0x00 +#define _SIDD_UNIT_MASK 0x40 + +/* Intrinsics for text/string processing. */ + +#ifdef __OPTIMIZE__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpistrm (__m128i __X, __m128i __Y, const int __M) +{ + return (__m128i) __builtin_ia32_pcmpistrm128 ((__v16qi)__X, + (__v16qi)__Y, + __M); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpistri (__m128i __X, __m128i __Y, const int __M) +{ + return __builtin_ia32_pcmpistri128 ((__v16qi)__X, + (__v16qi)__Y, + __M); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpestrm (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) +{ + return (__m128i) __builtin_ia32_pcmpestrm128 ((__v16qi)__X, __LX, + (__v16qi)__Y, __LY, + __M); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpestri (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) +{ + return __builtin_ia32_pcmpestri128 ((__v16qi)__X, __LX, + (__v16qi)__Y, __LY, + __M); +} +#else +#define _mm_cmpistrm(X, Y, M) \ + ((__m128i) __builtin_ia32_pcmpistrm128 ((__v16qi)(__m128i)(X), \ + (__v16qi)(__m128i)(Y), (int)(M))) +#define _mm_cmpistri(X, Y, M) \ + ((int) __builtin_ia32_pcmpistri128 ((__v16qi)(__m128i)(X), \ + (__v16qi)(__m128i)(Y), (int)(M))) + +#define _mm_cmpestrm(X, LX, Y, LY, M) \ + ((__m128i) __builtin_ia32_pcmpestrm128 ((__v16qi)(__m128i)(X), \ + (int)(LX), (__v16qi)(__m128i)(Y), \ + (int)(LY), (int)(M))) +#define _mm_cmpestri(X, LX, Y, LY, M) \ + ((int) __builtin_ia32_pcmpestri128 ((__v16qi)(__m128i)(X), (int)(LX), \ + (__v16qi)(__m128i)(Y), (int)(LY), \ + (int)(M))) +#endif + +/* Intrinsics for text/string processing and reading values of + EFlags. 
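+
+   For example (editor's sketch): _mm_cmpistrc returns CF, which with
+   _SIDD_CMP_EQUAL_ORDERED reports whether the needle occurs in the
+   haystack, so a 16-byte substring test can be written as
+   (handle_match is a hypothetical helper):
+
+     if (_mm_cmpistrc (__needle, __hay,
+                       _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ORDERED))
+       handle_match ();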
*/ + +#ifdef __OPTIMIZE__ +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpistra (__m128i __X, __m128i __Y, const int __M) +{ + return __builtin_ia32_pcmpistria128 ((__v16qi)__X, + (__v16qi)__Y, + __M); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpistrc (__m128i __X, __m128i __Y, const int __M) +{ + return __builtin_ia32_pcmpistric128 ((__v16qi)__X, + (__v16qi)__Y, + __M); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpistro (__m128i __X, __m128i __Y, const int __M) +{ + return __builtin_ia32_pcmpistrio128 ((__v16qi)__X, + (__v16qi)__Y, + __M); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpistrs (__m128i __X, __m128i __Y, const int __M) +{ + return __builtin_ia32_pcmpistris128 ((__v16qi)__X, + (__v16qi)__Y, + __M); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpistrz (__m128i __X, __m128i __Y, const int __M) +{ + return __builtin_ia32_pcmpistriz128 ((__v16qi)__X, + (__v16qi)__Y, + __M); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpestra (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) +{ + return __builtin_ia32_pcmpestria128 ((__v16qi)__X, __LX, + (__v16qi)__Y, __LY, + __M); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpestrc (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) +{ + return __builtin_ia32_pcmpestric128 ((__v16qi)__X, __LX, + (__v16qi)__Y, __LY, + __M); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpestro (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) +{ + return __builtin_ia32_pcmpestrio128 ((__v16qi)__X, __LX, + (__v16qi)__Y, __LY, + __M); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpestrs (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) +{ + return __builtin_ia32_pcmpestris128 ((__v16qi)__X, __LX, + (__v16qi)__Y, __LY, + __M); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpestrz (__m128i __X, int __LX, __m128i __Y, int __LY, const int __M) +{ + return __builtin_ia32_pcmpestriz128 ((__v16qi)__X, __LX, + (__v16qi)__Y, __LY, + __M); +} +#else +#define _mm_cmpistra(X, Y, M) \ + ((int) __builtin_ia32_pcmpistria128 ((__v16qi)(__m128i)(X), \ + (__v16qi)(__m128i)(Y), (int)(M))) +#define _mm_cmpistrc(X, Y, M) \ + ((int) __builtin_ia32_pcmpistric128 ((__v16qi)(__m128i)(X), \ + (__v16qi)(__m128i)(Y), (int)(M))) +#define _mm_cmpistro(X, Y, M) \ + ((int) __builtin_ia32_pcmpistrio128 ((__v16qi)(__m128i)(X), \ + (__v16qi)(__m128i)(Y), (int)(M))) +#define _mm_cmpistrs(X, Y, M) \ + ((int) __builtin_ia32_pcmpistris128 ((__v16qi)(__m128i)(X), \ + (__v16qi)(__m128i)(Y), (int)(M))) +#define _mm_cmpistrz(X, Y, M) \ + ((int) __builtin_ia32_pcmpistriz128 ((__v16qi)(__m128i)(X), \ + (__v16qi)(__m128i)(Y), (int)(M))) + +#define _mm_cmpestra(X, LX, Y, LY, M) \ + ((int) __builtin_ia32_pcmpestria128 ((__v16qi)(__m128i)(X), (int)(LX), \ + (__v16qi)(__m128i)(Y), (int)(LY), \ + (int)(M))) +#define _mm_cmpestrc(X, LX, Y, LY, M) \ + ((int) __builtin_ia32_pcmpestric128 ((__v16qi)(__m128i)(X), (int)(LX), \ + (__v16qi)(__m128i)(Y), (int)(LY), \ + (int)(M))) +#define _mm_cmpestro(X, LX, Y, LY, M) \ + ((int) 
__builtin_ia32_pcmpestrio128 ((__v16qi)(__m128i)(X), (int)(LX), \ + (__v16qi)(__m128i)(Y), (int)(LY), \ + (int)(M))) +#define _mm_cmpestrs(X, LX, Y, LY, M) \ + ((int) __builtin_ia32_pcmpestris128 ((__v16qi)(__m128i)(X), (int)(LX), \ + (__v16qi)(__m128i)(Y), (int)(LY), \ + (int)(M))) +#define _mm_cmpestrz(X, LX, Y, LY, M) \ + ((int) __builtin_ia32_pcmpestriz128 ((__v16qi)(__m128i)(X), (int)(LX), \ + (__v16qi)(__m128i)(Y), (int)(LY), \ + (int)(M))) +#endif + +/* Packed integer 64-bit comparison, zeroing or filling with ones + corresponding parts of result. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_epi64 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pcmpgtq ((__v2di)__X, (__v2di)__Y); +} + +#ifdef __DISABLE_SSE4_2__ +#undef __DISABLE_SSE4_2__ +#pragma GCC pop_options +#endif /* __DISABLE_SSE4_2__ */ + +#ifdef __DISABLE_SSE4_1__ +#undef __DISABLE_SSE4_1__ +#pragma GCC pop_options +#endif /* __DISABLE_SSE4_1__ */ + +#include + +#ifndef __SSE4_1__ +#pragma GCC push_options +#pragma GCC target("sse4.1") +#define __DISABLE_SSE4_1__ +#endif /* __SSE4_1__ */ + +#ifndef __SSE4_2__ +#pragma GCC push_options +#pragma GCC target("sse4.2") +#define __DISABLE_SSE4_2__ +#endif /* __SSE4_1__ */ + +/* Accumulate CRC32 (polynomial 0x11EDC6F41) value. */ +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_crc32_u8 (unsigned int __C, unsigned char __V) +{ + return __builtin_ia32_crc32qi (__C, __V); +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_crc32_u16 (unsigned int __C, unsigned short __V) +{ + return __builtin_ia32_crc32hi (__C, __V); +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_crc32_u32 (unsigned int __C, unsigned int __V) +{ + return __builtin_ia32_crc32si (__C, __V); +} + +#ifdef __x86_64__ +extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_crc32_u64 (unsigned long long __C, unsigned long long __V) +{ + return __builtin_ia32_crc32di (__C, __V); +} +#endif + +#ifdef __DISABLE_SSE4_2__ +#undef __DISABLE_SSE4_2__ +#pragma GCC pop_options +#endif /* __DISABLE_SSE4_2__ */ + +#ifdef __DISABLE_SSE4_1__ +#undef __DISABLE_SSE4_1__ +#pragma GCC pop_options +#endif /* __DISABLE_SSE4_1__ */ + +#endif /* _SMMINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/stdalign.h b/lib/gcc/x86_64-linux-android/4.9.x/include/stdalign.h new file mode 100644 index 0000000..ee2d81f --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/stdalign.h @@ -0,0 +1,39 @@ +/* Copyright (C) 2011-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. 
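Editor's aside on the _mm_crc32_* intrinsics defined in smmintrin.h above (a sketch only, not part of any header in this change): they accumulate a CRC-32C, i.e. the Castagnoli polynomial named in the comment, which is not the zlib/IEEE CRC-32. The helper below is illustrative and assumes SSE4.2 plus -msse4.2.

#include <smmintrin.h>
#include <stddef.h>

/* Fold N bytes into a running CRC-32C value, one byte at a time.  */
static unsigned int
crc32c_update (unsigned int crc, const unsigned char *p, size_t n)
{
  while (n--)
    crc = _mm_crc32_u8 (crc, *p++);
  return crc;
}

/* Conventional use starts from an all-ones seed and inverts the result:
   ~crc32c_update (0xFFFFFFFFu, buf, len).  The wider forms (_mm_crc32_u32,
   and _mm_crc32_u64 on x86-64) fold 4 or 8 bytes per instruction.  */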
+ +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +/* ISO C1X: 7.15 Alignment . */ + +#ifndef _STDALIGN_H +#define _STDALIGN_H + +#ifndef __cplusplus + +#define alignas _Alignas +#define alignof _Alignof + +#define __alignas_is_defined 1 +#define __alignof_is_defined 1 + +#endif + +#endif /* stdalign.h */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/stdarg.h b/lib/gcc/x86_64-linux-android/4.9.x/include/stdarg.h new file mode 100644 index 0000000..1d4418b --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/stdarg.h @@ -0,0 +1,126 @@ +/* Copyright (C) 1989-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +/* + * ISO C Standard: 7.15 Variable arguments + */ + +#ifndef _STDARG_H +#ifndef _ANSI_STDARG_H_ +#ifndef __need___va_list +#define _STDARG_H +#define _ANSI_STDARG_H_ +#endif /* not __need___va_list */ +#undef __need___va_list + +/* Define __gnuc_va_list. */ + +#ifndef __GNUC_VA_LIST +#define __GNUC_VA_LIST +typedef __builtin_va_list __gnuc_va_list; +#endif + +/* Define the standard macros for the user, + if this invocation was from the user program. */ +#ifdef _STDARG_H + +#define va_start(v,l) __builtin_va_start(v,l) +#define va_end(v) __builtin_va_end(v) +#define va_arg(v,l) __builtin_va_arg(v,l) +#if !defined(__STRICT_ANSI__) || __STDC_VERSION__ + 0 >= 199900L || defined(__GXX_EXPERIMENTAL_CXX0X__) +#define va_copy(d,s) __builtin_va_copy(d,s) +#endif +#define __va_copy(d,s) __builtin_va_copy(d,s) + +/* Define va_list, if desired, from __gnuc_va_list. */ +/* We deliberately do not define va_list when called from + stdio.h, because ANSI C says that stdio.h is not supposed to define + va_list. stdio.h needs to have access to that data type, + but must not use that name. It should use the name __gnuc_va_list, + which is safe because it is reserved for the implementation. */ + +#ifdef _BSD_VA_LIST +#undef _BSD_VA_LIST +#endif + +#if defined(__svr4__) || (defined(_SCO_DS) && !defined(__VA_LIST)) +/* SVR4.2 uses _VA_LIST for an internal alias for va_list, + so we must avoid testing it and setting it here. + SVR4 uses _VA_LIST as a flag in stdarg.h, but we should + have no conflict with that. */ +#ifndef _VA_LIST_ +#define _VA_LIST_ +#ifdef __i860__ +#ifndef _VA_LIST +#define _VA_LIST va_list +#endif +#endif /* __i860__ */ +typedef __gnuc_va_list va_list; +#ifdef _SCO_DS +#define __VA_LIST +#endif +#endif /* _VA_LIST_ */ +#else /* not __svr4__ || _SCO_DS */ + +/* The macro _VA_LIST_ is the same thing used by this file in Ultrix. 
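Editor's aside on the user-visible macros defined above (va_start, va_arg, va_copy, va_end); the sketch is ordinary ISO C and not part of stdarg.h itself:

#include <stdarg.h>

/* Sum COUNT ints passed as trailing arguments.  */
static int
sum_ints (int count, ...)
{
  va_list ap;
  int total = 0;

  va_start (ap, count);          /* expands to __builtin_va_start */
  while (count-- > 0)
    total += va_arg (ap, int);   /* the named type must be the promoted one */
  va_end (ap);                   /* required before returning */
  return total;
}

/* sum_ints (3, 1, 2, 3) returns 6.  va_copy, made visible above for C99
   and later, is the portable way to duplicate AP before a second walk.  */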
+ But on BSD NET2 we must not test or define or undef it. + (Note that the comments in NET 2's ansi.h + are incorrect for _VA_LIST_--see stdio.h!) */ +#if !defined (_VA_LIST_) || defined (__BSD_NET2__) || defined (____386BSD____) || defined (__bsdi__) || defined (__sequent__) || defined (__FreeBSD__) || defined(WINNT) +/* The macro _VA_LIST_DEFINED is used in Windows NT 3.5 */ +#ifndef _VA_LIST_DEFINED +/* The macro _VA_LIST is used in SCO Unix 3.2. */ +#ifndef _VA_LIST +/* The macro _VA_LIST_T_H is used in the Bull dpx2 */ +#ifndef _VA_LIST_T_H +/* The macro __va_list__ is used by BeOS. */ +#ifndef __va_list__ +typedef __gnuc_va_list va_list; +#endif /* not __va_list__ */ +#endif /* not _VA_LIST_T_H */ +#endif /* not _VA_LIST */ +#endif /* not _VA_LIST_DEFINED */ +#if !(defined (__BSD_NET2__) || defined (____386BSD____) || defined (__bsdi__) || defined (__sequent__) || defined (__FreeBSD__)) +#define _VA_LIST_ +#endif +#ifndef _VA_LIST +#define _VA_LIST +#endif +#ifndef _VA_LIST_DEFINED +#define _VA_LIST_DEFINED +#endif +#ifndef _VA_LIST_T_H +#define _VA_LIST_T_H +#endif +#ifndef __va_list__ +#define __va_list__ +#endif + +#endif /* not _VA_LIST_, except on certain systems */ + +#endif /* not __svr4__ */ + +#endif /* _STDARG_H */ + +#endif /* not _ANSI_STDARG_H_ */ +#endif /* not _STDARG_H */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/stdatomic.h b/lib/gcc/x86_64-linux-android/4.9.x/include/stdatomic.h new file mode 100644 index 0000000..108259b --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/stdatomic.h @@ -0,0 +1,252 @@ +/* Copyright (C) 2013-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +/* ISO C11 Standard: 7.17 Atomics . 
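Editor's aside, not part of stdatomic.h: the typedefs and generic macros defined just below are used along these lines (C11 mode; shown single-threaded only for brevity, real code would share these objects between threads):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int counter = ATOMIC_VAR_INIT (0);
static atomic_flag spin = ATOMIC_FLAG_INIT;

int
main (void)
{
  atomic_fetch_add (&counter, 5);              /* seq_cst read-modify-write */

  int expected = 5;                            /* CAS succeeds only if still 5 */
  bool swapped = atomic_compare_exchange_strong (&counter, &expected, 9);
  printf ("swapped=%d counter=%d\n", swapped, atomic_load (&counter));

  while (atomic_flag_test_and_set (&spin))     /* atomic_flag as a minimal spin lock */
    ;
  /* ...critical section... */
  atomic_flag_clear (&spin);
  return 0;
}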
*/ + +#ifndef _STDATOMIC_H +#define _STDATOMIC_H + +typedef enum + { + memory_order_relaxed = __ATOMIC_RELAXED, + memory_order_consume = __ATOMIC_CONSUME, + memory_order_acquire = __ATOMIC_ACQUIRE, + memory_order_release = __ATOMIC_RELEASE, + memory_order_acq_rel = __ATOMIC_ACQ_REL, + memory_order_seq_cst = __ATOMIC_SEQ_CST + } memory_order; + + +typedef _Atomic _Bool atomic_bool; +typedef _Atomic char atomic_char; +typedef _Atomic signed char atomic_schar; +typedef _Atomic unsigned char atomic_uchar; +typedef _Atomic short atomic_short; +typedef _Atomic unsigned short atomic_ushort; +typedef _Atomic int atomic_int; +typedef _Atomic unsigned int atomic_uint; +typedef _Atomic long atomic_long; +typedef _Atomic unsigned long atomic_ulong; +typedef _Atomic long long atomic_llong; +typedef _Atomic unsigned long long atomic_ullong; +typedef _Atomic __CHAR16_TYPE__ atomic_char16_t; +typedef _Atomic __CHAR32_TYPE__ atomic_char32_t; +typedef _Atomic __WCHAR_TYPE__ atomic_wchar_t; +typedef _Atomic __INT_LEAST8_TYPE__ atomic_int_least8_t; +typedef _Atomic __UINT_LEAST8_TYPE__ atomic_uint_least8_t; +typedef _Atomic __INT_LEAST16_TYPE__ atomic_int_least16_t; +typedef _Atomic __UINT_LEAST16_TYPE__ atomic_uint_least16_t; +typedef _Atomic __INT_LEAST32_TYPE__ atomic_int_least32_t; +typedef _Atomic __UINT_LEAST32_TYPE__ atomic_uint_least32_t; +typedef _Atomic __INT_LEAST64_TYPE__ atomic_int_least64_t; +typedef _Atomic __UINT_LEAST64_TYPE__ atomic_uint_least64_t; +typedef _Atomic __INT_FAST8_TYPE__ atomic_int_fast8_t; +typedef _Atomic __UINT_FAST8_TYPE__ atomic_uint_fast8_t; +typedef _Atomic __INT_FAST16_TYPE__ atomic_int_fast16_t; +typedef _Atomic __UINT_FAST16_TYPE__ atomic_uint_fast16_t; +typedef _Atomic __INT_FAST32_TYPE__ atomic_int_fast32_t; +typedef _Atomic __UINT_FAST32_TYPE__ atomic_uint_fast32_t; +typedef _Atomic __INT_FAST64_TYPE__ atomic_int_fast64_t; +typedef _Atomic __UINT_FAST64_TYPE__ atomic_uint_fast64_t; +typedef _Atomic __INTPTR_TYPE__ atomic_intptr_t; +typedef _Atomic __UINTPTR_TYPE__ atomic_uintptr_t; +typedef _Atomic __SIZE_TYPE__ atomic_size_t; +typedef _Atomic __PTRDIFF_TYPE__ atomic_ptrdiff_t; +typedef _Atomic __INTMAX_TYPE__ atomic_intmax_t; +typedef _Atomic __UINTMAX_TYPE__ atomic_uintmax_t; + + +#define ATOMIC_VAR_INIT(VALUE) (VALUE) +#define atomic_init(PTR, VAL) \ + do \ + { \ + *(PTR) = (VAL); \ + } \ + while (0) + +#define kill_dependency(Y) \ + __extension__ \ + ({ \ + __auto_type __kill_dependency_tmp = (Y); \ + __kill_dependency_tmp; \ + }) + +#define atomic_thread_fence(MO) __atomic_thread_fence (MO) +#define atomic_signal_fence(MO) __atomic_signal_fence (MO) +#define atomic_is_lock_free(OBJ) __atomic_is_lock_free (sizeof (*(OBJ)), (OBJ)) + +#define __atomic_type_lock_free(T) \ + (__atomic_always_lock_free (sizeof (T), (void *) 0) \ + ? 2 \ + : (__atomic_is_lock_free (sizeof (T), (void *) 0) ? 
1 : 0)) +#define ATOMIC_BOOL_LOCK_FREE \ + __atomic_type_lock_free (atomic_bool) +#define ATOMIC_CHAR_LOCK_FREE \ + __atomic_type_lock_free (atomic_char) +#define ATOMIC_CHAR16_T_LOCK_FREE \ + __atomic_type_lock_free (atomic_char16_t) +#define ATOMIC_CHAR32_T_LOCK_FREE \ + __atomic_type_lock_free (atomic_char32_t) +#define ATOMIC_WCHAR_T_LOCK_FREE \ + __atomic_type_lock_free (atomic_wchar_t) +#define ATOMIC_SHORT_LOCK_FREE \ + __atomic_type_lock_free (atomic_short) +#define ATOMIC_INT_LOCK_FREE \ + __atomic_type_lock_free (atomic_int) +#define ATOMIC_LONG_LOCK_FREE \ + __atomic_type_lock_free (atomic_long) +#define ATOMIC_LLONG_LOCK_FREE \ + __atomic_type_lock_free (atomic_llong) +#define ATOMIC_POINTER_LOCK_FREE \ + __atomic_type_lock_free (void * _Atomic) + + +/* Note that these macros require __typeof__ and __auto_type to remove + _Atomic qualifiers (and const qualifiers, if those are valid on + macro operands). + + Also note that the header file uses the generic form of __atomic + builtins, which requires the address to be taken of the value + parameter, and then we pass that value on. This allows the macros + to work for any type, and the compiler is smart enough to convert + these to lock-free _N variants if possible, and throw away the + temps. */ + +#define atomic_store_explicit(PTR, VAL, MO) \ + __extension__ \ + ({ \ + __auto_type __atomic_store_ptr = (PTR); \ + __typeof__ (*__atomic_store_ptr) __atomic_store_tmp = (VAL); \ + __atomic_store (__atomic_store_ptr, &__atomic_store_tmp, (MO)); \ + }) + +#define atomic_store(PTR, VAL) \ + atomic_store_explicit (PTR, VAL, __ATOMIC_SEQ_CST) + + +#define atomic_load_explicit(PTR, MO) \ + __extension__ \ + ({ \ + __auto_type __atomic_load_ptr = (PTR); \ + __typeof__ (*__atomic_load_ptr) __atomic_load_tmp; \ + __atomic_load (__atomic_load_ptr, &__atomic_load_tmp, (MO)); \ + __atomic_load_tmp; \ + }) + +#define atomic_load(PTR) atomic_load_explicit (PTR, __ATOMIC_SEQ_CST) + + +#define atomic_exchange_explicit(PTR, VAL, MO) \ + __extension__ \ + ({ \ + __auto_type __atomic_exchange_ptr = (PTR); \ + __typeof__ (*__atomic_exchange_ptr) __atomic_exchange_val = (VAL); \ + __typeof__ (*__atomic_exchange_ptr) __atomic_exchange_tmp; \ + __atomic_exchange (__atomic_exchange_ptr, &__atomic_exchange_val, \ + &__atomic_exchange_tmp, (MO)); \ + __atomic_exchange_tmp; \ + }) + +#define atomic_exchange(PTR, VAL) \ + atomic_exchange_explicit (PTR, VAL, __ATOMIC_SEQ_CST) + + +#define atomic_compare_exchange_strong_explicit(PTR, VAL, DES, SUC, FAIL) \ + __extension__ \ + ({ \ + __auto_type __atomic_compare_exchange_ptr = (PTR); \ + __typeof__ (*__atomic_compare_exchange_ptr) __atomic_compare_exchange_tmp \ + = (DES); \ + __atomic_compare_exchange (__atomic_compare_exchange_ptr, (VAL), \ + &__atomic_compare_exchange_tmp, 0, \ + (SUC), (FAIL)); \ + }) + +#define atomic_compare_exchange_strong(PTR, VAL, DES) \ + atomic_compare_exchange_strong_explicit (PTR, VAL, DES, __ATOMIC_SEQ_CST, \ + __ATOMIC_SEQ_CST) + +#define atomic_compare_exchange_weak_explicit(PTR, VAL, DES, SUC, FAIL) \ + __extension__ \ + ({ \ + __auto_type __atomic_compare_exchange_ptr = (PTR); \ + __typeof__ (*__atomic_compare_exchange_ptr) __atomic_compare_exchange_tmp \ + = (DES); \ + __atomic_compare_exchange (__atomic_compare_exchange_ptr, (VAL), \ + &__atomic_compare_exchange_tmp, 1, \ + (SUC), (FAIL)); \ + }) + +#define atomic_compare_exchange_weak(PTR, VAL, DES) \ + atomic_compare_exchange_weak_explicit (PTR, VAL, DES, __ATOMIC_SEQ_CST, \ + __ATOMIC_SEQ_CST) + + + +#define 
atomic_fetch_add(PTR, VAL) __atomic_fetch_add ((PTR), (VAL), \ + __ATOMIC_SEQ_CST) +#define atomic_fetch_add_explicit(PTR, VAL, MO) \ + __atomic_fetch_add ((PTR), (VAL), (MO)) + +#define atomic_fetch_sub(PTR, VAL) __atomic_fetch_sub ((PTR), (VAL), \ + __ATOMIC_SEQ_CST) +#define atomic_fetch_sub_explicit(PTR, VAL, MO) \ + __atomic_fetch_sub ((PTR), (VAL), (MO)) + +#define atomic_fetch_or(PTR, VAL) __atomic_fetch_or ((PTR), (VAL), \ + __ATOMIC_SEQ_CST) +#define atomic_fetch_or_explicit(PTR, VAL, MO) \ + __atomic_fetch_or ((PTR), (VAL), (MO)) + +#define atomic_fetch_xor(PTR, VAL) __atomic_fetch_xor ((PTR), (VAL), \ + __ATOMIC_SEQ_CST) +#define atomic_fetch_xor_explicit(PTR, VAL, MO) \ + __atomic_fetch_xor ((PTR), (VAL), (MO)) + +#define atomic_fetch_and(PTR, VAL) __atomic_fetch_and ((PTR), (VAL), \ + __ATOMIC_SEQ_CST) +#define atomic_fetch_and_explicit(PTR, VAL, MO) \ + __atomic_fetch_and ((PTR), (VAL), (MO)) + + +typedef _Atomic struct +{ +#if __GCC_ATOMIC_TEST_AND_SET_TRUEVAL == 1 + _Bool __val; +#else + unsigned char __val; +#endif +} atomic_flag; + +#define ATOMIC_FLAG_INIT { 0 } + + +#define atomic_flag_test_and_set(PTR) \ + __atomic_test_and_set ((PTR), __ATOMIC_SEQ_CST) +#define atomic_flag_test_and_set_explicit(PTR, MO) \ + __atomic_test_and_set ((PTR), (MO)) + +#define atomic_flag_clear(PTR) __atomic_clear ((PTR), __ATOMIC_SEQ_CST) +#define atomic_flag_clear_explicit(PTR, MO) __atomic_clear ((PTR), (MO)) + +#endif /* _STDATOMIC_H */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/stdbool.h b/lib/gcc/x86_64-linux-android/4.9.x/include/stdbool.h new file mode 100644 index 0000000..f4e802f --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/stdbool.h @@ -0,0 +1,50 @@ +/* Copyright (C) 1998-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +/* + * ISO C Standard: 7.16 Boolean type and values + */ + +#ifndef _STDBOOL_H +#define _STDBOOL_H + +#ifndef __cplusplus + +#define bool _Bool +#define true 1 +#define false 0 + +#else /* __cplusplus */ + +/* Supporting in C++ is a GCC extension. */ +#define _Bool bool +#define bool bool +#define false false +#define true true + +#endif /* __cplusplus */ + +/* Signal that all the definitions are present. */ +#define __bool_true_false_are_defined 1 + +#endif /* stdbool.h */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/stddef.h b/lib/gcc/x86_64-linux-android/4.9.x/include/stddef.h new file mode 100644 index 0000000..cfa8df3 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/stddef.h @@ -0,0 +1,439 @@ +/* Copyright (C) 1989-2014 Free Software Foundation, Inc. + +This file is part of GCC. 
+ +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +/* + * ISO C Standard: 7.17 Common definitions + */ +#if (!defined(_STDDEF_H) && !defined(_STDDEF_H_) && !defined(_ANSI_STDDEF_H) \ + && !defined(__STDDEF_H__)) \ + || defined(__need_wchar_t) || defined(__need_size_t) \ + || defined(__need_ptrdiff_t) || defined(__need_NULL) \ + || defined(__need_wint_t) + +/* Any one of these symbols __need_* means that GNU libc + wants us just to define one data type. So don't define + the symbols that indicate this file's entire job has been done. */ +#if (!defined(__need_wchar_t) && !defined(__need_size_t) \ + && !defined(__need_ptrdiff_t) && !defined(__need_NULL) \ + && !defined(__need_wint_t)) +#define _STDDEF_H +#define _STDDEF_H_ +/* snaroff@next.com says the NeXT needs this. */ +#define _ANSI_STDDEF_H +#endif + +#ifndef __sys_stdtypes_h +/* This avoids lossage on SunOS but only if stdtypes.h comes first. + There's no way to win with the other order! Sun lossage. */ + +/* On 4.3bsd-net2, make sure ansi.h is included, so we have + one less case to deal with in the following. */ +#if defined (__BSD_NET2__) || defined (____386BSD____) || (defined (__FreeBSD__) && (__FreeBSD__ < 5)) || defined(__NetBSD__) +#include +#endif +/* On FreeBSD 5, machine/ansi.h does not exist anymore... */ +#if defined (__FreeBSD__) && (__FreeBSD__ >= 5) +#include +#endif + +/* In 4.3bsd-net2, machine/ansi.h defines these symbols, which are + defined if the corresponding type is *not* defined. + FreeBSD-2.1 defines _MACHINE_ANSI_H_ instead of _ANSI_H_. + NetBSD defines _I386_ANSI_H_ and _X86_64_ANSI_H_ instead of _ANSI_H_ */ +#if defined(_ANSI_H_) || defined(_MACHINE_ANSI_H_) || defined(_X86_64_ANSI_H_) || defined(_I386_ANSI_H_) +#if !defined(_SIZE_T_) && !defined(_BSD_SIZE_T_) +#define _SIZE_T +#endif +#if !defined(_PTRDIFF_T_) && !defined(_BSD_PTRDIFF_T_) +#define _PTRDIFF_T +#endif +/* On BSD/386 1.1, at least, machine/ansi.h defines _BSD_WCHAR_T_ + instead of _WCHAR_T_. */ +#if !defined(_WCHAR_T_) && !defined(_BSD_WCHAR_T_) +#ifndef _BSD_WCHAR_T_ +#define _WCHAR_T +#endif +#endif +/* Undef _FOO_T_ if we are supposed to define foo_t. */ +#if defined (__need_ptrdiff_t) || defined (_STDDEF_H_) +#undef _PTRDIFF_T_ +#undef _BSD_PTRDIFF_T_ +#endif +#if defined (__need_size_t) || defined (_STDDEF_H_) +#undef _SIZE_T_ +#undef _BSD_SIZE_T_ +#endif +#if defined (__need_wchar_t) || defined (_STDDEF_H_) +#undef _WCHAR_T_ +#undef _BSD_WCHAR_T_ +#endif +#endif /* defined(_ANSI_H_) || defined(_MACHINE_ANSI_H_) || defined(_X86_64_ANSI_H_) || defined(_I386_ANSI_H_) */ + +/* Sequent's header files use _PTRDIFF_T_ in some conflicting way. + Just ignore it. 
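Editor's aside on the types this header is arranging to define (size_t from sizeof, ptrdiff_t from pointer subtraction); the sketch is plain ISO C and not part of stddef.h:

#include <stddef.h>
#include <stdio.h>

int
main (void)
{
  double samples[8];
  size_t nbytes = sizeof samples;               /* unsigned byte count */
  ptrdiff_t span = &samples[6] - &samples[2];   /* signed element distance: 4 */

  printf ("%zu bytes, span %td elements\n", nbytes, span);
  return 0;
}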
*/ +#if defined (__sequent__) && defined (_PTRDIFF_T_) +#undef _PTRDIFF_T_ +#endif + +/* On VxWorks, may have defined macros like + _TYPE_size_t which will typedef size_t. fixincludes patched the + vxTypesBase.h so that this macro is only defined if _GCC_SIZE_T is + not defined, and so that defining this macro defines _GCC_SIZE_T. + If we find that the macros are still defined at this point, we must + invoke them so that the type is defined as expected. */ +#if defined (_TYPE_ptrdiff_t) && (defined (__need_ptrdiff_t) || defined (_STDDEF_H_)) +_TYPE_ptrdiff_t; +#undef _TYPE_ptrdiff_t +#endif +#if defined (_TYPE_size_t) && (defined (__need_size_t) || defined (_STDDEF_H_)) +_TYPE_size_t; +#undef _TYPE_size_t +#endif +#if defined (_TYPE_wchar_t) && (defined (__need_wchar_t) || defined (_STDDEF_H_)) +_TYPE_wchar_t; +#undef _TYPE_wchar_t +#endif + +/* In case nobody has defined these types, but we aren't running under + GCC 2.00, make sure that __PTRDIFF_TYPE__, __SIZE_TYPE__, and + __WCHAR_TYPE__ have reasonable values. This can happen if the + parts of GCC is compiled by an older compiler, that actually + include gstddef.h, such as collect2. */ + +/* Signed type of difference of two pointers. */ + +/* Define this type if we are doing the whole job, + or if we want this type in particular. */ +#if defined (_STDDEF_H) || defined (__need_ptrdiff_t) +#ifndef _PTRDIFF_T /* in case has defined it. */ +#ifndef _T_PTRDIFF_ +#ifndef _T_PTRDIFF +#ifndef __PTRDIFF_T +#ifndef _PTRDIFF_T_ +#ifndef _BSD_PTRDIFF_T_ +#ifndef ___int_ptrdiff_t_h +#ifndef _GCC_PTRDIFF_T +#define _PTRDIFF_T +#define _T_PTRDIFF_ +#define _T_PTRDIFF +#define __PTRDIFF_T +#define _PTRDIFF_T_ +#define _BSD_PTRDIFF_T_ +#define ___int_ptrdiff_t_h +#define _GCC_PTRDIFF_T +#ifndef __PTRDIFF_TYPE__ +#define __PTRDIFF_TYPE__ long int +#endif +typedef __PTRDIFF_TYPE__ ptrdiff_t; +#endif /* _GCC_PTRDIFF_T */ +#endif /* ___int_ptrdiff_t_h */ +#endif /* _BSD_PTRDIFF_T_ */ +#endif /* _PTRDIFF_T_ */ +#endif /* __PTRDIFF_T */ +#endif /* _T_PTRDIFF */ +#endif /* _T_PTRDIFF_ */ +#endif /* _PTRDIFF_T */ + +/* If this symbol has done its job, get rid of it. */ +#undef __need_ptrdiff_t + +#endif /* _STDDEF_H or __need_ptrdiff_t. */ + +/* Unsigned type of `sizeof' something. */ + +/* Define this type if we are doing the whole job, + or if we want this type in particular. */ +#if defined (_STDDEF_H) || defined (__need_size_t) +#ifndef __size_t__ /* BeOS */ +#ifndef __SIZE_T__ /* Cray Unicos/Mk */ +#ifndef _SIZE_T /* in case has defined it. */ +#ifndef _SYS_SIZE_T_H +#ifndef _T_SIZE_ +#ifndef _T_SIZE +#ifndef __SIZE_T +#ifndef _SIZE_T_ +#ifndef _BSD_SIZE_T_ +#ifndef _SIZE_T_DEFINED_ +#ifndef _SIZE_T_DEFINED +#ifndef _BSD_SIZE_T_DEFINED_ /* Darwin */ +#ifndef _SIZE_T_DECLARED /* FreeBSD 5 */ +#ifndef ___int_size_t_h +#ifndef _GCC_SIZE_T +#ifndef _SIZET_ +#ifndef __size_t +#define __size_t__ /* BeOS */ +#define __SIZE_T__ /* Cray Unicos/Mk */ +#define _SIZE_T +#define _SYS_SIZE_T_H +#define _T_SIZE_ +#define _T_SIZE +#define __SIZE_T +#define _SIZE_T_ +#define _BSD_SIZE_T_ +#define _SIZE_T_DEFINED_ +#define _SIZE_T_DEFINED +#define _BSD_SIZE_T_DEFINED_ /* Darwin */ +#define _SIZE_T_DECLARED /* FreeBSD 5 */ +#define ___int_size_t_h +#define _GCC_SIZE_T +#define _SIZET_ +#if (defined (__FreeBSD__) && (__FreeBSD__ >= 5)) \ + || defined(__FreeBSD_kernel__) +/* __size_t is a typedef on FreeBSD 5, must not trash it. */ +#elif defined (__VMS__) +/* __size_t is also a typedef on VMS. 
*/ +#else +#define __size_t +#endif +#ifndef __SIZE_TYPE__ +#define __SIZE_TYPE__ long unsigned int +#endif +#if !(defined (__GNUG__) && defined (size_t)) +typedef __SIZE_TYPE__ size_t; +#ifdef __BEOS__ +typedef long ssize_t; +#endif /* __BEOS__ */ +#endif /* !(defined (__GNUG__) && defined (size_t)) */ +#endif /* __size_t */ +#endif /* _SIZET_ */ +#endif /* _GCC_SIZE_T */ +#endif /* ___int_size_t_h */ +#endif /* _SIZE_T_DECLARED */ +#endif /* _BSD_SIZE_T_DEFINED_ */ +#endif /* _SIZE_T_DEFINED */ +#endif /* _SIZE_T_DEFINED_ */ +#endif /* _BSD_SIZE_T_ */ +#endif /* _SIZE_T_ */ +#endif /* __SIZE_T */ +#endif /* _T_SIZE */ +#endif /* _T_SIZE_ */ +#endif /* _SYS_SIZE_T_H */ +#endif /* _SIZE_T */ +#endif /* __SIZE_T__ */ +#endif /* __size_t__ */ +#undef __need_size_t +#endif /* _STDDEF_H or __need_size_t. */ + + +/* Wide character type. + Locale-writers should change this as necessary to + be big enough to hold unique values not between 0 and 127, + and not (wchar_t) -1, for each defined multibyte character. */ + +/* Define this type if we are doing the whole job, + or if we want this type in particular. */ +#if defined (_STDDEF_H) || defined (__need_wchar_t) +#ifndef __wchar_t__ /* BeOS */ +#ifndef __WCHAR_T__ /* Cray Unicos/Mk */ +#ifndef _WCHAR_T +#ifndef _T_WCHAR_ +#ifndef _T_WCHAR +#ifndef __WCHAR_T +#ifndef _WCHAR_T_ +#ifndef _BSD_WCHAR_T_ +#ifndef _BSD_WCHAR_T_DEFINED_ /* Darwin */ +#ifndef _BSD_RUNE_T_DEFINED_ /* Darwin */ +#ifndef _WCHAR_T_DECLARED /* FreeBSD 5 */ +#ifndef _WCHAR_T_DEFINED_ +#ifndef _WCHAR_T_DEFINED +#ifndef _WCHAR_T_H +#ifndef ___int_wchar_t_h +#ifndef __INT_WCHAR_T_H +#ifndef _GCC_WCHAR_T +#define __wchar_t__ /* BeOS */ +#define __WCHAR_T__ /* Cray Unicos/Mk */ +#define _WCHAR_T +#define _T_WCHAR_ +#define _T_WCHAR +#define __WCHAR_T +#define _WCHAR_T_ +#define _BSD_WCHAR_T_ +#define _WCHAR_T_DEFINED_ +#define _WCHAR_T_DEFINED +#define _WCHAR_T_H +#define ___int_wchar_t_h +#define __INT_WCHAR_T_H +#define _GCC_WCHAR_T +#define _WCHAR_T_DECLARED + +/* On BSD/386 1.1, at least, machine/ansi.h defines _BSD_WCHAR_T_ + instead of _WCHAR_T_, and _BSD_RUNE_T_ (which, unlike the other + symbols in the _FOO_T_ family, stays defined even after its + corresponding type is defined). If we define wchar_t, then we + must undef _WCHAR_T_; for BSD/386 1.1 (and perhaps others), if + we undef _WCHAR_T_, then we must also define rune_t, since + headers like runetype.h assume that if machine/ansi.h is included, + and _BSD_WCHAR_T_ is not defined, then rune_t is available. + machine/ansi.h says, "Note that _WCHAR_T_ and _RUNE_T_ must be of + the same type." */ +#ifdef _BSD_WCHAR_T_ +#undef _BSD_WCHAR_T_ +#ifdef _BSD_RUNE_T_ +#if !defined (_ANSI_SOURCE) && !defined (_POSIX_SOURCE) +typedef _BSD_RUNE_T_ rune_t; +#define _BSD_WCHAR_T_DEFINED_ +#define _BSD_RUNE_T_DEFINED_ /* Darwin */ +#if defined (__FreeBSD__) && (__FreeBSD__ < 5) +/* Why is this file so hard to maintain properly? In contrast to + the comment above regarding BSD/386 1.1, on FreeBSD for as long + as the symbol has existed, _BSD_RUNE_T_ must not stay defined or + redundant typedefs will occur when stdlib.h is included after this file. */ +#undef _BSD_RUNE_T_ +#endif +#endif +#endif +#endif +/* FreeBSD 5 can't be handled well using "traditional" logic above + since it no longer defines _BSD_RUNE_T_ yet still desires to export + rune_t in some cases... 
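Editor's aside on the wchar_t definition being guarded here (sketch only, not part of stddef.h); on this x86_64 target __WCHAR_TYPE__ is normally int, so wchar_t is a 32-bit type wide enough for any supported wide character:

#include <stddef.h>

static wchar_t greeting[] = L"wide";
/* Four wide characters plus the terminating L'\0'.  */
static size_t greeting_len = sizeof greeting / sizeof greeting[0] - 1;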
*/ +#if defined (__FreeBSD__) && (__FreeBSD__ >= 5) +#if !defined (_ANSI_SOURCE) && !defined (_POSIX_SOURCE) +#if __BSD_VISIBLE +#ifndef _RUNE_T_DECLARED +typedef __rune_t rune_t; +#define _RUNE_T_DECLARED +#endif +#endif +#endif +#endif + +#ifndef __WCHAR_TYPE__ +#define __WCHAR_TYPE__ int +#endif +#ifndef __cplusplus +typedef __WCHAR_TYPE__ wchar_t; +#endif +#endif +#endif +#endif +#endif +#endif +#endif +#endif /* _WCHAR_T_DECLARED */ +#endif /* _BSD_RUNE_T_DEFINED_ */ +#endif +#endif +#endif +#endif +#endif +#endif +#endif +#endif /* __WCHAR_T__ */ +#endif /* __wchar_t__ */ +#undef __need_wchar_t +#endif /* _STDDEF_H or __need_wchar_t. */ + +#if defined (__need_wint_t) +#ifndef _WINT_T +#define _WINT_T + +#ifndef __WINT_TYPE__ +#define __WINT_TYPE__ unsigned int +#endif +typedef __WINT_TYPE__ wint_t; +#endif +#undef __need_wint_t +#endif + +/* In 4.3bsd-net2, leave these undefined to indicate that size_t, etc. + are already defined. */ +/* BSD/OS 3.1 and FreeBSD [23].x require the MACHINE_ANSI_H check here. */ +/* NetBSD 5 requires the I386_ANSI_H and X86_64_ANSI_H checks here. */ +#if defined(_ANSI_H_) || defined(_MACHINE_ANSI_H_) || defined(_X86_64_ANSI_H_) || defined(_I386_ANSI_H_) +/* The references to _GCC_PTRDIFF_T_, _GCC_SIZE_T_, and _GCC_WCHAR_T_ + are probably typos and should be removed before 2.8 is released. */ +#ifdef _GCC_PTRDIFF_T_ +#undef _PTRDIFF_T_ +#undef _BSD_PTRDIFF_T_ +#endif +#ifdef _GCC_SIZE_T_ +#undef _SIZE_T_ +#undef _BSD_SIZE_T_ +#endif +#ifdef _GCC_WCHAR_T_ +#undef _WCHAR_T_ +#undef _BSD_WCHAR_T_ +#endif +/* The following ones are the real ones. */ +#ifdef _GCC_PTRDIFF_T +#undef _PTRDIFF_T_ +#undef _BSD_PTRDIFF_T_ +#endif +#ifdef _GCC_SIZE_T +#undef _SIZE_T_ +#undef _BSD_SIZE_T_ +#endif +#ifdef _GCC_WCHAR_T +#undef _WCHAR_T_ +#undef _BSD_WCHAR_T_ +#endif +#endif /* _ANSI_H_ || _MACHINE_ANSI_H_ || _X86_64_ANSI_H_ || _I386_ANSI_H_ */ + +#endif /* __sys_stdtypes_h */ + +/* A null pointer constant. */ + +#if defined (_STDDEF_H) || defined (__need_NULL) +#undef NULL /* in case has defined it. */ +#ifdef __GNUG__ +#define NULL __null +#else /* G++ */ +#ifndef __cplusplus +#define NULL ((void *)0) +#else /* C++ */ +#define NULL 0 +#endif /* C++ */ +#endif /* G++ */ +#endif /* NULL not defined and or need NULL. */ +#undef __need_NULL + +#ifdef _STDDEF_H + +/* Offset of member MEMBER in a struct of type TYPE. */ +#define offsetof(TYPE, MEMBER) __builtin_offsetof (TYPE, MEMBER) + +#if (defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L) \ + || (defined(__cplusplus) && __cplusplus >= 201103L) +#ifndef _GCC_MAX_ALIGN_T +#define _GCC_MAX_ALIGN_T +/* Type whose alignment is supported in every context and is at least + as great as that of any standard type not using alignment + specifiers. */ +typedef struct { + long long __max_align_ll __attribute__((__aligned__(__alignof__(long long)))); + long double __max_align_ld __attribute__((__aligned__(__alignof__(long double)))); +} max_align_t; +#endif +#endif /* C11 or C++11. */ + +#if defined(__cplusplus) && __cplusplus >= 201103L +#ifndef _GXX_NULLPTR_T +#define _GXX_NULLPTR_T + typedef decltype(nullptr) nullptr_t; +#endif +#endif /* C++11. 
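Editor's aside on offsetof and max_align_t as defined just above (a sketch only, compiled as C11; not part of stddef.h):

#include <stddef.h>
#include <stdio.h>

struct packet
{
  unsigned char type;
  unsigned int length;
  unsigned char payload[32];
};

int
main (void)
{
  /* Compile-time byte offset of a member, via __builtin_offsetof.  */
  printf ("payload starts at byte %zu\n", offsetof (struct packet, payload));

  /* max_align_t names the strictest fundamental alignment; malloc'd
     storage is aligned at least this much.  */
  printf ("max fundamental alignment: %zu\n", _Alignof (max_align_t));
  return 0;
}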
*/ + +#endif /* _STDDEF_H was defined this time */ + +#endif /* !_STDDEF_H && !_STDDEF_H_ && !_ANSI_STDDEF_H && !__STDDEF_H__ + || __need_XXX was not defined before */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/stdfix.h b/lib/gcc/x86_64-linux-android/4.9.x/include/stdfix.h new file mode 100644 index 0000000..93e759a --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/stdfix.h @@ -0,0 +1,204 @@ +/* Copyright (C) 2007-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +/* ISO/IEC JTC1 SC22 WG14 N1169 + * Date: 2006-04-04 + * ISO/IEC TR 18037 + * Programming languages - C - Extensions to support embedded processors + */ + +#ifndef _STDFIX_H +#define _STDFIX_H + +/* 7.18a.1 Introduction. */ + +#undef fract +#undef accum +#undef sat +#define fract _Fract +#define accum _Accum +#define sat _Sat + +/* 7.18a.3 Precision macros. */ + +#undef SFRACT_FBIT +#undef SFRACT_MIN +#undef SFRACT_MAX +#undef SFRACT_EPSILON +#define SFRACT_FBIT __SFRACT_FBIT__ +#define SFRACT_MIN __SFRACT_MIN__ +#define SFRACT_MAX __SFRACT_MAX__ +#define SFRACT_EPSILON __SFRACT_EPSILON__ + +#undef USFRACT_FBIT +#undef USFRACT_MIN +#undef USFRACT_MAX +#undef USFRACT_EPSILON +#define USFRACT_FBIT __USFRACT_FBIT__ +#define USFRACT_MIN __USFRACT_MIN__ /* GCC extension. */ +#define USFRACT_MAX __USFRACT_MAX__ +#define USFRACT_EPSILON __USFRACT_EPSILON__ + +#undef FRACT_FBIT +#undef FRACT_MIN +#undef FRACT_MAX +#undef FRACT_EPSILON +#define FRACT_FBIT __FRACT_FBIT__ +#define FRACT_MIN __FRACT_MIN__ +#define FRACT_MAX __FRACT_MAX__ +#define FRACT_EPSILON __FRACT_EPSILON__ + +#undef UFRACT_FBIT +#undef UFRACT_MIN +#undef UFRACT_MAX +#undef UFRACT_EPSILON +#define UFRACT_FBIT __UFRACT_FBIT__ +#define UFRACT_MIN __UFRACT_MIN__ /* GCC extension. */ +#define UFRACT_MAX __UFRACT_MAX__ +#define UFRACT_EPSILON __UFRACT_EPSILON__ + +#undef LFRACT_FBIT +#undef LFRACT_MIN +#undef LFRACT_MAX +#undef LFRACT_EPSILON +#define LFRACT_FBIT __LFRACT_FBIT__ +#define LFRACT_MIN __LFRACT_MIN__ +#define LFRACT_MAX __LFRACT_MAX__ +#define LFRACT_EPSILON __LFRACT_EPSILON__ + +#undef ULFRACT_FBIT +#undef ULFRACT_MIN +#undef ULFRACT_MAX +#undef ULFRACT_EPSILON +#define ULFRACT_FBIT __ULFRACT_FBIT__ +#define ULFRACT_MIN __ULFRACT_MIN__ /* GCC extension. */ +#define ULFRACT_MAX __ULFRACT_MAX__ +#define ULFRACT_EPSILON __ULFRACT_EPSILON__ + +#undef LLFRACT_FBIT +#undef LLFRACT_MIN +#undef LLFRACT_MAX +#undef LLFRACT_EPSILON +#define LLFRACT_FBIT __LLFRACT_FBIT__ /* GCC extension. */ +#define LLFRACT_MIN __LLFRACT_MIN__ /* GCC extension. */ +#define LLFRACT_MAX __LLFRACT_MAX__ /* GCC extension. */ +#define LLFRACT_EPSILON __LLFRACT_EPSILON__ /* GCC extension. 
*/ + +#undef ULLFRACT_FBIT +#undef ULLFRACT_MIN +#undef ULLFRACT_MAX +#undef ULLFRACT_EPSILON +#define ULLFRACT_FBIT __ULLFRACT_FBIT__ /* GCC extension. */ +#define ULLFRACT_MIN __ULLFRACT_MIN__ /* GCC extension. */ +#define ULLFRACT_MAX __ULLFRACT_MAX__ /* GCC extension. */ +#define ULLFRACT_EPSILON __ULLFRACT_EPSILON__ /* GCC extension. */ + +#undef SACCUM_FBIT +#undef SACCUM_IBIT +#undef SACCUM_MIN +#undef SACCUM_MAX +#undef SACCUM_EPSILON +#define SACCUM_FBIT __SACCUM_FBIT__ +#define SACCUM_IBIT __SACCUM_IBIT__ +#define SACCUM_MIN __SACCUM_MIN__ +#define SACCUM_MAX __SACCUM_MAX__ +#define SACCUM_EPSILON __SACCUM_EPSILON__ + +#undef USACCUM_FBIT +#undef USACCUM_IBIT +#undef USACCUM_MIN +#undef USACCUM_MAX +#undef USACCUM_EPSILON +#define USACCUM_FBIT __USACCUM_FBIT__ +#define USACCUM_IBIT __USACCUM_IBIT__ +#define USACCUM_MIN __USACCUM_MIN__ /* GCC extension. */ +#define USACCUM_MAX __USACCUM_MAX__ +#define USACCUM_EPSILON __USACCUM_EPSILON__ + +#undef ACCUM_FBIT +#undef ACCUM_IBIT +#undef ACCUM_MIN +#undef ACCUM_MAX +#undef ACCUM_EPSILON +#define ACCUM_FBIT __ACCUM_FBIT__ +#define ACCUM_IBIT __ACCUM_IBIT__ +#define ACCUM_MIN __ACCUM_MIN__ +#define ACCUM_MAX __ACCUM_MAX__ +#define ACCUM_EPSILON __ACCUM_EPSILON__ + +#undef UACCUM_FBIT +#undef UACCUM_IBIT +#undef UACCUM_MIN +#undef UACCUM_MAX +#undef UACCUM_EPSILON +#define UACCUM_FBIT __UACCUM_FBIT__ +#define UACCUM_IBIT __UACCUM_IBIT__ +#define UACCUM_MIN __UACCUM_MIN__ /* GCC extension. */ +#define UACCUM_MAX __UACCUM_MAX__ +#define UACCUM_EPSILON __UACCUM_EPSILON__ + +#undef LACCUM_FBIT +#undef LACCUM_IBIT +#undef LACCUM_MIN +#undef LACCUM_MAX +#undef LACCUM_EPSILON +#define LACCUM_FBIT __LACCUM_FBIT__ +#define LACCUM_IBIT __LACCUM_IBIT__ +#define LACCUM_MIN __LACCUM_MIN__ +#define LACCUM_MAX __LACCUM_MAX__ +#define LACCUM_EPSILON __LACCUM_EPSILON__ + +#undef ULACCUM_FBIT +#undef ULACCUM_IBIT +#undef ULACCUM_MIN +#undef ULACCUM_MAX +#undef ULACCUM_EPSILON +#define ULACCUM_FBIT __ULACCUM_FBIT__ +#define ULACCUM_IBIT __ULACCUM_IBIT__ +#define ULACCUM_MIN __ULACCUM_MIN__ /* GCC extension. */ +#define ULACCUM_MAX __ULACCUM_MAX__ +#define ULACCUM_EPSILON __ULACCUM_EPSILON__ + +#undef LLACCUM_FBIT +#undef LLACCUM_IBIT +#undef LLACCUM_MIN +#undef LLACCUM_MAX +#undef LLACCUM_EPSILON +#define LLACCUM_FBIT __LLACCUM_FBIT__ /* GCC extension. */ +#define LLACCUM_IBIT __LLACCUM_IBIT__ /* GCC extension. */ +#define LLACCUM_MIN __LLACCUM_MIN__ /* GCC extension. */ +#define LLACCUM_MAX __LLACCUM_MAX__ /* GCC extension. */ +#define LLACCUM_EPSILON __LLACCUM_EPSILON__ /* GCC extension. */ + +#undef ULLACCUM_FBIT +#undef ULLACCUM_IBIT +#undef ULLACCUM_MIN +#undef ULLACCUM_MAX +#undef ULLACCUM_EPSILON +#define ULLACCUM_FBIT __ULLACCUM_FBIT__ /* GCC extension. */ +#define ULLACCUM_IBIT __ULLACCUM_IBIT__ /* GCC extension. */ +#define ULLACCUM_MIN __ULLACCUM_MIN__ /* GCC extension. */ +#define ULLACCUM_MAX __ULLACCUM_MAX__ /* GCC extension. */ +#define ULLACCUM_EPSILON __ULLACCUM_EPSILON__ /* GCC extension. */ + +#endif /* _STDFIX_H */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/stdint-gcc.h b/lib/gcc/x86_64-linux-android/4.9.x/include/stdint-gcc.h new file mode 100644 index 0000000..1470cea --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/stdint-gcc.h @@ -0,0 +1,263 @@ +/* Copyright (C) 2008-2014 Free Software Foundation, Inc. + +This file is part of GCC. 
+ +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +/* + * ISO C Standard: 7.18 Integer types + */ + +#ifndef _GCC_STDINT_H +#define _GCC_STDINT_H + +/* 7.8.1.1 Exact-width integer types */ + +#ifdef __INT8_TYPE__ +typedef __INT8_TYPE__ int8_t; +#endif +#ifdef __INT16_TYPE__ +typedef __INT16_TYPE__ int16_t; +#endif +#ifdef __INT32_TYPE__ +typedef __INT32_TYPE__ int32_t; +#endif +#ifdef __INT64_TYPE__ +typedef __INT64_TYPE__ int64_t; +#endif +#ifdef __UINT8_TYPE__ +typedef __UINT8_TYPE__ uint8_t; +#endif +#ifdef __UINT16_TYPE__ +typedef __UINT16_TYPE__ uint16_t; +#endif +#ifdef __UINT32_TYPE__ +typedef __UINT32_TYPE__ uint32_t; +#endif +#ifdef __UINT64_TYPE__ +typedef __UINT64_TYPE__ uint64_t; +#endif + +/* 7.8.1.2 Minimum-width integer types */ + +typedef __INT_LEAST8_TYPE__ int_least8_t; +typedef __INT_LEAST16_TYPE__ int_least16_t; +typedef __INT_LEAST32_TYPE__ int_least32_t; +typedef __INT_LEAST64_TYPE__ int_least64_t; +typedef __UINT_LEAST8_TYPE__ uint_least8_t; +typedef __UINT_LEAST16_TYPE__ uint_least16_t; +typedef __UINT_LEAST32_TYPE__ uint_least32_t; +typedef __UINT_LEAST64_TYPE__ uint_least64_t; + +/* 7.8.1.3 Fastest minimum-width integer types */ + +typedef __INT_FAST8_TYPE__ int_fast8_t; +typedef __INT_FAST16_TYPE__ int_fast16_t; +typedef __INT_FAST32_TYPE__ int_fast32_t; +typedef __INT_FAST64_TYPE__ int_fast64_t; +typedef __UINT_FAST8_TYPE__ uint_fast8_t; +typedef __UINT_FAST16_TYPE__ uint_fast16_t; +typedef __UINT_FAST32_TYPE__ uint_fast32_t; +typedef __UINT_FAST64_TYPE__ uint_fast64_t; + +/* 7.8.1.4 Integer types capable of holding object pointers */ + +#ifdef __INTPTR_TYPE__ +typedef __INTPTR_TYPE__ intptr_t; +#endif +#ifdef __UINTPTR_TYPE__ +typedef __UINTPTR_TYPE__ uintptr_t; +#endif + +/* 7.8.1.5 Greatest-width integer types */ + +typedef __INTMAX_TYPE__ intmax_t; +typedef __UINTMAX_TYPE__ uintmax_t; + +#if (!defined __cplusplus || __cplusplus >= 201103L \ + || defined __STDC_LIMIT_MACROS) + +/* 7.18.2 Limits of specified-width integer types */ + +#ifdef __INT8_MAX__ +# undef INT8_MAX +# define INT8_MAX __INT8_MAX__ +# undef INT8_MIN +# define INT8_MIN (-INT8_MAX - 1) +#endif +#ifdef __UINT8_MAX__ +# undef UINT8_MAX +# define UINT8_MAX __UINT8_MAX__ +#endif +#ifdef __INT16_MAX__ +# undef INT16_MAX +# define INT16_MAX __INT16_MAX__ +# undef INT16_MIN +# define INT16_MIN (-INT16_MAX - 1) +#endif +#ifdef __UINT16_MAX__ +# undef UINT16_MAX +# define UINT16_MAX __UINT16_MAX__ +#endif +#ifdef __INT32_MAX__ +# undef INT32_MAX +# define INT32_MAX __INT32_MAX__ +# undef INT32_MIN +# define INT32_MIN (-INT32_MAX - 1) +#endif +#ifdef __UINT32_MAX__ +# undef UINT32_MAX +# define UINT32_MAX __UINT32_MAX__ +#endif +#ifdef __INT64_MAX__ +# undef INT64_MAX +# define INT64_MAX __INT64_MAX__ +# 
undef INT64_MIN +# define INT64_MIN (-INT64_MAX - 1) +#endif +#ifdef __UINT64_MAX__ +# undef UINT64_MAX +# define UINT64_MAX __UINT64_MAX__ +#endif + +#undef INT_LEAST8_MAX +#define INT_LEAST8_MAX __INT_LEAST8_MAX__ +#undef INT_LEAST8_MIN +#define INT_LEAST8_MIN (-INT_LEAST8_MAX - 1) +#undef UINT_LEAST8_MAX +#define UINT_LEAST8_MAX __UINT_LEAST8_MAX__ +#undef INT_LEAST16_MAX +#define INT_LEAST16_MAX __INT_LEAST16_MAX__ +#undef INT_LEAST16_MIN +#define INT_LEAST16_MIN (-INT_LEAST16_MAX - 1) +#undef UINT_LEAST16_MAX +#define UINT_LEAST16_MAX __UINT_LEAST16_MAX__ +#undef INT_LEAST32_MAX +#define INT_LEAST32_MAX __INT_LEAST32_MAX__ +#undef INT_LEAST32_MIN +#define INT_LEAST32_MIN (-INT_LEAST32_MAX - 1) +#undef UINT_LEAST32_MAX +#define UINT_LEAST32_MAX __UINT_LEAST32_MAX__ +#undef INT_LEAST64_MAX +#define INT_LEAST64_MAX __INT_LEAST64_MAX__ +#undef INT_LEAST64_MIN +#define INT_LEAST64_MIN (-INT_LEAST64_MAX - 1) +#undef UINT_LEAST64_MAX +#define UINT_LEAST64_MAX __UINT_LEAST64_MAX__ + +#undef INT_FAST8_MAX +#define INT_FAST8_MAX __INT_FAST8_MAX__ +#undef INT_FAST8_MIN +#define INT_FAST8_MIN (-INT_FAST8_MAX - 1) +#undef UINT_FAST8_MAX +#define UINT_FAST8_MAX __UINT_FAST8_MAX__ +#undef INT_FAST16_MAX +#define INT_FAST16_MAX __INT_FAST16_MAX__ +#undef INT_FAST16_MIN +#define INT_FAST16_MIN (-INT_FAST16_MAX - 1) +#undef UINT_FAST16_MAX +#define UINT_FAST16_MAX __UINT_FAST16_MAX__ +#undef INT_FAST32_MAX +#define INT_FAST32_MAX __INT_FAST32_MAX__ +#undef INT_FAST32_MIN +#define INT_FAST32_MIN (-INT_FAST32_MAX - 1) +#undef UINT_FAST32_MAX +#define UINT_FAST32_MAX __UINT_FAST32_MAX__ +#undef INT_FAST64_MAX +#define INT_FAST64_MAX __INT_FAST64_MAX__ +#undef INT_FAST64_MIN +#define INT_FAST64_MIN (-INT_FAST64_MAX - 1) +#undef UINT_FAST64_MAX +#define UINT_FAST64_MAX __UINT_FAST64_MAX__ + +#ifdef __INTPTR_MAX__ +# undef INTPTR_MAX +# define INTPTR_MAX __INTPTR_MAX__ +# undef INTPTR_MIN +# define INTPTR_MIN (-INTPTR_MAX - 1) +#endif +#ifdef __UINTPTR_MAX__ +# undef UINTPTR_MAX +# define UINTPTR_MAX __UINTPTR_MAX__ +#endif + +#undef INTMAX_MAX +#define INTMAX_MAX __INTMAX_MAX__ +#undef INTMAX_MIN +#define INTMAX_MIN (-INTMAX_MAX - 1) +#undef UINTMAX_MAX +#define UINTMAX_MAX __UINTMAX_MAX__ + +/* 7.18.3 Limits of other integer types */ + +#undef PTRDIFF_MAX +#define PTRDIFF_MAX __PTRDIFF_MAX__ +#undef PTRDIFF_MIN +#define PTRDIFF_MIN (-PTRDIFF_MAX - 1) + +#undef SIG_ATOMIC_MAX +#define SIG_ATOMIC_MAX __SIG_ATOMIC_MAX__ +#undef SIG_ATOMIC_MIN +#define SIG_ATOMIC_MIN __SIG_ATOMIC_MIN__ + +#undef SIZE_MAX +#define SIZE_MAX __SIZE_MAX__ + +#undef WCHAR_MAX +#define WCHAR_MAX __WCHAR_MAX__ +#undef WCHAR_MIN +#define WCHAR_MIN __WCHAR_MIN__ + +#undef WINT_MAX +#define WINT_MAX __WINT_MAX__ +#undef WINT_MIN +#define WINT_MIN __WINT_MIN__ + +#endif /* (!defined __cplusplus || __cplusplus >= 201103L + || defined __STDC_LIMIT_MACROS) */ + +#if (!defined __cplusplus || __cplusplus >= 201103L \ + || defined __STDC_CONSTANT_MACROS) + +#undef INT8_C +#define INT8_C(c) __INT8_C(c) +#undef INT16_C +#define INT16_C(c) __INT16_C(c) +#undef INT32_C +#define INT32_C(c) __INT32_C(c) +#undef INT64_C +#define INT64_C(c) __INT64_C(c) +#undef UINT8_C +#define UINT8_C(c) __UINT8_C(c) +#undef UINT16_C +#define UINT16_C(c) __UINT16_C(c) +#undef UINT32_C +#define UINT32_C(c) __UINT32_C(c) +#undef UINT64_C +#define UINT64_C(c) __UINT64_C(c) +#undef INTMAX_C +#define INTMAX_C(c) __INTMAX_C(c) +#undef UINTMAX_C +#define UINTMAX_C(c) __UINTMAX_C(c) + +#endif /* (!defined __cplusplus || __cplusplus >= 201103L + || defined 
__STDC_CONSTANT_MACROS) */ + +#endif /* _GCC_STDINT_H */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/stdint.h b/lib/gcc/x86_64-linux-android/4.9.x/include/stdint.h new file mode 100644 index 0000000..83b6f70 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/stdint.h @@ -0,0 +1,14 @@ +#ifndef _GCC_WRAP_STDINT_H +#if __STDC_HOSTED__ +# if defined __cplusplus && __cplusplus >= 201103L +# undef __STDC_LIMIT_MACROS +# define __STDC_LIMIT_MACROS +# undef __STDC_CONSTANT_MACROS +# define __STDC_CONSTANT_MACROS +# endif +# include_next +#else +# include "stdint-gcc.h" +#endif +#define _GCC_WRAP_STDINT_H +#endif diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/stdnoreturn.h b/lib/gcc/x86_64-linux-android/4.9.x/include/stdnoreturn.h new file mode 100644 index 0000000..0134137 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/stdnoreturn.h @@ -0,0 +1,35 @@ +/* Copyright (C) 2011-2014 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +. */ + +/* ISO C1X: 7.23 _Noreturn . */ + +#ifndef _STDNORETURN_H +#define _STDNORETURN_H + +#ifndef __cplusplus + +#define noreturn _Noreturn + +#endif + +#endif /* stdnoreturn.h */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/tbmintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/tbmintrin.h new file mode 100644 index 0000000..871f532 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/tbmintrin.h @@ -0,0 +1,180 @@ +/* Copyright (C) 2010-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _X86INTRIN_H_INCLUDED +# error "Never use directly; include instead." 
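Editor's aside on the stdint.h fixed-width types and the limit/constant macros defined above (a plain C99 sketch, not part of these headers):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int
main (void)
{
  uint32_t mask = UINT32_C (0xFFFF0000);   /* constant with the exact type */
  int64_t big = INT64_C (1) << 40;         /* shifting a plain 1 could overflow int */

  printf ("INT32_MAX = %" PRId32 "\n", INT32_MAX);
  printf ("mask = 0x%" PRIx32 ", big = %" PRId64 "\n", mask, big);
  return 0;
}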
+#endif + +#ifndef _TBMINTRIN_H_INCLUDED +#define _TBMINTRIN_H_INCLUDED + +#ifndef __TBM__ +#pragma GCC push_options +#pragma GCC target("tbm") +#define __DISABLE_TBM__ +#endif /* __TBM__ */ + +#ifdef __OPTIMIZE__ +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__bextri_u32 (unsigned int __X, const unsigned int __I) +{ + return __builtin_ia32_bextri_u32 (__X, __I); +} +#else +#define __bextri_u32(X, I) \ + ((unsigned int)__builtin_ia32_bextri_u32 ((unsigned int)(X), \ + (unsigned int)(I))) +#endif /*__OPTIMIZE__ */ + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blcfill_u32 (unsigned int __X) +{ + return __X & (__X + 1); +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blci_u32 (unsigned int __X) +{ + return __X | ~(__X + 1); +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blcic_u32 (unsigned int __X) +{ + return ~__X & (__X + 1); +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blcmsk_u32 (unsigned int __X) +{ + return __X ^ (__X + 1); +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blcs_u32 (unsigned int __X) +{ + return __X | (__X + 1); +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blsfill_u32 (unsigned int __X) +{ + return __X | (__X - 1); +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blsic_u32 (unsigned int __X) +{ + return ~__X | (__X - 1); +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__t1mskc_u32 (unsigned int __X) +{ + return ~__X | (__X + 1); +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__tzmsk_u32 (unsigned int __X) +{ + return ~__X & (__X - 1); +} + + + +#ifdef __x86_64__ +#ifdef __OPTIMIZE__ +extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__bextri_u64 (unsigned long long __X, const unsigned int __I) +{ + return __builtin_ia32_bextri_u64 (__X, __I); +} +#else +#define __bextri_u64(X, I) \ + ((unsigned long long)__builtin_ia32_bextri_u64 ((unsigned long long)(X), \ + (unsigned long long)(I))) +#endif /*__OPTIMIZE__ */ + +extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blcfill_u64 (unsigned long long __X) +{ + return __X & (__X + 1); +} + +extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blci_u64 (unsigned long long __X) +{ + return __X | ~(__X + 1); +} + +extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blcic_u64 (unsigned long long __X) +{ + return ~__X & (__X + 1); +} + +extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blcmsk_u64 (unsigned long long __X) +{ + return __X ^ (__X + 1); +} + +extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blcs_u64 (unsigned long long __X) +{ + return __X | (__X + 1); +} + +extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blsfill_u64 (unsigned long long __X) +{ + return __X | (__X - 1); +} + +extern __inline 
unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__blsic_u64 (unsigned long long __X) +{ + return ~__X | (__X - 1); +} + +extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__t1mskc_u64 (unsigned long long __X) +{ + return ~__X | (__X + 1); +} + +extern __inline unsigned long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +__tzmsk_u64 (unsigned long long __X) +{ + return ~__X & (__X - 1); +} + + +#endif /* __x86_64__ */ + +#ifdef __DISABLE_TBM__ +#undef __DISABLE_TBM__ +#pragma GCC pop_options +#endif /* __DISABLE_TBM__ */ + +#endif /* _TBMINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/tmmintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/tmmintrin.h new file mode 100644 index 0000000..89556d2 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/tmmintrin.h @@ -0,0 +1,249 @@ +/* Copyright (C) 2006-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +/* Implemented from the specification included in the Intel C++ Compiler + User Guide and Reference, version 9.1. 
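Editor's aside on the TBM helpers defined in tbmintrin.h just above (a sketch only; TBM is an AMD extension, so this assumes a TBM-capable CPU and -mtbm, though the plain-C bodies above show the intended bit patterns either way). The header must be reached through x86intrin.h, as its guard enforces:

#include <x86intrin.h>
#include <stdio.h>

int
main (void)
{
  /* Clear the trailing run of set bits: 0x67 (0110 0111) -> 0x60.  */
  printf ("blcfill: 0x%x\n", __blcfill_u32 (0x67));

  /* Mask of the trailing zero bits: 0x58 (0101 1000) -> 0x07.  */
  printf ("tzmsk:   0x%x\n", __tzmsk_u32 (0x58));

  /* BEXTRI: extract 8 bits starting at bit 4; the immediate packs
     length << 8 | start, here 0x0804.  */
  printf ("bextri:  0x%x\n", __bextri_u32 (0xABCD1234, 0x0804));
  return 0;
}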
*/ + +#ifndef _TMMINTRIN_H_INCLUDED +#define _TMMINTRIN_H_INCLUDED + +/* We need definitions from the SSE3, SSE2 and SSE header files*/ +#include + +#ifndef __SSSE3__ +#pragma GCC push_options +#pragma GCC target("ssse3") +#define __DISABLE_SSSE3__ +#endif /* __SSSE3__ */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hadd_epi16 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_phaddw128 ((__v8hi)__X, (__v8hi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hadd_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_phaddd128 ((__v4si)__X, (__v4si)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hadds_epi16 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_phaddsw128 ((__v8hi)__X, (__v8hi)__Y); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hadd_pi16 (__m64 __X, __m64 __Y) +{ + return (__m64) __builtin_ia32_phaddw ((__v4hi)__X, (__v4hi)__Y); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hadd_pi32 (__m64 __X, __m64 __Y) +{ + return (__m64) __builtin_ia32_phaddd ((__v2si)__X, (__v2si)__Y); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hadds_pi16 (__m64 __X, __m64 __Y) +{ + return (__m64) __builtin_ia32_phaddsw ((__v4hi)__X, (__v4hi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hsub_epi16 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_phsubw128 ((__v8hi)__X, (__v8hi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hsub_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_phsubd128 ((__v4si)__X, (__v4si)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hsubs_epi16 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_phsubsw128 ((__v8hi)__X, (__v8hi)__Y); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hsub_pi16 (__m64 __X, __m64 __Y) +{ + return (__m64) __builtin_ia32_phsubw ((__v4hi)__X, (__v4hi)__Y); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hsub_pi32 (__m64 __X, __m64 __Y) +{ + return (__m64) __builtin_ia32_phsubd ((__v2si)__X, (__v2si)__Y); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hsubs_pi16 (__m64 __X, __m64 __Y) +{ + return (__m64) __builtin_ia32_phsubsw ((__v4hi)__X, (__v4hi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maddubs_epi16 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pmaddubsw128 ((__v16qi)__X, (__v16qi)__Y); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maddubs_pi16 (__m64 __X, __m64 __Y) +{ + return (__m64) __builtin_ia32_pmaddubsw ((__v8qi)__X, (__v8qi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mulhrs_epi16 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pmulhrsw128 ((__v8hi)__X, (__v8hi)__Y); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mulhrs_pi16 
(__m64 __X, __m64 __Y) +{ + return (__m64) __builtin_ia32_pmulhrsw ((__v4hi)__X, (__v4hi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_shuffle_epi8 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_pshufb128 ((__v16qi)__X, (__v16qi)__Y); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_shuffle_pi8 (__m64 __X, __m64 __Y) +{ + return (__m64) __builtin_ia32_pshufb ((__v8qi)__X, (__v8qi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sign_epi8 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_psignb128 ((__v16qi)__X, (__v16qi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sign_epi16 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_psignw128 ((__v8hi)__X, (__v8hi)__Y); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sign_epi32 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_psignd128 ((__v4si)__X, (__v4si)__Y); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sign_pi8 (__m64 __X, __m64 __Y) +{ + return (__m64) __builtin_ia32_psignb ((__v8qi)__X, (__v8qi)__Y); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sign_pi16 (__m64 __X, __m64 __Y) +{ + return (__m64) __builtin_ia32_psignw ((__v4hi)__X, (__v4hi)__Y); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sign_pi32 (__m64 __X, __m64 __Y) +{ + return (__m64) __builtin_ia32_psignd ((__v2si)__X, (__v2si)__Y); +} + +#ifdef __OPTIMIZE__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_alignr_epi8(__m128i __X, __m128i __Y, const int __N) +{ + return (__m128i) __builtin_ia32_palignr128 ((__v2di)__X, + (__v2di)__Y, __N * 8); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_alignr_pi8(__m64 __X, __m64 __Y, const int __N) +{ + return (__m64) __builtin_ia32_palignr ((__v1di)__X, + (__v1di)__Y, __N * 8); +} +#else +#define _mm_alignr_epi8(X, Y, N) \ + ((__m128i) __builtin_ia32_palignr128 ((__v2di)(__m128i)(X), \ + (__v2di)(__m128i)(Y), \ + (int)(N) * 8)) +#define _mm_alignr_pi8(X, Y, N) \ + ((__m64) __builtin_ia32_palignr ((__v1di)(__m64)(X), \ + (__v1di)(__m64)(Y), \ + (int)(N) * 8)) +#endif + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_abs_epi8 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pabsb128 ((__v16qi)__X); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_abs_epi16 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pabsw128 ((__v8hi)__X); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_abs_epi32 (__m128i __X) +{ + return (__m128i) __builtin_ia32_pabsd128 ((__v4si)__X); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_abs_pi8 (__m64 __X) +{ + return (__m64) __builtin_ia32_pabsb ((__v8qi)__X); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_abs_pi16 (__m64 __X) +{ + return (__m64) __builtin_ia32_pabsw ((__v4hi)__X); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, 
__artificial__)) +_mm_abs_pi32 (__m64 __X) +{ + return (__m64) __builtin_ia32_pabsd ((__v2si)__X); +} + +#ifdef __DISABLE_SSSE3__ +#undef __DISABLE_SSSE3__ +#pragma GCC pop_options +#endif /* __DISABLE_SSSE3__ */ + +#endif /* _TMMINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/unwind.h b/lib/gcc/x86_64-linux-android/4.9.x/include/unwind.h new file mode 100644 index 0000000..d351fb9 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/unwind.h @@ -0,0 +1,293 @@ +/* Exception handling and frame unwind runtime interface routines. + Copyright (C) 2001-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify it + under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, but WITHOUT + ANY WARRANTY; without even the implied warranty of MERCHANTABILITY + or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public + License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +/* This is derived from the C++ ABI for IA-64. Where we diverge + for cross-architecture compatibility are noted with "@@@". */ + +#ifndef _UNWIND_H +#define _UNWIND_H + +#if defined (__SEH__) && !defined (__USING_SJLJ_EXCEPTIONS__) +/* Only for _GCC_specific_handler. */ +#include +#endif + +#ifndef HIDE_EXPORTS +#pragma GCC visibility push(default) +#endif + +#ifdef __cplusplus +extern "C" { +#endif + +/* Level 1: Base ABI */ + +/* @@@ The IA-64 ABI uses uint64 throughout. Most places this is + inefficient for 32-bit and smaller machines. */ +typedef unsigned _Unwind_Word __attribute__((__mode__(__unwind_word__))); +typedef signed _Unwind_Sword __attribute__((__mode__(__unwind_word__))); +#if defined(__ia64__) && defined(__hpux__) +typedef unsigned _Unwind_Ptr __attribute__((__mode__(__word__))); +#else +typedef unsigned _Unwind_Ptr __attribute__((__mode__(__pointer__))); +#endif +typedef unsigned _Unwind_Internal_Ptr __attribute__((__mode__(__pointer__))); + +/* @@@ The IA-64 ABI uses a 64-bit word to identify the producer and + consumer of an exception. We'll go along with this for now even on + 32-bit machines. We'll need to provide some other option for + 16-bit machines and for machines with > 8 bits per byte. */ +typedef unsigned _Unwind_Exception_Class __attribute__((__mode__(__DI__))); + +/* The unwind interface uses reason codes in several contexts to + identify the reasons for failures or other actions. */ +typedef enum +{ + _URC_NO_REASON = 0, + _URC_FOREIGN_EXCEPTION_CAUGHT = 1, + _URC_FATAL_PHASE2_ERROR = 2, + _URC_FATAL_PHASE1_ERROR = 3, + _URC_NORMAL_STOP = 4, + _URC_END_OF_STACK = 5, + _URC_HANDLER_FOUND = 6, + _URC_INSTALL_CONTEXT = 7, + _URC_CONTINUE_UNWIND = 8 +} _Unwind_Reason_Code; + + +/* The unwind interface uses a pointer to an exception header object + as its representation of an exception being thrown. 
In general, the + full representation of an exception object is language- and + implementation-specific, but it will be prefixed by a header + understood by the unwind interface. */ + +struct _Unwind_Exception; + +typedef void (*_Unwind_Exception_Cleanup_Fn) (_Unwind_Reason_Code, + struct _Unwind_Exception *); + +struct _Unwind_Exception +{ + _Unwind_Exception_Class exception_class; + _Unwind_Exception_Cleanup_Fn exception_cleanup; + +#if !defined (__USING_SJLJ_EXCEPTIONS__) && defined (__SEH__) + _Unwind_Word private_[6]; +#else + _Unwind_Word private_1; + _Unwind_Word private_2; +#endif + + /* @@@ The IA-64 ABI says that this structure must be double-word aligned. + Taking that literally does not make much sense generically. Instead we + provide the maximum alignment required by any type for the machine. */ +} __attribute__((__aligned__)); + + +/* The ACTIONS argument to the personality routine is a bitwise OR of one + or more of the following constants. */ +typedef int _Unwind_Action; + +#define _UA_SEARCH_PHASE 1 +#define _UA_CLEANUP_PHASE 2 +#define _UA_HANDLER_FRAME 4 +#define _UA_FORCE_UNWIND 8 +#define _UA_END_OF_STACK 16 + +/* The target can override this macro to define any back-end-specific + attributes required for the lowest-level stack frame. */ +#ifndef LIBGCC2_UNWIND_ATTRIBUTE +#define LIBGCC2_UNWIND_ATTRIBUTE +#endif + +/* This is an opaque type used to refer to a system-specific data + structure used by the system unwinder. This context is created and + destroyed by the system, and passed to the personality routine + during unwinding. */ +struct _Unwind_Context; + +/* Raise an exception, passing along the given exception object. */ +extern _Unwind_Reason_Code LIBGCC2_UNWIND_ATTRIBUTE +_Unwind_RaiseException (struct _Unwind_Exception *); + +/* Raise an exception for forced unwinding. */ + +typedef _Unwind_Reason_Code (*_Unwind_Stop_Fn) + (int, _Unwind_Action, _Unwind_Exception_Class, + struct _Unwind_Exception *, struct _Unwind_Context *, void *); + +extern _Unwind_Reason_Code LIBGCC2_UNWIND_ATTRIBUTE +_Unwind_ForcedUnwind (struct _Unwind_Exception *, _Unwind_Stop_Fn, void *); + +/* Helper to invoke the exception_cleanup routine. */ +extern void _Unwind_DeleteException (struct _Unwind_Exception *); + +/* Resume propagation of an existing exception. This is used after + e.g. executing cleanup code, and not to implement rethrowing. */ +extern void LIBGCC2_UNWIND_ATTRIBUTE +_Unwind_Resume (struct _Unwind_Exception *); + +/* @@@ Resume propagation of a FORCE_UNWIND exception, or to rethrow + a normal exception that was handled. */ +extern _Unwind_Reason_Code LIBGCC2_UNWIND_ATTRIBUTE +_Unwind_Resume_or_Rethrow (struct _Unwind_Exception *); + +/* @@@ Use unwind data to perform a stack backtrace. The trace callback + is called for every stack frame in the call chain, but no cleanup + actions are performed. */ +typedef _Unwind_Reason_Code (*_Unwind_Trace_Fn) + (struct _Unwind_Context *, void *); + +extern _Unwind_Reason_Code LIBGCC2_UNWIND_ATTRIBUTE +_Unwind_Backtrace (_Unwind_Trace_Fn, void *); + +/* These functions are used for communicating information about the unwind + context (i.e. the unwind descriptors and the user register state) between + the unwind library and the personality routine and landing pad. Only + selected registers may be manipulated. 
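A non-normative sketch of how these accessors are typically combined with _Unwind_Backtrace (declared above): the callback below records each frame's instruction pointer. The helper names, the argument struct and the stop condition are assumptions for illustration only.

    struct trace_arg { void **addrs; int count, max; };   // illustrative type

    static _Unwind_Reason_Code
    trace_cb (struct _Unwind_Context *ctx, void *arg)
    {
      struct trace_arg *t = (struct trace_arg *) arg;
      if (t->count >= t->max)
        return _URC_END_OF_STACK;                    // stop the walk
      t->addrs[t->count++] = (void *) _Unwind_GetIP (ctx);
      return _URC_NO_REASON;                         // continue unwinding
    }

    // Illustrative helper: fill ADDRS with up to MAX return addresses
    // from the current call chain.
    static int my_backtrace (void **addrs, int max)
    {
      struct trace_arg t = { addrs, 0, max };
      _Unwind_Backtrace (trace_cb, &t);
      return t.count;
    }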
*/ + +extern _Unwind_Word _Unwind_GetGR (struct _Unwind_Context *, int); +extern void _Unwind_SetGR (struct _Unwind_Context *, int, _Unwind_Word); + +extern _Unwind_Ptr _Unwind_GetIP (struct _Unwind_Context *); +extern _Unwind_Ptr _Unwind_GetIPInfo (struct _Unwind_Context *, int *); +extern void _Unwind_SetIP (struct _Unwind_Context *, _Unwind_Ptr); + +/* @@@ Retrieve the CFA of the given context. */ +extern _Unwind_Word _Unwind_GetCFA (struct _Unwind_Context *); + +extern void *_Unwind_GetLanguageSpecificData (struct _Unwind_Context *); + +extern _Unwind_Ptr _Unwind_GetRegionStart (struct _Unwind_Context *); + + +/* The personality routine is the function in the C++ (or other language) + runtime library which serves as an interface between the system unwind + library and language-specific exception handling semantics. It is + specific to the code fragment described by an unwind info block, and + it is always referenced via the pointer in the unwind info block, and + hence it has no ABI-specified name. + + Note that this implies that two different C++ implementations can + use different names, and have different contents in the language + specific data area. Moreover, that the language specific data + area contains no version info because name of the function invoked + provides more effective versioning by detecting at link time the + lack of code to handle the different data format. */ + +typedef _Unwind_Reason_Code (*_Unwind_Personality_Fn) + (int, _Unwind_Action, _Unwind_Exception_Class, + struct _Unwind_Exception *, struct _Unwind_Context *); + +/* @@@ The following alternate entry points are for setjmp/longjmp + based unwinding. */ + +struct SjLj_Function_Context; +extern void _Unwind_SjLj_Register (struct SjLj_Function_Context *); +extern void _Unwind_SjLj_Unregister (struct SjLj_Function_Context *); + +extern _Unwind_Reason_Code LIBGCC2_UNWIND_ATTRIBUTE +_Unwind_SjLj_RaiseException (struct _Unwind_Exception *); +extern _Unwind_Reason_Code LIBGCC2_UNWIND_ATTRIBUTE +_Unwind_SjLj_ForcedUnwind (struct _Unwind_Exception *, _Unwind_Stop_Fn, void *); +extern void LIBGCC2_UNWIND_ATTRIBUTE +_Unwind_SjLj_Resume (struct _Unwind_Exception *); +extern _Unwind_Reason_Code LIBGCC2_UNWIND_ATTRIBUTE +_Unwind_SjLj_Resume_or_Rethrow (struct _Unwind_Exception *); + +/* @@@ The following provide access to the base addresses for text + and data-relative addressing in the LDSA. In order to stay link + compatible with the standard ABI for IA-64, we inline these. */ + +#ifdef __ia64__ +#include + +static inline _Unwind_Ptr +_Unwind_GetDataRelBase (struct _Unwind_Context *_C) +{ + /* The GP is stored in R1. */ + return _Unwind_GetGR (_C, 1); +} + +static inline _Unwind_Ptr +_Unwind_GetTextRelBase (struct _Unwind_Context *_C __attribute__ ((__unused__))) +{ + abort (); + return 0; +} + +/* @@@ Retrieve the Backing Store Pointer of the given context. */ +extern _Unwind_Word _Unwind_GetBSP (struct _Unwind_Context *); +#else +extern _Unwind_Ptr _Unwind_GetDataRelBase (struct _Unwind_Context *); +extern _Unwind_Ptr _Unwind_GetTextRelBase (struct _Unwind_Context *); +#endif + +/* @@@ Given an address, return the entry point of the function that + contains it. */ +extern void * _Unwind_FindEnclosingFunction (void *pc); + +#ifndef __SIZEOF_LONG__ + #error "__SIZEOF_LONG__ macro not defined" +#endif + +#ifndef __SIZEOF_POINTER__ + #error "__SIZEOF_POINTER__ macro not defined" +#endif + + +/* leb128 type numbers have a potentially unlimited size. 
+ The target of the following definitions of _sleb128_t and _uleb128_t + is to have efficient data types large enough to hold the leb128 type + numbers used in the unwind code. + Mostly these types will simply be defined to long and unsigned long + except when a unsigned long data type on the target machine is not + capable of storing a pointer. */ + +#if __SIZEOF_LONG__ >= __SIZEOF_POINTER__ + typedef long _sleb128_t; + typedef unsigned long _uleb128_t; +#elif __SIZEOF_LONG_LONG__ >= __SIZEOF_POINTER__ + typedef long long _sleb128_t; + typedef unsigned long long _uleb128_t; +#else +# error "What type shall we use for _sleb128_t?" +#endif + +#if defined (__SEH__) && !defined (__USING_SJLJ_EXCEPTIONS__) +/* Handles the mapping from SEH to GCC interfaces. */ +EXCEPTION_DISPOSITION _GCC_specific_handler (PEXCEPTION_RECORD, void *, + PCONTEXT, PDISPATCHER_CONTEXT, + _Unwind_Personality_Fn); +#endif + +#ifdef __cplusplus +} +#endif + +#ifndef HIDE_EXPORTS +#pragma GCC visibility pop +#endif + +#endif /* unwind.h */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/varargs.h b/lib/gcc/x86_64-linux-android/4.9.x/include/varargs.h new file mode 100644 index 0000000..4b9803e --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/varargs.h @@ -0,0 +1,7 @@ +#ifndef _VARARGS_H +#define _VARARGS_H + +#error "GCC no longer implements ." +#error "Revise your code to use ." + +#endif diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/wmmintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/wmmintrin.h new file mode 100644 index 0000000..2002375 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/wmmintrin.h @@ -0,0 +1,132 @@ +/* Copyright (C) 2008-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +/* Implemented from the specification included in the Intel C++ Compiler + User Guide and Reference, version 10.1. */ + +#ifndef _WMMINTRIN_H_INCLUDED +#define _WMMINTRIN_H_INCLUDED + +/* We need definitions from the SSE2 header file. */ +#include + +/* AES */ + +#ifndef __AES__ +#pragma GCC push_options +#pragma GCC target("aes") +#define __DISABLE_AES__ +#endif /* __AES__ */ + +/* Performs 1 round of AES decryption of the first m128i using + the second m128i as a round key. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_aesdec_si128 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_aesdec128 ((__v2di)__X, (__v2di)__Y); +} + +/* Performs the last round of AES decryption of the first m128i + using the second m128i as a round key. 
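For illustration only: a whole-block AES encryption built from the intrinsics in this section chains _mm_aesenc_si128 rounds and finishes with _mm_aesenclast_si128. The key-schedule array rk and round count nr are assumptions (key expansion is not shown); _mm_xor_si128 comes from the SSE2 header included above.

    // Illustrative helper: encrypt one 128-bit block with an expanded key schedule.
    // rk[0..nr] are round keys; nr is 10, 12 or 14 for AES-128/192/256.
    static inline __m128i
    aes_encrypt_block (__m128i block, const __m128i *rk, int nr)
    {
      int i;
      block = _mm_xor_si128 (block, rk[0]);          // initial AddRoundKey
      for (i = 1; i < nr; i++)
        block = _mm_aesenc_si128 (block, rk[i]);     // full rounds
      return _mm_aesenclast_si128 (block, rk[nr]);   // last round, no MixColumns
    }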
*/ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_aesdeclast_si128 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_aesdeclast128 ((__v2di)__X, + (__v2di)__Y); +} + +/* Performs 1 round of AES encryption of the first m128i using + the second m128i as a round key. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_aesenc_si128 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_aesenc128 ((__v2di)__X, (__v2di)__Y); +} + +/* Performs the last round of AES encryption of the first m128i + using the second m128i as a round key. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_aesenclast_si128 (__m128i __X, __m128i __Y) +{ + return (__m128i) __builtin_ia32_aesenclast128 ((__v2di)__X, (__v2di)__Y); +} + +/* Performs the InverseMixColumn operation on the source m128i + and stores the result into m128i destination. */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_aesimc_si128 (__m128i __X) +{ + return (__m128i) __builtin_ia32_aesimc128 ((__v2di)__X); +} + +/* Generates a m128i round key for the input m128i AES cipher key and + byte round constant. The second parameter must be a compile time + constant. */ +#ifdef __OPTIMIZE__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_aeskeygenassist_si128 (__m128i __X, const int __C) +{ + return (__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)__X, __C); +} +#else +#define _mm_aeskeygenassist_si128(X, C) \ + ((__m128i) __builtin_ia32_aeskeygenassist128 ((__v2di)(__m128i)(X), \ + (int)(C))) +#endif + +#ifdef __DISABLE_AES__ +#undef __DISABLE_AES__ +#pragma GCC pop_options +#endif /* __DISABLE_AES__ */ + +/* PCLMUL */ + +#ifndef __PCLMUL__ +#pragma GCC push_options +#pragma GCC target("pclmul") +#define __DISABLE_PCLMUL__ +#endif /* __PCLMUL__ */ + +/* Performs carry-less integer multiplication of 64-bit halves of + 128-bit input operands. The third parameter inducates which 64-bit + haves of the input parameters v1 and v2 should be used. It must be + a compile time constant. */ +#ifdef __OPTIMIZE__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I) +{ + return (__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)__X, + (__v2di)__Y, __I); +} +#else +#define _mm_clmulepi64_si128(X, Y, I) \ + ((__m128i) __builtin_ia32_pclmulqdq128 ((__v2di)(__m128i)(X), \ + (__v2di)(__m128i)(Y), (int)(I))) +#endif + +#ifdef __DISABLE_PCLMUL__ +#undef __DISABLE_PCLMUL__ +#pragma GCC pop_options +#endif /* __DISABLE_PCLMUL__ */ + +#endif /* _WMMINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/x86intrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/x86intrin.h new file mode 100644 index 0000000..80e9e6f --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/x86intrin.h @@ -0,0 +1,78 @@ +/* Copyright (C) 2008-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _X86INTRIN_H_INCLUDED +#define _X86INTRIN_H_INCLUDED + +#include + +#include + +#include + +#include + +#include + +#include + +#include + +#include + +#include + +/* For including AVX instructions */ +#include + +#include + +#include + +#include + +#include + +#include + +#include + +#include + +#include + +#include + +#include + +#include + +#include + +#include + +#include + +#include + +#endif /* _X86INTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/xmmintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/xmmintrin.h new file mode 100644 index 0000000..a3824e7 --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/xmmintrin.h @@ -0,0 +1,1265 @@ +/* Copyright (C) 2002-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +/* Implemented from the specification included in the Intel C++ Compiler + User Guide and Reference, version 9.0. */ + +#ifndef _XMMINTRIN_H_INCLUDED +#define _XMMINTRIN_H_INCLUDED + +/* We need type definitions from the MMX header file. */ +#include + +/* Get _mm_malloc () and _mm_free (). */ +#include + +/* Constants for use with _mm_prefetch. */ +enum _mm_hint +{ + /* _MM_HINT_ET is _MM_HINT_T with set 3rd bit. */ + _MM_HINT_ET0 = 7, + _MM_HINT_ET1 = 6, + _MM_HINT_T0 = 3, + _MM_HINT_T1 = 2, + _MM_HINT_T2 = 1, + _MM_HINT_NTA = 0 +}; + +/* Loads one cache line from address P to a location "closer" to the + processor. The selector I specifies the type of prefetch operation. */ +#ifdef __OPTIMIZE__ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_prefetch (const void *__P, enum _mm_hint __I) +{ + __builtin_prefetch (__P, (__I & 0x4) >> 2, __I & 0x3); +} +#else +#define _mm_prefetch(P, I) \ + __builtin_prefetch ((P), ((I & 0x4) >> 2), (I & 0x3)) +#endif + +#ifndef __SSE__ +#pragma GCC push_options +#pragma GCC target("sse") +#define __DISABLE_SSE__ +#endif /* __SSE__ */ + +/* The Intel API is flexible enough that we must allow aliasing with other + vector types, and their scalar components. */ +typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__)); + +/* Internal data types for implementing the intrinsics. 
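A small usage sketch for the _mm_prefetch intrinsic defined above (the array, the prefetch distance of 64 elements and the helper name are illustrative guesses, not tuned values):

    // Illustrative helper: prefetch ahead while summing a float array.
    float sum_with_prefetch (const float *a, int n)
    {
      float s = 0.0f;
      for (int i = 0; i < n; i++)
        {
          if (i + 64 < n)
            _mm_prefetch (a + i + 64, _MM_HINT_T0);   // pull a later line toward L1
          s += a[i];
        }
      return s;
    }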
*/ +typedef float __v4sf __attribute__ ((__vector_size__ (16))); + +/* Create a selector for use with the SHUFPS instruction. */ +#define _MM_SHUFFLE(fp3,fp2,fp1,fp0) \ + (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | (fp0)) + +/* Bits in the MXCSR. */ +#define _MM_EXCEPT_MASK 0x003f +#define _MM_EXCEPT_INVALID 0x0001 +#define _MM_EXCEPT_DENORM 0x0002 +#define _MM_EXCEPT_DIV_ZERO 0x0004 +#define _MM_EXCEPT_OVERFLOW 0x0008 +#define _MM_EXCEPT_UNDERFLOW 0x0010 +#define _MM_EXCEPT_INEXACT 0x0020 + +#define _MM_MASK_MASK 0x1f80 +#define _MM_MASK_INVALID 0x0080 +#define _MM_MASK_DENORM 0x0100 +#define _MM_MASK_DIV_ZERO 0x0200 +#define _MM_MASK_OVERFLOW 0x0400 +#define _MM_MASK_UNDERFLOW 0x0800 +#define _MM_MASK_INEXACT 0x1000 + +#define _MM_ROUND_MASK 0x6000 +#define _MM_ROUND_NEAREST 0x0000 +#define _MM_ROUND_DOWN 0x2000 +#define _MM_ROUND_UP 0x4000 +#define _MM_ROUND_TOWARD_ZERO 0x6000 + +#define _MM_FLUSH_ZERO_MASK 0x8000 +#define _MM_FLUSH_ZERO_ON 0x8000 +#define _MM_FLUSH_ZERO_OFF 0x0000 + +/* Create an undefined vector. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_undefined_ps (void) +{ + __m128 __Y = __Y; + return __Y; +} + +/* Create a vector of zeros. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setzero_ps (void) +{ + return __extension__ (__m128){ 0.0f, 0.0f, 0.0f, 0.0f }; +} + +/* Perform the respective operation on the lower SPFP (single-precision + floating-point) values of A and B; the upper three SPFP values are + passed through from A. */ + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_addss ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_subss ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mul_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_mulss ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_div_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_divss ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sqrt_ss (__m128 __A) +{ + return (__m128) __builtin_ia32_sqrtss ((__v4sf)__A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp_ss (__m128 __A) +{ + return (__m128) __builtin_ia32_rcpss ((__v4sf)__A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rsqrt_ss (__m128 __A) +{ + return (__m128) __builtin_ia32_rsqrtss ((__v4sf)__A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_minss ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_maxss ((__v4sf)__A, (__v4sf)__B); +} + +/* Perform the respective operation on the four SPFP values in A and B. 
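An illustrative kernel built from the packed operations declared below, together with _mm_set1_ps and the unaligned load/store defined later in this header (the helper name and the requirement that n be a multiple of 4 are assumptions of the sketch):

    // Illustrative helper: y[i] += a * x[i] for i = 0 .. n-1, n a multiple of 4.
    void saxpy4 (float a, const float *x, float *y, int n)
    {
      __m128 va = _mm_set1_ps (a);
      for (int i = 0; i < n; i += 4)
        {
          __m128 vx = _mm_loadu_ps (x + i);
          __m128 vy = _mm_loadu_ps (y + i);
          vy = _mm_add_ps (vy, _mm_mul_ps (va, vx));
          _mm_storeu_ps (y + i, vy);
        }
    }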
*/ + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_add_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_addps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sub_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_subps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mul_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_mulps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_div_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_divps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sqrt_ps (__m128 __A) +{ + return (__m128) __builtin_ia32_sqrtps ((__v4sf)__A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rcp_ps (__m128 __A) +{ + return (__m128) __builtin_ia32_rcpps ((__v4sf)__A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rsqrt_ps (__m128 __A) +{ + return (__m128) __builtin_ia32_rsqrtps ((__v4sf)__A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_minps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_maxps ((__v4sf)__A, (__v4sf)__B); +} + +/* Perform logical bit-wise operations on 128-bit values. */ + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_and_ps (__m128 __A, __m128 __B) +{ + return __builtin_ia32_andps (__A, __B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_andnot_ps (__m128 __A, __m128 __B) +{ + return __builtin_ia32_andnps (__A, __B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_or_ps (__m128 __A, __m128 __B) +{ + return __builtin_ia32_orps (__A, __B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_xor_ps (__m128 __A, __m128 __B) +{ + return __builtin_ia32_xorps (__A, __B); +} + +/* Perform a comparison on the lower SPFP values of A and B. If the + comparison is true, place a mask of all ones in the result, otherwise a + mask of zeros. The upper three SPFP values are passed through from A. 
*/ + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpeqss ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmplt_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpltss ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmple_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpless ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_movss ((__v4sf) __A, + (__v4sf) + __builtin_ia32_cmpltss ((__v4sf) __B, + (__v4sf) + __A)); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpge_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_movss ((__v4sf) __A, + (__v4sf) + __builtin_ia32_cmpless ((__v4sf) __B, + (__v4sf) + __A)); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpneq_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpneqss ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpnlt_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpnltss ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpnle_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpnless ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpngt_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_movss ((__v4sf) __A, + (__v4sf) + __builtin_ia32_cmpnltss ((__v4sf) __B, + (__v4sf) + __A)); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpnge_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_movss ((__v4sf) __A, + (__v4sf) + __builtin_ia32_cmpnless ((__v4sf) __B, + (__v4sf) + __A)); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpord_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpordss ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpunord_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpunordss ((__v4sf)__A, (__v4sf)__B); +} + +/* Perform a comparison on the four SPFP values of A and B. For each + element, if the comparison is true, place a mask of all ones in the + result, otherwise a mask of zeros. 
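A common use of these packed comparisons, sketched with an invented helper name: combine the all-ones/all-zeros mask with the bit-wise operations above to select between two vectors without branching.

    // Illustrative helper: per lane, result = (a < b) ? a : b.
    static inline __m128 select_lt (__m128 a, __m128 b)
    {
      __m128 mask = _mm_cmplt_ps (a, b);           // all ones where a < b
      return _mm_or_ps (_mm_and_ps (mask, a),      // take a where the mask is set
                        _mm_andnot_ps (mask, b));  // take b elsewhere
    }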
*/ + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpeq_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpeqps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmplt_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpltps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmple_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpleps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpgt_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpgtps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpge_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpgeps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpneq_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpneqps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpnlt_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpnltps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpnle_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpnleps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpngt_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpngtps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpnge_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpngeps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpord_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpordps ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmpunord_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_cmpunordps ((__v4sf)__A, (__v4sf)__B); +} + +/* Compare the lower SPFP values of A and B and return 1 if true + and 0 if false. 
*/ + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comieq_ss (__m128 __A, __m128 __B) +{ + return __builtin_ia32_comieq ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comilt_ss (__m128 __A, __m128 __B) +{ + return __builtin_ia32_comilt ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comile_ss (__m128 __A, __m128 __B) +{ + return __builtin_ia32_comile ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comigt_ss (__m128 __A, __m128 __B) +{ + return __builtin_ia32_comigt ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comige_ss (__m128 __A, __m128 __B) +{ + return __builtin_ia32_comige ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comineq_ss (__m128 __A, __m128 __B) +{ + return __builtin_ia32_comineq ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomieq_ss (__m128 __A, __m128 __B) +{ + return __builtin_ia32_ucomieq ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomilt_ss (__m128 __A, __m128 __B) +{ + return __builtin_ia32_ucomilt ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomile_ss (__m128 __A, __m128 __B) +{ + return __builtin_ia32_ucomile ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomigt_ss (__m128 __A, __m128 __B) +{ + return __builtin_ia32_ucomigt ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomige_ss (__m128 __A, __m128 __B) +{ + return __builtin_ia32_ucomige ((__v4sf)__A, (__v4sf)__B); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_ucomineq_ss (__m128 __A, __m128 __B) +{ + return __builtin_ia32_ucomineq ((__v4sf)__A, (__v4sf)__B); +} + +/* Convert the lower SPFP value to a 32-bit integer according to the current + rounding mode. */ +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtss_si32 (__m128 __A) +{ + return __builtin_ia32_cvtss2si ((__v4sf) __A); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_ss2si (__m128 __A) +{ + return _mm_cvtss_si32 (__A); +} + +#ifdef __x86_64__ +/* Convert the lower SPFP value to a 32-bit integer according to the + current rounding mode. */ + +/* Intel intrinsic. */ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtss_si64 (__m128 __A) +{ + return __builtin_ia32_cvtss2si64 ((__v4sf) __A); +} + +/* Microsoft intrinsic. */ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtss_si64x (__m128 __A) +{ + return __builtin_ia32_cvtss2si64 ((__v4sf) __A); +} +#endif + +/* Convert the two lower SPFP values to 32-bit integers according to the + current rounding mode. Return the integers in packed form. 
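The practical difference between the rounding conversion above and the truncating conversion that follows can be seen in a short sketch (values and helper names are illustrative; results assume the default round-to-nearest-even mode; _mm_set_ss is defined later in this header):

    // _mm_cvtss_si32 (_mm_set_ss (2.5f))  == 2   (rounds to even)
    // _mm_cvtss_si32 (_mm_set_ss (2.7f))  == 3
    // _mm_cvttss_si32 (_mm_set_ss (2.7f)) == 2   (always truncates toward zero)
    int round_to_int (float f)    { return _mm_cvtss_si32 (_mm_set_ss (f)); }
    int truncate_to_int (float f) { return _mm_cvttss_si32 (_mm_set_ss (f)); }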
*/ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtps_pi32 (__m128 __A) +{ + return (__m64) __builtin_ia32_cvtps2pi ((__v4sf) __A); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_ps2pi (__m128 __A) +{ + return _mm_cvtps_pi32 (__A); +} + +/* Truncate the lower SPFP value to a 32-bit integer. */ +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttss_si32 (__m128 __A) +{ + return __builtin_ia32_cvttss2si ((__v4sf) __A); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_ss2si (__m128 __A) +{ + return _mm_cvttss_si32 (__A); +} + +#ifdef __x86_64__ +/* Truncate the lower SPFP value to a 32-bit integer. */ + +/* Intel intrinsic. */ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttss_si64 (__m128 __A) +{ + return __builtin_ia32_cvttss2si64 ((__v4sf) __A); +} + +/* Microsoft intrinsic. */ +extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttss_si64x (__m128 __A) +{ + return __builtin_ia32_cvttss2si64 ((__v4sf) __A); +} +#endif + +/* Truncate the two lower SPFP values to 32-bit integers. Return the + integers in packed form. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvttps_pi32 (__m128 __A) +{ + return (__m64) __builtin_ia32_cvttps2pi ((__v4sf) __A); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtt_ps2pi (__m128 __A) +{ + return _mm_cvttps_pi32 (__A); +} + +/* Convert B to a SPFP value and insert it as element zero in A. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi32_ss (__m128 __A, int __B) +{ + return (__m128) __builtin_ia32_cvtsi2ss ((__v4sf) __A, __B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_si2ss (__m128 __A, int __B) +{ + return _mm_cvtsi32_ss (__A, __B); +} + +#ifdef __x86_64__ +/* Convert B to a SPFP value and insert it as element zero in A. */ + +/* Intel intrinsic. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi64_ss (__m128 __A, long long __B) +{ + return (__m128) __builtin_ia32_cvtsi642ss ((__v4sf) __A, __B); +} + +/* Microsoft intrinsic. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsi64x_ss (__m128 __A, long long __B) +{ + return (__m128) __builtin_ia32_cvtsi642ss ((__v4sf) __A, __B); +} +#endif + +/* Convert the two 32-bit values in B to SPFP form and insert them + as the two lower elements in A. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtpi32_ps (__m128 __A, __m64 __B) +{ + return (__m128) __builtin_ia32_cvtpi2ps ((__v4sf) __A, (__v2si)__B); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvt_pi2ps (__m128 __A, __m64 __B) +{ + return _mm_cvtpi32_ps (__A, __B); +} + +/* Convert the four signed 16-bit values in A to SPFP form. 
*/ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtpi16_ps (__m64 __A) +{ + __v4hi __sign; + __v2si __hisi, __losi; + __v4sf __zero, __ra, __rb; + + /* This comparison against zero gives us a mask that can be used to + fill in the missing sign bits in the unpack operations below, so + that we get signed values after unpacking. */ + __sign = __builtin_ia32_pcmpgtw ((__v4hi)0LL, (__v4hi)__A); + + /* Convert the four words to doublewords. */ + __losi = (__v2si) __builtin_ia32_punpcklwd ((__v4hi)__A, __sign); + __hisi = (__v2si) __builtin_ia32_punpckhwd ((__v4hi)__A, __sign); + + /* Convert the doublewords to floating point two at a time. */ + __zero = (__v4sf) _mm_setzero_ps (); + __ra = __builtin_ia32_cvtpi2ps (__zero, __losi); + __rb = __builtin_ia32_cvtpi2ps (__ra, __hisi); + + return (__m128) __builtin_ia32_movlhps (__ra, __rb); +} + +/* Convert the four unsigned 16-bit values in A to SPFP form. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtpu16_ps (__m64 __A) +{ + __v2si __hisi, __losi; + __v4sf __zero, __ra, __rb; + + /* Convert the four words to doublewords. */ + __losi = (__v2si) __builtin_ia32_punpcklwd ((__v4hi)__A, (__v4hi)0LL); + __hisi = (__v2si) __builtin_ia32_punpckhwd ((__v4hi)__A, (__v4hi)0LL); + + /* Convert the doublewords to floating point two at a time. */ + __zero = (__v4sf) _mm_setzero_ps (); + __ra = __builtin_ia32_cvtpi2ps (__zero, __losi); + __rb = __builtin_ia32_cvtpi2ps (__ra, __hisi); + + return (__m128) __builtin_ia32_movlhps (__ra, __rb); +} + +/* Convert the low four signed 8-bit values in A to SPFP form. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtpi8_ps (__m64 __A) +{ + __v8qi __sign; + + /* This comparison against zero gives us a mask that can be used to + fill in the missing sign bits in the unpack operations below, so + that we get signed values after unpacking. */ + __sign = __builtin_ia32_pcmpgtb ((__v8qi)0LL, (__v8qi)__A); + + /* Convert the four low bytes to words. */ + __A = (__m64) __builtin_ia32_punpcklbw ((__v8qi)__A, __sign); + + return _mm_cvtpi16_ps(__A); +} + +/* Convert the low four unsigned 8-bit values in A to SPFP form. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtpu8_ps(__m64 __A) +{ + __A = (__m64) __builtin_ia32_punpcklbw ((__v8qi)__A, (__v8qi)0LL); + return _mm_cvtpu16_ps(__A); +} + +/* Convert the four signed 32-bit values in A and B to SPFP form. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtpi32x2_ps(__m64 __A, __m64 __B) +{ + __v4sf __zero = (__v4sf) _mm_setzero_ps (); + __v4sf __sfa = __builtin_ia32_cvtpi2ps (__zero, (__v2si)__A); + __v4sf __sfb = __builtin_ia32_cvtpi2ps (__sfa, (__v2si)__B); + return (__m128) __builtin_ia32_movlhps (__sfa, __sfb); +} + +/* Convert the four SPFP values in A to four signed 16-bit integers. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtps_pi16(__m128 __A) +{ + __v4sf __hisf = (__v4sf)__A; + __v4sf __losf = __builtin_ia32_movhlps (__hisf, __hisf); + __v2si __hisi = __builtin_ia32_cvtps2pi (__hisf); + __v2si __losi = __builtin_ia32_cvtps2pi (__losf); + return (__m64) __builtin_ia32_packssdw (__hisi, __losi); +} + +/* Convert the four SPFP values in A to four signed 8-bit integers. 
*/ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtps_pi8(__m128 __A) +{ + __v4hi __tmp = (__v4hi) _mm_cvtps_pi16 (__A); + return (__m64) __builtin_ia32_packsswb (__tmp, (__v4hi)0LL); +} + +/* Selects four specific SPFP values from A and B based on MASK. */ +#ifdef __OPTIMIZE__ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_shuffle_ps (__m128 __A, __m128 __B, int const __mask) +{ + return (__m128) __builtin_ia32_shufps ((__v4sf)__A, (__v4sf)__B, __mask); +} +#else +#define _mm_shuffle_ps(A, B, MASK) \ + ((__m128) __builtin_ia32_shufps ((__v4sf)(__m128)(A), \ + (__v4sf)(__m128)(B), (int)(MASK))) +#endif + +/* Selects and interleaves the upper two SPFP values from A and B. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpackhi_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_unpckhps ((__v4sf)__A, (__v4sf)__B); +} + +/* Selects and interleaves the lower two SPFP values from A and B. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_unpacklo_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_unpcklps ((__v4sf)__A, (__v4sf)__B); +} + +/* Sets the upper two SPFP values with 64-bits of data loaded from P; + the lower two values are passed through from A. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadh_pi (__m128 __A, __m64 const *__P) +{ + return (__m128) __builtin_ia32_loadhps ((__v4sf)__A, (const __v2sf *)__P); +} + +/* Stores the upper two SPFP values of A into P. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storeh_pi (__m64 *__P, __m128 __A) +{ + __builtin_ia32_storehps ((__v2sf *)__P, (__v4sf)__A); +} + +/* Moves the upper two values of B into the lower two values of A. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_movehl_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_movhlps ((__v4sf)__A, (__v4sf)__B); +} + +/* Moves the lower two values of B into the upper two values of A. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_movelh_ps (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_movlhps ((__v4sf)__A, (__v4sf)__B); +} + +/* Sets the lower two SPFP values with 64-bits of data loaded from P; + the upper two values are passed through from A. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadl_pi (__m128 __A, __m64 const *__P) +{ + return (__m128) __builtin_ia32_loadlps ((__v4sf)__A, (const __v2sf *)__P); +} + +/* Stores the lower two SPFP values of A into P. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storel_pi (__m64 *__P, __m128 __A) +{ + __builtin_ia32_storelps ((__v2sf *)__P, (__v4sf)__A); +} + +/* Creates a 4-bit mask from the most significant bits of the SPFP values. */ +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_movemask_ps (__m128 __A) +{ + return __builtin_ia32_movmskps ((__v4sf)__A); +} + +/* Return the contents of the control register. */ +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_getcsr (void) +{ + return __builtin_ia32_stmxcsr (); +} + +/* Read exception bits from the control register. 
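A sketch of how the control-register helpers that follow are typically used: save MXCSR, switch on flush-to-zero and a different rounding mode for a block of work, then restore. Whether a given application wants these settings is its own decision; the wrapper name and callback parameter are illustrative.

    // Illustrative helper: run KERNEL with flush-to-zero and truncation enabled.
    void run_with_ftz (void (*kernel) (void))
    {
      unsigned int saved = _mm_getcsr ();              // save current MXCSR
      _MM_SET_FLUSH_ZERO_MODE (_MM_FLUSH_ZERO_ON);     // denormal results -> zero
      _MM_SET_ROUNDING_MODE (_MM_ROUND_TOWARD_ZERO);   // truncate on conversions
      kernel ();                                       // caller-supplied work
      _mm_setcsr (saved);                              // restore previous state
    }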
*/ +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_MM_GET_EXCEPTION_STATE (void) +{ + return _mm_getcsr() & _MM_EXCEPT_MASK; +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_MM_GET_EXCEPTION_MASK (void) +{ + return _mm_getcsr() & _MM_MASK_MASK; +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_MM_GET_ROUNDING_MODE (void) +{ + return _mm_getcsr() & _MM_ROUND_MASK; +} + +extern __inline unsigned int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_MM_GET_FLUSH_ZERO_MODE (void) +{ + return _mm_getcsr() & _MM_FLUSH_ZERO_MASK; +} + +/* Set the control register to I. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setcsr (unsigned int __I) +{ + __builtin_ia32_ldmxcsr (__I); +} + +/* Set exception bits in the control register. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_MM_SET_EXCEPTION_STATE(unsigned int __mask) +{ + _mm_setcsr((_mm_getcsr() & ~_MM_EXCEPT_MASK) | __mask); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_MM_SET_EXCEPTION_MASK (unsigned int __mask) +{ + _mm_setcsr((_mm_getcsr() & ~_MM_MASK_MASK) | __mask); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_MM_SET_ROUNDING_MODE (unsigned int __mode) +{ + _mm_setcsr((_mm_getcsr() & ~_MM_ROUND_MASK) | __mode); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_MM_SET_FLUSH_ZERO_MODE (unsigned int __mode) +{ + _mm_setcsr((_mm_getcsr() & ~_MM_FLUSH_ZERO_MASK) | __mode); +} + +/* Create a vector with element 0 as F and the rest zero. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_ss (float __F) +{ + return __extension__ (__m128)(__v4sf){ __F, 0.0f, 0.0f, 0.0f }; +} + +/* Create a vector with all four elements equal to F. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_ps (float __F) +{ + return __extension__ (__m128)(__v4sf){ __F, __F, __F, __F }; +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_ps1 (float __F) +{ + return _mm_set1_ps (__F); +} + +/* Create a vector with element 0 as *P and the rest zero. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_ss (float const *__P) +{ + return _mm_set_ss (*__P); +} + +/* Create a vector with all four elements equal to *P. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load1_ps (float const *__P) +{ + return _mm_set1_ps (*__P); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_ps1 (float const *__P) +{ + return _mm_load1_ps (__P); +} + +/* Load four SPFP values from P. The address must be 16-byte aligned. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_ps (float const *__P) +{ + return (__m128) *(__v4sf *)__P; +} + +/* Load four SPFP values from P. The address need not be 16-byte aligned. 
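Choosing between the aligned load above and the unaligned load below depends on what can be guaranteed about P; a sketch (the aligned attribute and helper name are one way to arrange this, chosen here for illustration):

    // Illustrative: a buffer whose address is guaranteed 16-byte aligned.
    static float aligned_buf[4] __attribute__ ((aligned (16))) = { 1, 2, 3, 4 };

    __m128 load_examples (const float *unaligned_p)
    {
      __m128 a = _mm_load_ps (aligned_buf);    // requires 16-byte alignment
      __m128 b = _mm_loadu_ps (unaligned_p);   // works for any address
      return _mm_add_ps (a, b);
    }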
*/ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadu_ps (float const *__P) +{ + return (__m128) __builtin_ia32_loadups (__P); +} + +/* Load four SPFP values in reverse order. The address must be aligned. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadr_ps (float const *__P) +{ + __v4sf __tmp = *(__v4sf *)__P; + return (__m128) __builtin_ia32_shufps (__tmp, __tmp, _MM_SHUFFLE (0,1,2,3)); +} + +/* Create the vector [Z Y X W]. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_ps (const float __Z, const float __Y, const float __X, const float __W) +{ + return __extension__ (__m128)(__v4sf){ __W, __X, __Y, __Z }; +} + +/* Create the vector [W X Y Z]. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setr_ps (float __Z, float __Y, float __X, float __W) +{ + return __extension__ (__m128)(__v4sf){ __Z, __Y, __X, __W }; +} + +/* Stores the lower SPFP value. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_ss (float *__P, __m128 __A) +{ + *__P = __builtin_ia32_vec_ext_v4sf ((__v4sf)__A, 0); +} + +extern __inline float __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtss_f32 (__m128 __A) +{ + return __builtin_ia32_vec_ext_v4sf ((__v4sf)__A, 0); +} + +/* Store four SPFP values. The address must be 16-byte aligned. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_ps (float *__P, __m128 __A) +{ + *(__v4sf *)__P = (__v4sf)__A; +} + +/* Store four SPFP values. The address need not be 16-byte aligned. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storeu_ps (float *__P, __m128 __A) +{ + __builtin_ia32_storeups (__P, (__v4sf)__A); +} + +/* Store the lower SPFP value across four words. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store1_ps (float *__P, __m128 __A) +{ + __v4sf __va = (__v4sf)__A; + __v4sf __tmp = __builtin_ia32_shufps (__va, __va, _MM_SHUFFLE (0,0,0,0)); + _mm_storeu_ps (__P, __tmp); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_ps1 (float *__P, __m128 __A) +{ + _mm_store1_ps (__P, __A); +} + +/* Store four SPFP values in reverse order. The address must be aligned. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storer_ps (float *__P, __m128 __A) +{ + __v4sf __va = (__v4sf)__A; + __v4sf __tmp = __builtin_ia32_shufps (__va, __va, _MM_SHUFFLE (0,1,2,3)); + _mm_store_ps (__P, __tmp); +} + +/* Sets the low SPFP value of A from the low value of B. */ +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_move_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_movss ((__v4sf)__A, (__v4sf)__B); +} + +/* Extracts one of the four words of A. The selector N must be immediate. 
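Because the selector must fold to a compile-time constant, uses like the following sketch are fine, whereas passing a runtime-variable selector would be rejected (helper names are illustrative):

    int   get_word2 (__m64 v)  { return _mm_extract_pi16 (v, 2); }      // read word 2
    __m64 set_word0 (__m64 v)  { return _mm_insert_pi16 (v, 123, 0); }  // write word 0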
*/ +#ifdef __OPTIMIZE__ +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_extract_pi16 (__m64 const __A, int const __N) +{ + return __builtin_ia32_vec_ext_v4hi ((__v4hi)__A, __N); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pextrw (__m64 const __A, int const __N) +{ + return _mm_extract_pi16 (__A, __N); +} +#else +#define _mm_extract_pi16(A, N) \ + ((int) __builtin_ia32_vec_ext_v4hi ((__v4hi)(__m64)(A), (int)(N))) + +#define _m_pextrw(A, N) _mm_extract_pi16(A, N) +#endif + +/* Inserts word D into one of four words of A. The selector N must be + immediate. */ +#ifdef __OPTIMIZE__ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_insert_pi16 (__m64 const __A, int const __D, int const __N) +{ + return (__m64) __builtin_ia32_vec_set_v4hi ((__v4hi)__A, __D, __N); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pinsrw (__m64 const __A, int const __D, int const __N) +{ + return _mm_insert_pi16 (__A, __D, __N); +} +#else +#define _mm_insert_pi16(A, D, N) \ + ((__m64) __builtin_ia32_vec_set_v4hi ((__v4hi)(__m64)(A), \ + (int)(D), (int)(N))) + +#define _m_pinsrw(A, D, N) _mm_insert_pi16(A, D, N) +#endif + +/* Compute the element-wise maximum of signed 16-bit values. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_pi16 (__m64 __A, __m64 __B) +{ + return (__m64) __builtin_ia32_pmaxsw ((__v4hi)__A, (__v4hi)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pmaxsw (__m64 __A, __m64 __B) +{ + return _mm_max_pi16 (__A, __B); +} + +/* Compute the element-wise maximum of unsigned 8-bit values. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_max_pu8 (__m64 __A, __m64 __B) +{ + return (__m64) __builtin_ia32_pmaxub ((__v8qi)__A, (__v8qi)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pmaxub (__m64 __A, __m64 __B) +{ + return _mm_max_pu8 (__A, __B); +} + +/* Compute the element-wise minimum of signed 16-bit values. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_pi16 (__m64 __A, __m64 __B) +{ + return (__m64) __builtin_ia32_pminsw ((__v4hi)__A, (__v4hi)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pminsw (__m64 __A, __m64 __B) +{ + return _mm_min_pi16 (__A, __B); +} + +/* Compute the element-wise minimum of unsigned 8-bit values. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_min_pu8 (__m64 __A, __m64 __B) +{ + return (__m64) __builtin_ia32_pminub ((__v8qi)__A, (__v8qi)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pminub (__m64 __A, __m64 __B) +{ + return _mm_min_pu8 (__A, __B); +} + +/* Create an 8-bit mask of the signs of 8-bit values. 
*/ +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_movemask_pi8 (__m64 __A) +{ + return __builtin_ia32_pmovmskb ((__v8qi)__A); +} + +extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pmovmskb (__m64 __A) +{ + return _mm_movemask_pi8 (__A); +} + +/* Multiply four unsigned 16-bit values in A by four unsigned 16-bit values + in B and produce the high 16 bits of the 32-bit results. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mulhi_pu16 (__m64 __A, __m64 __B) +{ + return (__m64) __builtin_ia32_pmulhuw ((__v4hi)__A, (__v4hi)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pmulhuw (__m64 __A, __m64 __B) +{ + return _mm_mulhi_pu16 (__A, __B); +} + +/* Return a combination of the four 16-bit values in A. The selector + must be an immediate. */ +#ifdef __OPTIMIZE__ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_shuffle_pi16 (__m64 __A, int const __N) +{ + return (__m64) __builtin_ia32_pshufw ((__v4hi)__A, __N); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pshufw (__m64 __A, int const __N) +{ + return _mm_shuffle_pi16 (__A, __N); +} +#else +#define _mm_shuffle_pi16(A, N) \ + ((__m64) __builtin_ia32_pshufw ((__v4hi)(__m64)(A), (int)(N))) + +#define _m_pshufw(A, N) _mm_shuffle_pi16 (A, N) +#endif + +/* Conditionally store byte elements of A into P. The high bit of each + byte in the selector N determines whether the corresponding byte from + A is stored. */ +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskmove_si64 (__m64 __A, __m64 __N, char *__P) +{ + __builtin_ia32_maskmovq ((__v8qi)__A, (__v8qi)__N, __P); +} + +extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_maskmovq (__m64 __A, __m64 __N, char *__P) +{ + _mm_maskmove_si64 (__A, __N, __P); +} + +/* Compute the rounded averages of the unsigned 8-bit values in A and B. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_avg_pu8 (__m64 __A, __m64 __B) +{ + return (__m64) __builtin_ia32_pavgb ((__v8qi)__A, (__v8qi)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pavgb (__m64 __A, __m64 __B) +{ + return _mm_avg_pu8 (__A, __B); +} + +/* Compute the rounded averages of the unsigned 16-bit values in A and B. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_avg_pu16 (__m64 __A, __m64 __B) +{ + return (__m64) __builtin_ia32_pavgw ((__v4hi)__A, (__v4hi)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_pavgw (__m64 __A, __m64 __B) +{ + return _mm_avg_pu16 (__A, __B); +} + +/* Compute the sum of the absolute differences of the unsigned 8-bit + values in A and B. Return the value in the lower 16-bit word; the + upper words are cleared. */ +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sad_pu8 (__m64 __A, __m64 __B) +{ + return (__m64) __builtin_ia32_psadbw ((__v8qi)__A, (__v8qi)__B); +} + +extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_m_psadbw (__m64 __A, __m64 __B) +{ + return _mm_sad_pu8 (__A, __B); +} + +/* Stores the data in A to the address P without polluting the caches. 
*/
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_stream_pi (__m64 *__P, __m64 __A)
+{
+ __builtin_ia32_movntq ((unsigned long long *)__P, (unsigned long long)__A);
+}
+
+/* Likewise. The address must be 16-byte aligned. */
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_stream_ps (float *__P, __m128 __A)
+{
+ __builtin_ia32_movntps (__P, (__v4sf)__A);
+}
+
+/* Guarantees that every preceding store is globally visible before
+ any subsequent store. */
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_sfence (void)
+{
+ __builtin_ia32_sfence ();
+}
+
+/* The execution of the next instruction is delayed by an implementation
+ specific amount of time. The instruction does not modify the
+ architectural state. */
+extern __inline void __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_pause (void)
+{
+ __builtin_ia32_pause ();
+}
+
+/* Transpose the 4x4 matrix composed of row[0-3]. */
+#define _MM_TRANSPOSE4_PS(row0, row1, row2, row3) \
+do { \
+ __v4sf __r0 = (row0), __r1 = (row1), __r2 = (row2), __r3 = (row3); \
+ __v4sf __t0 = __builtin_ia32_unpcklps (__r0, __r1); \
+ __v4sf __t1 = __builtin_ia32_unpcklps (__r2, __r3); \
+ __v4sf __t2 = __builtin_ia32_unpckhps (__r0, __r1); \
+ __v4sf __t3 = __builtin_ia32_unpckhps (__r2, __r3); \
+ (row0) = __builtin_ia32_movlhps (__t0, __t1); \
+ (row1) = __builtin_ia32_movhlps (__t1, __t0); \
+ (row2) = __builtin_ia32_movlhps (__t2, __t3); \
+ (row3) = __builtin_ia32_movhlps (__t3, __t2); \
+} while (0)
+
+/* For backward source compatibility. */
+# include <emmintrin.h>
+
+#ifdef __DISABLE_SSE__
+#undef __DISABLE_SSE__
+#pragma GCC pop_options
+#endif /* __DISABLE_SSE__ */
+
+#endif /* _XMMINTRIN_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/xopintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/xopintrin.h
new file mode 100644
index 0000000..cc82bc5
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/xopintrin.h
@@ -0,0 +1,844 @@
+/* Copyright (C) 2007-2014 Free Software Foundation, Inc.
+
+ This file is part of GCC.
+
+ GCC is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+ GCC is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef _X86INTRIN_H_INCLUDED
+# error "Never use <xopintrin.h> directly; include <x86intrin.h> instead."
+#endif
+
+#ifndef _XOPMMINTRIN_H_INCLUDED
+#define _XOPMMINTRIN_H_INCLUDED
+
+#include <fma4intrin.h>
+
+#ifndef __XOP__
+#pragma GCC push_options
+#pragma GCC target("xop")
+#define __DISABLE_XOP__
+#endif /* __XOP__ */
+
+/* Integer multiply/add intructions. 
*/ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maccs_epi16(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpmacssww ((__v8hi)__A,(__v8hi)__B, (__v8hi)__C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_macc_epi16(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpmacsww ((__v8hi)__A, (__v8hi)__B, (__v8hi)__C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maccsd_epi16(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpmacsswd ((__v8hi)__A, (__v8hi)__B, (__v4si)__C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maccd_epi16(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpmacswd ((__v8hi)__A, (__v8hi)__B, (__v4si)__C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maccs_epi32(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpmacssdd ((__v4si)__A, (__v4si)__B, (__v4si)__C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_macc_epi32(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpmacsdd ((__v4si)__A, (__v4si)__B, (__v4si)__C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maccslo_epi32(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpmacssdql ((__v4si)__A, (__v4si)__B, (__v2di)__C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_macclo_epi32(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpmacsdql ((__v4si)__A, (__v4si)__B, (__v2di)__C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maccshi_epi32(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpmacssdqh ((__v4si)__A, (__v4si)__B, (__v2di)__C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_macchi_epi32(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpmacsdqh ((__v4si)__A, (__v4si)__B, (__v2di)__C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maddsd_epi16(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpmadcsswd ((__v8hi)__A,(__v8hi)__B,(__v4si)__C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maddd_epi16(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpmadcswd ((__v8hi)__A,(__v8hi)__B,(__v4si)__C); +} + +/* Packed Integer Horizontal Add and Subtract */ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_haddw_epi8(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphaddbw ((__v16qi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_haddd_epi8(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphaddbd ((__v16qi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_haddq_epi8(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphaddbq ((__v16qi)__A); +} + +extern __inline __m128i 
__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_haddd_epi16(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphaddwd ((__v8hi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_haddq_epi16(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphaddwq ((__v8hi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_haddq_epi32(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphadddq ((__v4si)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_haddw_epu8(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphaddubw ((__v16qi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_haddd_epu8(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphaddubd ((__v16qi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_haddq_epu8(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphaddubq ((__v16qi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_haddd_epu16(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphadduwd ((__v8hi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_haddq_epu16(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphadduwq ((__v8hi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_haddq_epu32(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphaddudq ((__v4si)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hsubw_epi8(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphsubbw ((__v16qi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hsubd_epi16(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphsubwd ((__v8hi)__A); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_hsubq_epi32(__m128i __A) +{ + return (__m128i) __builtin_ia32_vphsubdq ((__v4si)__A); +} + +/* Vector conditional move and permute */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cmov_si128(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpcmov (__A, __B, __C); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_perm_epi8(__m128i __A, __m128i __B, __m128i __C) +{ + return (__m128i) __builtin_ia32_vpperm ((__v16qi)__A, (__v16qi)__B, (__v16qi)__C); +} + +/* Packed Integer Rotates and Shifts + Rotates - Non-Immediate form */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rot_epi8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vprotb ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rot_epi16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vprotw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rot_epi32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vprotd ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_rot_epi64(__m128i 
__A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vprotq ((__v2di)__A, (__v2di)__B); +} + +/* Rotates - Immediate form */ + +#ifdef __OPTIMIZE__ +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roti_epi8(__m128i __A, const int __B) +{ + return (__m128i) __builtin_ia32_vprotbi ((__v16qi)__A, __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roti_epi16(__m128i __A, const int __B) +{ + return (__m128i) __builtin_ia32_vprotwi ((__v8hi)__A, __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roti_epi32(__m128i __A, const int __B) +{ + return (__m128i) __builtin_ia32_vprotdi ((__v4si)__A, __B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_roti_epi64(__m128i __A, const int __B) +{ + return (__m128i) __builtin_ia32_vprotqi ((__v2di)__A, __B); +} +#else +#define _mm_roti_epi8(A, N) \ + ((__m128i) __builtin_ia32_vprotbi ((__v16qi)(__m128i)(A), (int)(N))) +#define _mm_roti_epi16(A, N) \ + ((__m128i) __builtin_ia32_vprotwi ((__v8hi)(__m128i)(A), (int)(N))) +#define _mm_roti_epi32(A, N) \ + ((__m128i) __builtin_ia32_vprotdi ((__v4si)(__m128i)(A), (int)(N))) +#define _mm_roti_epi64(A, N) \ + ((__m128i) __builtin_ia32_vprotqi ((__v2di)(__m128i)(A), (int)(N))) +#endif + +/* Shifts */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_shl_epi8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpshlb ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_shl_epi16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpshlw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_shl_epi32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpshld ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_shl_epi64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpshlq ((__v2di)__A, (__v2di)__B); +} + + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sha_epi8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpshab ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sha_epi16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpshaw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sha_epi32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpshad ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_sha_epi64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpshaq ((__v2di)__A, (__v2di)__B); +} + +/* Compare and Predicate Generation + pcom (integer, unsinged bytes) */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comlt_epu8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomltub ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comle_epu8(__m128i __A, __m128i __B) +{ + return (__m128i) 
__builtin_ia32_vpcomleub ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comgt_epu8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgtub ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comge_epu8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgeub ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comeq_epu8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomequb ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comneq_epu8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomnequb ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comfalse_epu8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomfalseub ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comtrue_epu8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomtrueub ((__v16qi)__A, (__v16qi)__B); +} + +/*pcom (integer, unsinged words) */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comlt_epu16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomltuw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comle_epu16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomleuw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comgt_epu16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgtuw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comge_epu16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgeuw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comeq_epu16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomequw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comneq_epu16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomnequw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comfalse_epu16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomfalseuw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comtrue_epu16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomtrueuw ((__v8hi)__A, (__v8hi)__B); +} + +/*pcom (integer, unsinged double words) */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comlt_epu32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomltud ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comle_epu32(__m128i __A, __m128i __B) +{ + return 
(__m128i) __builtin_ia32_vpcomleud ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comgt_epu32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgtud ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comge_epu32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgeud ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comeq_epu32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomequd ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comneq_epu32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomnequd ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comfalse_epu32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomfalseud ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comtrue_epu32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomtrueud ((__v4si)__A, (__v4si)__B); +} + +/*pcom (integer, unsinged quad words) */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comlt_epu64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomltuq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comle_epu64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomleuq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comgt_epu64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgtuq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comge_epu64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgeuq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comeq_epu64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomequq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comneq_epu64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomnequq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comfalse_epu64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomfalseuq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comtrue_epu64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomtrueuq ((__v2di)__A, (__v2di)__B); +} + +/*pcom (integer, signed bytes) */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comlt_epi8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomltb ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comle_epi8(__m128i __A, __m128i __B) +{ + return (__m128i) 
__builtin_ia32_vpcomleb ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comgt_epi8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgtb ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comge_epi8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgeb ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comeq_epi8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomeqb ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comneq_epi8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomneqb ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comfalse_epi8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomfalseb ((__v16qi)__A, (__v16qi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comtrue_epi8(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomtrueb ((__v16qi)__A, (__v16qi)__B); +} + +/*pcom (integer, signed words) */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comlt_epi16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomltw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comle_epi16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomlew ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comgt_epi16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgtw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comge_epi16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgew ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comeq_epi16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomeqw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comneq_epi16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomneqw ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comfalse_epi16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomfalsew ((__v8hi)__A, (__v8hi)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comtrue_epi16(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomtruew ((__v8hi)__A, (__v8hi)__B); +} + +/*pcom (integer, signed double words) */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comlt_epi32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomltd ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comle_epi32(__m128i __A, __m128i __B) +{ + return (__m128i) 
__builtin_ia32_vpcomled ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comgt_epi32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgtd ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comge_epi32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomged ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comeq_epi32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomeqd ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comneq_epi32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomneqd ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comfalse_epi32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomfalsed ((__v4si)__A, (__v4si)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comtrue_epi32(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomtrued ((__v4si)__A, (__v4si)__B); +} + +/*pcom (integer, signed quad words) */ + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comlt_epi64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomltq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comle_epi64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomleq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comgt_epi64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgtq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comge_epi64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomgeq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comeq_epi64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomeqq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comneq_epi64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomneqq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comfalse_epi64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomfalseq ((__v2di)__A, (__v2di)__B); +} + +extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_comtrue_epi64(__m128i __A, __m128i __B) +{ + return (__m128i) __builtin_ia32_vpcomtrueq ((__v2di)__A, (__v2di)__B); +} + +/* FRCZ */ + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_frcz_ps (__m128 __A) +{ + return (__m128) __builtin_ia32_vfrczps ((__v4sf)__A); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_frcz_pd (__m128d __A) +{ + return (__m128d) __builtin_ia32_vfrczpd ((__v2df)__A); +} + +extern __inline __m128 __attribute__((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm_frcz_ss (__m128 __A, __m128 __B) +{ + return (__m128) __builtin_ia32_movss ((__v4sf)__A, + (__v4sf) + __builtin_ia32_vfrczss ((__v4sf)__B)); +} + +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_frcz_sd (__m128d __A, __m128d __B) +{ + return (__m128d) __builtin_ia32_movsd ((__v2df)__A, + (__v2df) + __builtin_ia32_vfrczsd ((__v2df)__B)); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_frcz_ps (__m256 __A) +{ + return (__m256) __builtin_ia32_vfrczps256 ((__v8sf)__A); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_frcz_pd (__m256d __A) +{ + return (__m256d) __builtin_ia32_vfrczpd256 ((__v4df)__A); +} + +/* PERMIL2 */ + +#ifdef __OPTIMIZE__ +extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_permute2_pd (__m128d __X, __m128d __Y, __m128i __C, const int __I) +{ + return (__m128d) __builtin_ia32_vpermil2pd ((__v2df)__X, + (__v2df)__Y, + (__v2di)__C, + __I); +} + +extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permute2_pd (__m256d __X, __m256d __Y, __m256i __C, const int __I) +{ + return (__m256d) __builtin_ia32_vpermil2pd256 ((__v4df)__X, + (__v4df)__Y, + (__v4di)__C, + __I); +} + +extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_permute2_ps (__m128 __X, __m128 __Y, __m128i __C, const int __I) +{ + return (__m128) __builtin_ia32_vpermil2ps ((__v4sf)__X, + (__v4sf)__Y, + (__v4si)__C, + __I); +} + +extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permute2_ps (__m256 __X, __m256 __Y, __m256i __C, const int __I) +{ + return (__m256) __builtin_ia32_vpermil2ps256 ((__v8sf)__X, + (__v8sf)__Y, + (__v8si)__C, + __I); +} +#else +#define _mm_permute2_pd(X, Y, C, I) \ + ((__m128d) __builtin_ia32_vpermil2pd ((__v2df)(__m128d)(X), \ + (__v2df)(__m128d)(Y), \ + (__v2di)(__m128d)(C), \ + (int)(I))) + +#define _mm256_permute2_pd(X, Y, C, I) \ + ((__m256d) __builtin_ia32_vpermil2pd256 ((__v4df)(__m256d)(X), \ + (__v4df)(__m256d)(Y), \ + (__v4di)(__m256d)(C), \ + (int)(I))) + +#define _mm_permute2_ps(X, Y, C, I) \ + ((__m128) __builtin_ia32_vpermil2ps ((__v4sf)(__m128)(X), \ + (__v4sf)(__m128)(Y), \ + (__v4si)(__m128)(C), \ + (int)(I))) + +#define _mm256_permute2_ps(X, Y, C, I) \ + ((__m256) __builtin_ia32_vpermil2ps256 ((__v8sf)(__m256)(X), \ + (__v8sf)(__m256)(Y), \ + (__v8si)(__m256)(C), \ + (int)(I))) +#endif /* __OPTIMIZE__ */ + +#ifdef __DISABLE_XOP__ +#undef __DISABLE_XOP__ +#pragma GCC pop_options +#endif /* __DISABLE_XOP__ */ + +#endif /* _XOPMMINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/xsaveintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/xsaveintrin.h new file mode 100644 index 0000000..47be25f --- /dev/null +++ b/lib/gcc/x86_64-linux-android/4.9.x/include/xsaveintrin.h @@ -0,0 +1,72 @@ +/* Copyright (C) 2012-2014 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the
+ GNU General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* #if !defined _X86INTRIN_H_INCLUDED && !defined _IMMINTRIN_H_INCLUDED */
+/* # error "Never use <xsaveintrin.h> directly; include <x86intrin.h> instead." */
+/* #endif */
+
+#ifndef _XSAVEINTRIN_H_INCLUDED
+#define _XSAVEINTRIN_H_INCLUDED
+
+#ifndef __XSAVE__
+#pragma GCC push_options
+#pragma GCC target("xsave")
+#define __DISABLE_XSAVE__
+#endif /* __XSAVE__ */
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_xsave (void *__P, long long __M)
+{
+ return __builtin_ia32_xsave (__P, __M);
+}
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_xrstor (void *__P, long long __M)
+{
+ return __builtin_ia32_xrstor (__P, __M);
+}
+
+#ifdef __x86_64__
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_xsave64 (void *__P, long long __M)
+{
+ return __builtin_ia32_xsave64 (__P, __M);
+}
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_xrstor64 (void *__P, long long __M)
+{
+ return __builtin_ia32_xrstor64 (__P, __M);
+}
+#endif
+
+#ifdef __DISABLE_XSAVE__
+#undef __DISABLE_XSAVE__
+#pragma GCC pop_options
+#endif /* __DISABLE_XSAVE__ */
+
+#endif /* _XSAVEINTRIN_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/xsaveoptintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/xsaveoptintrin.h
new file mode 100644
index 0000000..d7534b4
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/xsaveoptintrin.h
@@ -0,0 +1,58 @@
+/* Copyright (C) 2012-2014 Free Software Foundation, Inc.
+
+ This file is part of GCC.
+
+ GCC is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+ GCC is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+/* #if !defined _X86INTRIN_H_INCLUDED && !defined _IMMINTRIN_H_INCLUDED */
+/* # error "Never use <xsaveoptintrin.h> directly; include <x86intrin.h> instead." 
*/
+/* #endif */
+
+#ifndef _XSAVEOPTINTRIN_H_INCLUDED
+#define _XSAVEOPTINTRIN_H_INCLUDED
+
+#ifndef __XSAVEOPT__
+#pragma GCC push_options
+#pragma GCC target("xsaveopt")
+#define __DISABLE_XSAVEOPT__
+#endif /* __XSAVEOPT__ */
+
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_xsaveopt (void *__P, long long __M)
+{
+ return __builtin_ia32_xsaveopt (__P, __M);
+}
+
+#ifdef __x86_64__
+extern __inline void
+__attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_xsaveopt64 (void *__P, long long __M)
+{
+ return __builtin_ia32_xsaveopt64 (__P, __M);
+}
+#endif
+
+#ifdef __DISABLE_XSAVEOPT__
+#undef __DISABLE_XSAVEOPT__
+#pragma GCC pop_options
+#endif /* __DISABLE_XSAVEOPT__ */
+
+#endif /* _XSAVEOPTINTRIN_H_INCLUDED */
diff --git a/lib/gcc/x86_64-linux-android/4.9.x/include/xtestintrin.h b/lib/gcc/x86_64-linux-android/4.9.x/include/xtestintrin.h
new file mode 100644
index 0000000..ba79e5c
--- /dev/null
+++ b/lib/gcc/x86_64-linux-android/4.9.x/include/xtestintrin.h
@@ -0,0 +1,51 @@
+/* Copyright (C) 2012-2014 Free Software Foundation, Inc.
+
+ This file is part of GCC.
+
+ GCC is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 3, or (at your option)
+ any later version.
+
+ GCC is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ Under Section 7 of GPL version 3, you are granted additional
+ permissions described in the GCC Runtime Library Exception, version
+ 3.1, as published by the Free Software Foundation.
+
+ You should have received a copy of the GNU General Public License and
+ a copy of the GCC Runtime Library Exception along with this program;
+ see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef _IMMINTRIN_H_INCLUDED
+# error "Never use <xtestintrin.h> directly; include <immintrin.h> instead."
+#endif
+
+#ifndef _XTESTINTRIN_H_INCLUDED
+#define _XTESTINTRIN_H_INCLUDED
+
+#ifndef __RTM__
+#pragma GCC push_options
+#pragma GCC target("rtm")
+#define __DISABLE_RTM__
+#endif /* __RTM__ */
+
+/* Return non-zero if the instruction executes inside an RTM or HLE code
+ region. Return zero otherwise. 
*/ +extern __inline int +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_xtest (void) +{ + return __builtin_ia32_xtest (); +} + +#ifdef __DISABLE_RTM__ +#undef __DISABLE_RTM__ +#pragma GCC pop_options +#endif /* __DISABLE_RTM__ */ + +#endif /* _XTESTINTRIN_H_INCLUDED */ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/libgcc.a b/lib/gcc/x86_64-linux-android/4.9.x/libgcc.a new file mode 100644 index 0000000..f08b5b5 Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/libgcc.a differ diff --git a/lib/gcc/x86_64-linux-android/4.9.x/libgcov.a b/lib/gcc/x86_64-linux-android/4.9.x/libgcov.a new file mode 100644 index 0000000..47345e3 Binary files /dev/null and b/lib/gcc/x86_64-linux-android/4.9.x/libgcov.a differ diff --git a/libexec/gcc/x86_64-linux-android/4.9.x/cc1 b/libexec/gcc/x86_64-linux-android/4.9.x/cc1 new file mode 100755 index 0000000..a675fe4 Binary files /dev/null and b/libexec/gcc/x86_64-linux-android/4.9.x/cc1 differ diff --git a/libexec/gcc/x86_64-linux-android/4.9.x/cc1plus b/libexec/gcc/x86_64-linux-android/4.9.x/cc1plus new file mode 100755 index 0000000..127ff44 Binary files /dev/null and b/libexec/gcc/x86_64-linux-android/4.9.x/cc1plus differ diff --git a/libexec/gcc/x86_64-linux-android/4.9.x/collect2 b/libexec/gcc/x86_64-linux-android/4.9.x/collect2 new file mode 100755 index 0000000..c813b0c Binary files /dev/null and b/libexec/gcc/x86_64-linux-android/4.9.x/collect2 differ diff --git a/libexec/gcc/x86_64-linux-android/4.9.x/libfunction_reordering_plugin.so b/libexec/gcc/x86_64-linux-android/4.9.x/libfunction_reordering_plugin.so new file mode 120000 index 0000000..6818e7a --- /dev/null +++ b/libexec/gcc/x86_64-linux-android/4.9.x/libfunction_reordering_plugin.so @@ -0,0 +1 @@ +libfunction_reordering_plugin.so.0.0.0 \ No newline at end of file diff --git a/libexec/gcc/x86_64-linux-android/4.9.x/libfunction_reordering_plugin.so.0 b/libexec/gcc/x86_64-linux-android/4.9.x/libfunction_reordering_plugin.so.0 new file mode 120000 index 0000000..6818e7a --- /dev/null +++ b/libexec/gcc/x86_64-linux-android/4.9.x/libfunction_reordering_plugin.so.0 @@ -0,0 +1 @@ +libfunction_reordering_plugin.so.0.0.0 \ No newline at end of file diff --git a/libexec/gcc/x86_64-linux-android/4.9.x/libfunction_reordering_plugin.so.0.0.0 b/libexec/gcc/x86_64-linux-android/4.9.x/libfunction_reordering_plugin.so.0.0.0 new file mode 100755 index 0000000..34ca3e3 Binary files /dev/null and b/libexec/gcc/x86_64-linux-android/4.9.x/libfunction_reordering_plugin.so.0.0.0 differ diff --git a/libexec/gcc/x86_64-linux-android/4.9.x/liblto_plugin.so b/libexec/gcc/x86_64-linux-android/4.9.x/liblto_plugin.so new file mode 120000 index 0000000..f25ba88 --- /dev/null +++ b/libexec/gcc/x86_64-linux-android/4.9.x/liblto_plugin.so @@ -0,0 +1 @@ +liblto_plugin.so.0.0.0 \ No newline at end of file diff --git a/libexec/gcc/x86_64-linux-android/4.9.x/liblto_plugin.so.0 b/libexec/gcc/x86_64-linux-android/4.9.x/liblto_plugin.so.0 new file mode 120000 index 0000000..f25ba88 --- /dev/null +++ b/libexec/gcc/x86_64-linux-android/4.9.x/liblto_plugin.so.0 @@ -0,0 +1 @@ +liblto_plugin.so.0.0.0 \ No newline at end of file diff --git a/libexec/gcc/x86_64-linux-android/4.9.x/liblto_plugin.so.0.0.0 b/libexec/gcc/x86_64-linux-android/4.9.x/liblto_plugin.so.0.0.0 new file mode 100755 index 0000000..916b2cf Binary files /dev/null and b/libexec/gcc/x86_64-linux-android/4.9.x/liblto_plugin.so.0.0.0 differ diff --git a/libexec/gcc/x86_64-linux-android/4.9.x/lto-wrapper 
b/libexec/gcc/x86_64-linux-android/4.9.x/lto-wrapper new file mode 100755 index 0000000..9299676 Binary files /dev/null and b/libexec/gcc/x86_64-linux-android/4.9.x/lto-wrapper differ diff --git a/libexec/gcc/x86_64-linux-android/4.9.x/lto1 b/libexec/gcc/x86_64-linux-android/4.9.x/lto1 new file mode 100755 index 0000000..552cbf4 Binary files /dev/null and b/libexec/gcc/x86_64-linux-android/4.9.x/lto1 differ diff --git a/libexec/gcc/x86_64-linux-android/4.9.x/plugin/gengtype b/libexec/gcc/x86_64-linux-android/4.9.x/plugin/gengtype new file mode 100755 index 0000000..a50c816 Binary files /dev/null and b/libexec/gcc/x86_64-linux-android/4.9.x/plugin/gengtype differ diff --git a/repo.prop b/repo.prop new file mode 100644 index 0000000..1313a50 --- /dev/null +++ b/repo.prop @@ -0,0 +1,19 @@ +platform/external/python/cpython3 b0e85ca1ed410f0196f20f4c37d16dafd4878a3d +platform/manifest 1d5da1f2fff34d121874f1ebd2f9a413c8c870fb +platform/ndk d8921b1c5e6a8d3713fbb58959f7c61a02a75bdd +platform/prebuilts/gcc/darwin-x86/host/headers 4ac4f7cc41cf3c9e36fc3d6cf37fd1cfa9587a68 +platform/prebuilts/gcc/darwin-x86/host/i686-apple-darwin-4.2.1 ec5aa66aaa4964c27564d0ec84dc1f18a2d72b7e +platform/prebuilts/gcc/linux-x86/host/x86_64-linux-glibc2.15-4.8 d9aafaade740ca38612c742f6d87debf362132ea +platform/prebuilts/gcc/linux-x86/host/x86_64-w64-mingw32-4.8 f0e27ecffc023e9b76a1436385dc46920ebba7de +platform/prebuilts/ndk d0da8fe3413d5bdacfad25cd7f730bf033a7e5f8 +toolchain/binutils bd24d23f081feb4eb1438c56ace4ae91778ae6be +toolchain/build 58be6006bb71abb97d7cdff7be3e73d55bbc22b8 +toolchain/cloog 604793eab97d360aef729f064674569ee6dbf3e1 +toolchain/expat 40172a0ae9d40a068f1e1a48ffcf6a1ccf765ed5 +toolchain/gcc a9d5e6fb22fe3283eeecfb97211a12c072a7b469 +toolchain/gmp b2acd5dbf47868ac5b5bc844e16d2cadcbd4c810 +toolchain/isl 0ccf95726af8ce58ad61ff474addbce3a31ba99c +toolchain/mpc 835d16e92eed875638a8b5d552034c3b1aae045b +toolchain/mpfr de979fc377db766591e7feaf052f0de59be46e76 +toolchain/ppl 979062d362bc5a1c00804237b408b19b4618fb24 +toolchain/sed 45df23d6dc8b51ea5cd903d023c10fd7d72415b9 diff --git a/x86_64-linux-android/bin/ar b/x86_64-linux-android/bin/ar new file mode 120000 index 0000000..96f839e --- /dev/null +++ b/x86_64-linux-android/bin/ar @@ -0,0 +1 @@ +../../bin/x86_64-linux-android-ar \ No newline at end of file diff --git a/x86_64-linux-android/bin/as b/x86_64-linux-android/bin/as new file mode 120000 index 0000000..22c6e67 --- /dev/null +++ b/x86_64-linux-android/bin/as @@ -0,0 +1 @@ +../../bin/x86_64-linux-android-as \ No newline at end of file diff --git a/x86_64-linux-android/bin/ld b/x86_64-linux-android/bin/ld new file mode 120000 index 0000000..32f02e0 --- /dev/null +++ b/x86_64-linux-android/bin/ld @@ -0,0 +1 @@ +../../bin/x86_64-linux-android-ld \ No newline at end of file diff --git a/x86_64-linux-android/bin/ld.bfd b/x86_64-linux-android/bin/ld.bfd new file mode 120000 index 0000000..78dec82 --- /dev/null +++ b/x86_64-linux-android/bin/ld.bfd @@ -0,0 +1 @@ +../../bin/x86_64-linux-android-ld.bfd \ No newline at end of file diff --git a/x86_64-linux-android/bin/ld.gold b/x86_64-linux-android/bin/ld.gold new file mode 120000 index 0000000..cc9a1e9 --- /dev/null +++ b/x86_64-linux-android/bin/ld.gold @@ -0,0 +1 @@ +../../bin/x86_64-linux-android-ld.gold \ No newline at end of file diff --git a/x86_64-linux-android/bin/nm b/x86_64-linux-android/bin/nm new file mode 120000 index 0000000..2e87749 --- /dev/null +++ b/x86_64-linux-android/bin/nm @@ -0,0 +1 @@ +../../bin/x86_64-linux-android-nm \ No newline at end 
of file diff --git a/x86_64-linux-android/bin/objcopy b/x86_64-linux-android/bin/objcopy new file mode 120000 index 0000000..22d7916 --- /dev/null +++ b/x86_64-linux-android/bin/objcopy @@ -0,0 +1 @@ +../../bin/x86_64-linux-android-objcopy \ No newline at end of file diff --git a/x86_64-linux-android/bin/objdump b/x86_64-linux-android/bin/objdump new file mode 120000 index 0000000..150ba55 --- /dev/null +++ b/x86_64-linux-android/bin/objdump @@ -0,0 +1 @@ +../../bin/x86_64-linux-android-objdump \ No newline at end of file diff --git a/x86_64-linux-android/bin/ranlib b/x86_64-linux-android/bin/ranlib new file mode 120000 index 0000000..13f26b2 --- /dev/null +++ b/x86_64-linux-android/bin/ranlib @@ -0,0 +1 @@ +../../bin/x86_64-linux-android-ranlib \ No newline at end of file diff --git a/x86_64-linux-android/bin/readelf b/x86_64-linux-android/bin/readelf new file mode 120000 index 0000000..ac4115d --- /dev/null +++ b/x86_64-linux-android/bin/readelf @@ -0,0 +1 @@ +../../bin/x86_64-linux-android-readelf \ No newline at end of file diff --git a/x86_64-linux-android/bin/strip b/x86_64-linux-android/bin/strip new file mode 120000 index 0000000..7404ebb --- /dev/null +++ b/x86_64-linux-android/bin/strip @@ -0,0 +1 @@ +../../bin/x86_64-linux-android-strip \ No newline at end of file diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.x b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.x new file mode 100644 index 0000000..c1f39da --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.x @@ -0,0 +1,226 @@ +/* Default linker script, for normal executables */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/libx32"); SEARCH_DIR("=/usr/local/libx32"); SEARCH_DIR("=/libx32"); SEARCH_DIR("=/usr/libx32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . 
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xbn b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xbn new file mode 100644 index 0000000..c654444 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xbn @@ -0,0 +1,224 @@ +/* Script for -N: mix text and data on same page; don't align data */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/libx32"); SEARCH_DIR("=/usr/local/libx32"); SEARCH_DIR("=/libx32"); SEARCH_DIR("=/usr/libx32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= .; + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + /* Stabs debugging sections. 
*/ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xc b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xc new file mode 100644 index 0000000..93ccd63 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xc @@ -0,0 +1,228 @@ +/* Script for -z combreloc: combine and sort reloc sections */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/libx32"); SEARCH_DIR("=/usr/local/libx32"); SEARCH_DIR("=/libx32"); SEARCH_DIR("=/usr/libx32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xd b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xd new file mode 100644 index 0000000..34e2b6b --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xd @@ -0,0 +1,226 @@ +/* Script for ld -pie: link position independent executable */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/libx32"); SEARCH_DIR("=/usr/local/libx32"); SEARCH_DIR("=/libx32"); SEARCH_DIR("=/usr/libx32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . 
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xdc b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xdc new file mode 100644 index 0000000..b929298 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xdc @@ -0,0 +1,228 @@ +/* Script for -pie -z combreloc: position independent executable, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/libx32"); SEARCH_DIR("=/usr/local/libx32"); SEARCH_DIR("=/libx32"); SEARCH_DIR("=/usr/libx32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xdw b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xdw new file mode 100644 index 0000000..6561385 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xdw @@ -0,0 +1,227 @@ +/* Script for -pie -z combreloc -z now -z relro: position independent executable, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/libx32"); SEARCH_DIR("=/usr/local/libx32"); SEARCH_DIR("=/libx32"); SEARCH_DIR("=/usr/libx32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xn b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xn new file mode 100644 index 0000000..905a86b --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xn @@ -0,0 +1,226 @@ +/* Script for -n: mix text and data on same page */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/libx32"); SEARCH_DIR("=/usr/local/libx32"); SEARCH_DIR("=/libx32"); SEARCH_DIR("=/usr/libx32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . 
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xr b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xr new file mode 100644 index 0000000..9587f81 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xr @@ -0,0 +1,158 @@ +/* Script for ld -r: link without relocation */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) + /* For some reason, the Solaris linker makes bad executables + if gld -r is used and the intermediate file has sections starting + at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld + bug. But for now assigning the zero vmas works. 
*/ +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + .interp 0 : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash 0 : { *(.hash) } + .gnu.hash 0 : { *(.gnu.hash) } + .dynsym 0 : { *(.dynsym) } + .dynstr 0 : { *(.dynstr) } + .gnu.version 0 : { *(.gnu.version) } + .gnu.version_d 0: { *(.gnu.version_d) } + .gnu.version_r 0: { *(.gnu.version_r) } + .rela.init 0 : { *(.rela.init) } + .rela.text 0 : { *(.rela.text) } + .rela.fini 0 : { *(.rela.fini) } + .rela.rodata 0 : { *(.rela.rodata) } + .rela.data.rel.ro 0 : { *(.rela.data.rel.ro) } + .rela.data 0 : { *(.rela.data) } + .rela.tdata 0 : { *(.rela.tdata) } + .rela.tbss 0 : { *(.rela.tbss) } + .rela.ctors 0 : { *(.rela.ctors) } + .rela.dtors 0 : { *(.rela.dtors) } + .rela.got 0 : { *(.rela.got) } + .rela.bss 0 : { *(.rela.bss) } + .rela.ldata 0 : { *(.rela.ldata) } + .rela.lbss 0 : { *(.rela.lbss) } + .rela.lrodata 0 : { *(.rela.lrodata) } + .rela.ifunc 0 : { *(.rela.ifunc) } + .rela.plt 0 : + { + *(.rela.plt) + } + .init 0 : + { + KEEP (*(SORT_NONE(.init))) + } + .plt 0 : { *(.plt) *(.iplt) } + .text 0 : + { + *(.text .stub) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini 0 : + { + KEEP (*(SORT_NONE(.fini))) + } + .rodata 0 : { *(.rodata) } + .rodata1 0 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } + .eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) } + .gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + /* Exception handling */ + .eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) } + .gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata 0 : { *(.tdata) } + .tbss 0 : { *(.tbss) } + .preinit_array 0 : + { + KEEP (*(.preinit_array)) + } + .jcr 0 : { KEEP (*(.jcr)) } + .dynamic 0 : { *(.dynamic) } + .got 0 : { *(.got) *(.igot) } + .got.plt 0 : { *(.got.plt) *(.igot.plt) } + .data 0 : + { + *(.data) + } + .data1 0 : { *(.data1) } + .bss 0 : + { + *(.dynbss) + *(.bss) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + } + .lbss 0 : + { + *(.dynlbss) + *(.lbss) + *(LARGE_COMMON) + } + .lrodata 0 : + { + *(.lrodata) + } + .ldata 0 : + { + *(.ldata) + } + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xs b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xs new file mode 100644 index 0000000..8c97290 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xs @@ -0,0 +1,217 @@ +/* Script for ld --shared: link shared library */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/libx32"); SEARCH_DIR("=/usr/local/libx32"); SEARCH_DIR("=/libx32"); SEARCH_DIR("=/usr/libx32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + *(.rela.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xsc b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xsc new file mode 100644 index 0000000..8300aa6 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xsc @@ -0,0 +1,219 @@ +/* Script for --shared -z combreloc: shared library, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/libx32"); SEARCH_DIR("=/usr/local/libx32"); SEARCH_DIR("=/libx32"); SEARCH_DIR("=/usr/libx32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + *(.rela.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. 
*/ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. 
*/ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xsw b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xsw new file mode 100644 index 0000000..3138941 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xsw @@ -0,0 +1,218 @@ +/* Script for --shared -z combreloc -z now -z relro: shared library, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/libx32"); SEARCH_DIR("=/usr/local/libx32"); SEARCH_DIR("=/libx32"); SEARCH_DIR("=/usr/libx32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . = SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + *(.rela.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. 
+ Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xu b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xu new file mode 100644 index 0000000..d40f030 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xu @@ -0,0 +1,159 @@ +/* Script for ld -Ur: link w/out relocation, do create constructors */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) + /* For some reason, the Solaris linker makes bad executables + if gld -r is used and the intermediate file has sections starting + at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld + bug. But for now assigning the zero vmas works. */ +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + .interp 0 : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash 0 : { *(.hash) } + .gnu.hash 0 : { *(.gnu.hash) } + .dynsym 0 : { *(.dynsym) } + .dynstr 0 : { *(.dynstr) } + .gnu.version 0 : { *(.gnu.version) } + .gnu.version_d 0: { *(.gnu.version_d) } + .gnu.version_r 0: { *(.gnu.version_r) } + .rela.init 0 : { *(.rela.init) } + .rela.text 0 : { *(.rela.text) } + .rela.fini 0 : { *(.rela.fini) } + .rela.rodata 0 : { *(.rela.rodata) } + .rela.data.rel.ro 0 : { *(.rela.data.rel.ro) } + .rela.data 0 : { *(.rela.data) } + .rela.tdata 0 : { *(.rela.tdata) } + .rela.tbss 0 : { *(.rela.tbss) } + .rela.ctors 0 : { *(.rela.ctors) } + .rela.dtors 0 : { *(.rela.dtors) } + .rela.got 0 : { *(.rela.got) } + .rela.bss 0 : { *(.rela.bss) } + .rela.ldata 0 : { *(.rela.ldata) } + .rela.lbss 0 : { *(.rela.lbss) } + .rela.lrodata 0 : { *(.rela.lrodata) } + .rela.ifunc 0 : { *(.rela.ifunc) } + .rela.plt 0 : + { + *(.rela.plt) + } + .init 0 : + { + KEEP (*(SORT_NONE(.init))) + } + .plt 0 : { *(.plt) *(.iplt) } + .text 0 : + { + *(.text .stub) + /* .gnu.warning sections are handled specially by elf32.em. 
*/ + *(.gnu.warning) + } + .fini 0 : + { + KEEP (*(SORT_NONE(.fini))) + } + .rodata 0 : { *(.rodata) } + .rodata1 0 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } + .eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) } + .gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + /* Exception handling */ + .eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) } + .gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata 0 : { *(.tdata) } + .tbss 0 : { *(.tbss) } + .preinit_array 0 : + { + KEEP (*(.preinit_array)) + } + .jcr 0 : { KEEP (*(.jcr)) } + .dynamic 0 : { *(.dynamic) } + .got 0 : { *(.got) *(.igot) } + .got.plt 0 : { *(.got.plt) *(.igot.plt) } + .data 0 : + { + *(.data) + SORT(CONSTRUCTORS) + } + .data1 0 : { *(.data1) } + .bss 0 : + { + *(.dynbss) + *(.bss) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + } + .lbss 0 : + { + *(.dynlbss) + *(.lbss) + *(LARGE_COMMON) + } + .lrodata 0 : + { + *(.lrodata) + } + .ldata 0 : + { + *(.ldata) + } + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. 
*/ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xw b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xw new file mode 100644 index 0000000..33d5f76 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf32_x86_64.xw @@ -0,0 +1,227 @@ +/* Script for -z combreloc -z now -z relro: combine and sort reloc sections */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-x86-64", "elf32-x86-64", + "elf32-x86-64") +OUTPUT_ARCH(i386:x64-32) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/libx32"); SEARCH_DIR("=/usr/local/libx32"); SEARCH_DIR("=/libx32"); SEARCH_DIR("=/usr/libx32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. 
*/ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.x b/x86_64-linux-android/lib/ldscripts/elf_i386.x new file mode 100644 index 0000000..ac56d99 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.x @@ -0,0 +1,209 @@ +/* Default linker script, for normal executables */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); . 
= SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.init : { *(.rel.init) } + .rel.text : { *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) } + .rel.fini : { *(.rel.fini) } + .rel.rodata : { *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) } + .rel.data.rel.ro : { *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) } + .rel.data : { *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) } + .rel.tdata : { *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) } + .rel.tbss : { *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) } + .rel.ctors : { *(.rel.ctors) } + .rel.dtors : { *(.rel.dtors) } + .rel.got : { *(.rel.got) } + .rel.bss : { *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) } + .rel.ifunc : { *(.rel.ifunc) } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. 
+ Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.xbn b/x86_64-linux-android/lib/ldscripts/elf_i386.xbn new file mode 100644 index 0000000..bc09d01 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.xbn @@ -0,0 +1,207 @@ +/* Script for -N: mix text and data on same page; don't align data */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); . 
= SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.init : { *(.rel.init) } + .rel.text : { *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) } + .rel.fini : { *(.rel.fini) } + .rel.rodata : { *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) } + .rel.data.rel.ro : { *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) } + .rel.data : { *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) } + .rel.tdata : { *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) } + .rel.tbss : { *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) } + .rel.ctors : { *(.rel.ctors) } + .rel.dtors : { *(.rel.dtors) } + .rel.got : { *(.rel.got) } + .rel.bss : { *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) } + .rel.ifunc : { *(.rel.ifunc) } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= .; + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.xc b/x86_64-linux-android/lib/ldscripts/elf_i386.xc new file mode 100644 index 0000000..55f68f6 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.xc @@ -0,0 +1,212 @@ +/* Script for -z combreloc: combine and sort reloc sections */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); . 
= SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.dyn : + { + *(.rel.init) + *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) + *(.rel.fini) + *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) + *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) + *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) + *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) + *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) + *(.rel.ctors) + *(.rel.dtors) + *(.rel.got) + *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) + *(.rel.ifunc) + } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. 
*/ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.xd b/x86_64-linux-android/lib/ldscripts/elf_i386.xd new file mode 100644 index 0000000..5694a1f --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.xd @@ -0,0 +1,209 @@ +/* Script for ld -pie: link position independent executable */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . = SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.init : { *(.rel.init) } + .rel.text : { *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) } + .rel.fini : { *(.rel.fini) } + .rel.rodata : { *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) } + .rel.data.rel.ro : { *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) } + .rel.data : { *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) } + .rel.tdata : { *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) } + .rel.tbss : { *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) } + .rel.ctors : { *(.rel.ctors) } + .rel.dtors : { *(.rel.dtors) } + .rel.got : { *(.rel.got) } + .rel.bss : { *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) } + .rel.ifunc : { *(.rel.ifunc) } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. 
We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. 
+ Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.xdc b/x86_64-linux-android/lib/ldscripts/elf_i386.xdc new file mode 100644 index 0000000..943737e --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.xdc @@ -0,0 +1,212 @@ +/* Script for -pie -z combreloc: position independent executable, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.dyn : + { + *(.rel.init) + *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) + *(.rel.fini) + *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) + *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) + *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) + *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) + *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) + *(.rel.ctors) + *(.rel.dtors) + *(.rel.got) + *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) + *(.rel.ifunc) + } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. 
*/ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.xdw b/x86_64-linux-android/lib/ldscripts/elf_i386.xdw new file mode 100644 index 0000000..7ed3f67 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.xdw @@ -0,0 +1,211 @@ +/* Script for -pie -z combreloc -z now -z relro: position independent executable, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . = SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.dyn : + { + *(.rel.init) + *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) + *(.rel.fini) + *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) + *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) + *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) + *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) + *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) + *(.rel.ctors) + *(.rel.dtors) + *(.rel.got) + *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) + *(.rel.ifunc) + } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.xn b/x86_64-linux-android/lib/ldscripts/elf_i386.xn new file mode 100644 index 0000000..14f164e --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.xn @@ -0,0 +1,209 @@ +/* Script for -n: mix text and data on same page */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); . 
= SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.init : { *(.rel.init) } + .rel.text : { *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) } + .rel.fini : { *(.rel.fini) } + .rel.rodata : { *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) } + .rel.data.rel.ro : { *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) } + .rel.data : { *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) } + .rel.tdata : { *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) } + .rel.tbss : { *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) } + .rel.ctors : { *(.rel.ctors) } + .rel.dtors : { *(.rel.dtors) } + .rel.got : { *(.rel.got) } + .rel.bss : { *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) } + .rel.ifunc : { *(.rel.ifunc) } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. 
+ Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.xr b/x86_64-linux-android/lib/ldscripts/elf_i386.xr new file mode 100644 index 0000000..27ee751 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.xr @@ -0,0 +1,142 @@ +/* Script for ld -r: link without relocation */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) + /* For some reason, the Solaris linker makes bad executables + if gld -r is used and the intermediate file has sections starting + at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld + bug. But for now assigning the zero vmas works. */ +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + .interp 0 : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash 0 : { *(.hash) } + .gnu.hash 0 : { *(.gnu.hash) } + .dynsym 0 : { *(.dynsym) } + .dynstr 0 : { *(.dynstr) } + .gnu.version 0 : { *(.gnu.version) } + .gnu.version_d 0: { *(.gnu.version_d) } + .gnu.version_r 0: { *(.gnu.version_r) } + .rel.init 0 : { *(.rel.init) } + .rel.text 0 : { *(.rel.text) } + .rel.fini 0 : { *(.rel.fini) } + .rel.rodata 0 : { *(.rel.rodata) } + .rel.data.rel.ro 0 : { *(.rel.data.rel.ro) } + .rel.data 0 : { *(.rel.data) } + .rel.tdata 0 : { *(.rel.tdata) } + .rel.tbss 0 : { *(.rel.tbss) } + .rel.ctors 0 : { *(.rel.ctors) } + .rel.dtors 0 : { *(.rel.dtors) } + .rel.got 0 : { *(.rel.got) } + .rel.bss 0 : { *(.rel.bss) } + .rel.ifunc 0 : { *(.rel.ifunc) } + .rel.plt 0 : + { + *(.rel.plt) + } + .init 0 : + { + KEEP (*(SORT_NONE(.init))) + } + .plt 0 : { *(.plt) *(.iplt) } +.plt.got 0 : { *(.plt.got) } + .text 0 : + { + *(.text .stub) + /* .gnu.warning sections are handled specially by elf32.em. 
*/ + *(.gnu.warning) + } + .fini 0 : + { + KEEP (*(SORT_NONE(.fini))) + } + .rodata 0 : { *(.rodata) } + .rodata1 0 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } + .eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) } + .gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + /* Exception handling */ + .eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) } + .gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata 0 : { *(.tdata) } + .tbss 0 : { *(.tbss) } + .preinit_array 0 : + { + KEEP (*(.preinit_array)) + } + .jcr 0 : { KEEP (*(.jcr)) } + .dynamic 0 : { *(.dynamic) } + .got 0 : { *(.got) *(.igot) } + .got.plt 0 : { *(.got.plt) *(.igot.plt) } + .data 0 : + { + *(.data) + } + .data1 0 : { *(.data1) } + .bss 0 : + { + *(.dynbss) + *(.bss) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + } + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.xs b/x86_64-linux-android/lib/ldscripts/elf_i386.xs new file mode 100644 index 0000000..0dc1f2a --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.xs @@ -0,0 +1,200 @@ +/* Script for ld --shared: link shared library */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. 
+ Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . = SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.init : { *(.rel.init) } + .rel.text : { *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) } + .rel.fini : { *(.rel.fini) } + .rel.rodata : { *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) } + .rel.data.rel.ro : { *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) } + .rel.data : { *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) } + .rel.tdata : { *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) } + .rel.tbss : { *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) } + .rel.ctors : { *(.rel.ctors) } + .rel.dtors : { *(.rel.dtors) } + .rel.got : { *(.rel.got) } + .rel.bss : { *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) } + .rel.ifunc : { *(.rel.ifunc) } + .rel.plt : + { + *(.rel.plt) + *(.rel.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.xsc b/x86_64-linux-android/lib/ldscripts/elf_i386.xsc new file mode 100644 index 0000000..2102bce --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.xsc @@ -0,0 +1,203 @@ +/* Script for --shared -z combreloc: shared library, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . = SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.dyn : + { + *(.rel.init) + *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) + *(.rel.fini) + *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) + *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) + *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) + *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) + *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) + *(.rel.ctors) + *(.rel.dtors) + *(.rel.got) + *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) + *(.rel.ifunc) + } + .rel.plt : + { + *(.rel.plt) + *(.rel.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. 
*/ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . 
= ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.xsw b/x86_64-linux-android/lib/ldscripts/elf_i386.xsw new file mode 100644 index 0000000..215506d --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.xsw @@ -0,0 +1,202 @@ +/* Script for --shared -z combreloc -z now -z relro: shared library, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.dyn : + { + *(.rel.init) + *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) + *(.rel.fini) + *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) + *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) + *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) + *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) + *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) + *(.rel.ctors) + *(.rel.dtors) + *(.rel.got) + *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) + *(.rel.ifunc) + } + .rel.plt : + { + *(.rel.plt) + *(.rel.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. 
+ The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.xu b/x86_64-linux-android/lib/ldscripts/elf_i386.xu new file mode 100644 index 0000000..444593d --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.xu @@ -0,0 +1,143 @@ +/* Script for ld -Ur: link w/out relocation, do create constructors */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. 
*/ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) + /* For some reason, the Solaris linker makes bad executables + if gld -r is used and the intermediate file has sections starting + at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld + bug. But for now assigning the zero vmas works. */ +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + .interp 0 : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash 0 : { *(.hash) } + .gnu.hash 0 : { *(.gnu.hash) } + .dynsym 0 : { *(.dynsym) } + .dynstr 0 : { *(.dynstr) } + .gnu.version 0 : { *(.gnu.version) } + .gnu.version_d 0: { *(.gnu.version_d) } + .gnu.version_r 0: { *(.gnu.version_r) } + .rel.init 0 : { *(.rel.init) } + .rel.text 0 : { *(.rel.text) } + .rel.fini 0 : { *(.rel.fini) } + .rel.rodata 0 : { *(.rel.rodata) } + .rel.data.rel.ro 0 : { *(.rel.data.rel.ro) } + .rel.data 0 : { *(.rel.data) } + .rel.tdata 0 : { *(.rel.tdata) } + .rel.tbss 0 : { *(.rel.tbss) } + .rel.ctors 0 : { *(.rel.ctors) } + .rel.dtors 0 : { *(.rel.dtors) } + .rel.got 0 : { *(.rel.got) } + .rel.bss 0 : { *(.rel.bss) } + .rel.ifunc 0 : { *(.rel.ifunc) } + .rel.plt 0 : + { + *(.rel.plt) + } + .init 0 : + { + KEEP (*(SORT_NONE(.init))) + } + .plt 0 : { *(.plt) *(.iplt) } +.plt.got 0 : { *(.plt.got) } + .text 0 : + { + *(.text .stub) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini 0 : + { + KEEP (*(SORT_NONE(.fini))) + } + .rodata 0 : { *(.rodata) } + .rodata1 0 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } + .eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) } + .gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + /* Exception handling */ + .eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) } + .gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata 0 : { *(.tdata) } + .tbss 0 : { *(.tbss) } + .preinit_array 0 : + { + KEEP (*(.preinit_array)) + } + .jcr 0 : { KEEP (*(.jcr)) } + .dynamic 0 : { *(.dynamic) } + .got 0 : { *(.got) *(.igot) } + .got.plt 0 : { *(.got.plt) *(.igot.plt) } + .data 0 : + { + *(.data) + SORT(CONSTRUCTORS) + } + .data1 0 : { *(.data1) } + .bss 0 : + { + *(.dynbss) + *(.bss) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + } + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_i386.xw b/x86_64-linux-android/lib/ldscripts/elf_i386.xw new file mode 100644 index 0000000..0b99256 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_i386.xw @@ -0,0 +1,211 @@ +/* Script for -z combreloc -z now -z relro: combine and sort reloc sections */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-i386", "elf32-i386", + "elf32-i386") +OUTPUT_ARCH(i386) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib32"); SEARCH_DIR("=/usr/local/lib32"); SEARCH_DIR("=/lib32"); SEARCH_DIR("=/usr/lib32"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); . 
= SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.dyn : + { + *(.rel.init) + *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) + *(.rel.fini) + *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) + *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) + *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) + *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) + *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) + *(.rel.ctors) + *(.rel.dtors) + *(.rel.got) + *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) + *(.rel.ifunc) + } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. 
*/ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.x b/x86_64-linux-android/lib/ldscripts/elf_iamcu.x new file mode 100644 index 0000000..416efd2 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.x @@ -0,0 +1,209 @@ +/* Default linker script, for normal executables */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); . = SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.init : { *(.rel.init) } + .rel.text : { *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) } + .rel.fini : { *(.rel.fini) } + .rel.rodata : { *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) } + .rel.data.rel.ro : { *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) } + .rel.data : { *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) } + .rel.tdata : { *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) } + .rel.tbss : { *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) } + .rel.ctors : { *(.rel.ctors) } + .rel.dtors : { *(.rel.dtors) } + .rel.got : { *(.rel.got) } + .rel.bss : { *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) } + .rel.ifunc : { *(.rel.ifunc) } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. 
+ Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.xbn b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xbn new file mode 100644 index 0000000..4844a4b --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xbn @@ -0,0 +1,207 @@ +/* Script for -N: mix text and data on same page; don't align data */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); . 
= SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.init : { *(.rel.init) } + .rel.text : { *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) } + .rel.fini : { *(.rel.fini) } + .rel.rodata : { *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) } + .rel.data.rel.ro : { *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) } + .rel.data : { *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) } + .rel.tdata : { *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) } + .rel.tbss : { *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) } + .rel.ctors : { *(.rel.ctors) } + .rel.dtors : { *(.rel.dtors) } + .rel.got : { *(.rel.got) } + .rel.bss : { *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) } + .rel.ifunc : { *(.rel.ifunc) } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= .; + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.xc b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xc new file mode 100644 index 0000000..efae53c --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xc @@ -0,0 +1,212 @@ +/* Script for -z combreloc: combine and sort reloc sections */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); . = SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.dyn : + { + *(.rel.init) + *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) + *(.rel.fini) + *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) + *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) + *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) + *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) + *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) + *(.rel.ctors) + *(.rel.dtors) + *(.rel.got) + *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) + *(.rel.ifunc) + } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. 
*/ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. 
Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.xd b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xd new file mode 100644 index 0000000..8a37081 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xd @@ -0,0 +1,209 @@ +/* Script for ld -pie: link position independent executable */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.init : { *(.rel.init) } + .rel.text : { *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) } + .rel.fini : { *(.rel.fini) } + .rel.rodata : { *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) } + .rel.data.rel.ro : { *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) } + .rel.data : { *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) } + .rel.tdata : { *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) } + .rel.tbss : { *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) } + .rel.ctors : { *(.rel.ctors) } + .rel.dtors : { *(.rel.dtors) } + .rel.got : { *(.rel.got) } + .rel.bss : { *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) } + .rel.ifunc : { *(.rel.ifunc) } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. 
+ Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.xdc b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xdc new file mode 100644 index 0000000..41de031 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xdc @@ -0,0 +1,212 @@ +/* Script for -pie -z combreloc: position independent executable, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . = SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.dyn : + { + *(.rel.init) + *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) + *(.rel.fini) + *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) + *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) + *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) + *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) + *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) + *(.rel.ctors) + *(.rel.dtors) + *(.rel.got) + *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) + *(.rel.ifunc) + } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. 
*/ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. 
Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.xdw b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xdw new file mode 100644 index 0000000..ef54bcc --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xdw @@ -0,0 +1,211 @@ +/* Script for -pie -z combreloc -z now -z relro: position independent executable, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.dyn : + { + *(.rel.init) + *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) + *(.rel.fini) + *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) + *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) + *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) + *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) + *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) + *(.rel.ctors) + *(.rel.dtors) + *(.rel.got) + *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) + *(.rel.ifunc) + } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. 
*/ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.xn b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xn new file mode 100644 index 0000000..272b560 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xn @@ -0,0 +1,209 @@ +/* Script for -n: mix text and data on same page */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); . = SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.init : { *(.rel.init) } + .rel.text : { *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) } + .rel.fini : { *(.rel.fini) } + .rel.rodata : { *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) } + .rel.data.rel.ro : { *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) } + .rel.data : { *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) } + .rel.tdata : { *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) } + .rel.tbss : { *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) } + .rel.ctors : { *(.rel.ctors) } + .rel.dtors : { *(.rel.dtors) } + .rel.got : { *(.rel.got) } + .rel.bss : { *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) } + .rel.ifunc : { *(.rel.ifunc) } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
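/* [Editor's note -- annotation added for this review, not part of the committed
   file.]  DATA_SEGMENT_ALIGN(maxpagesize, commonpagesize) bumps the location
   counter so the writable data segment starts on the next page while keeping
   its virtual address congruent with its file offset modulo the page size
   (the "same address within the page on the next page up" of the comment
   above), which lets the loader map the file without extra padding pages.
   It is always paired with the DATA_SEGMENT_END(.) assignment near the end
   of the script, and in the relro variants with DATA_SEGMENT_RELRO_END in
   between.  A reduced, illustrative pairing, not taken from this file:

       . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
       .data : { *(.data .data.*) }
       . = DATA_SEGMENT_END (.);
*/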
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. 
+ Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.xr b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xr new file mode 100644 index 0000000..1cd71fa --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xr @@ -0,0 +1,142 @@ +/* Script for ld -r: link without relocation */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) + /* For some reason, the Solaris linker makes bad executables + if gld -r is used and the intermediate file has sections starting + at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld + bug. But for now assigning the zero vmas works. */ +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + .interp 0 : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash 0 : { *(.hash) } + .gnu.hash 0 : { *(.gnu.hash) } + .dynsym 0 : { *(.dynsym) } + .dynstr 0 : { *(.dynstr) } + .gnu.version 0 : { *(.gnu.version) } + .gnu.version_d 0: { *(.gnu.version_d) } + .gnu.version_r 0: { *(.gnu.version_r) } + .rel.init 0 : { *(.rel.init) } + .rel.text 0 : { *(.rel.text) } + .rel.fini 0 : { *(.rel.fini) } + .rel.rodata 0 : { *(.rel.rodata) } + .rel.data.rel.ro 0 : { *(.rel.data.rel.ro) } + .rel.data 0 : { *(.rel.data) } + .rel.tdata 0 : { *(.rel.tdata) } + .rel.tbss 0 : { *(.rel.tbss) } + .rel.ctors 0 : { *(.rel.ctors) } + .rel.dtors 0 : { *(.rel.dtors) } + .rel.got 0 : { *(.rel.got) } + .rel.bss 0 : { *(.rel.bss) } + .rel.ifunc 0 : { *(.rel.ifunc) } + .rel.plt 0 : + { + *(.rel.plt) + } + .init 0 : + { + KEEP (*(SORT_NONE(.init))) + } + .plt 0 : { *(.plt) *(.iplt) } +.plt.got 0 : { *(.plt.got) } + .text 0 : + { + *(.text .stub) + /* .gnu.warning sections are handled specially by elf32.em. 
*/ + *(.gnu.warning) + } + .fini 0 : + { + KEEP (*(SORT_NONE(.fini))) + } + .rodata 0 : { *(.rodata) } + .rodata1 0 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } + .eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) } + .gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + /* Exception handling */ + .eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) } + .gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata 0 : { *(.tdata) } + .tbss 0 : { *(.tbss) } + .preinit_array 0 : + { + KEEP (*(.preinit_array)) + } + .jcr 0 : { KEEP (*(.jcr)) } + .dynamic 0 : { *(.dynamic) } + .got 0 : { *(.got) *(.igot) } + .got.plt 0 : { *(.got.plt) *(.igot.plt) } + .data 0 : + { + *(.data) + } + .data1 0 : { *(.data1) } + .bss 0 : + { + *(.dynbss) + *(.bss) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + } + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.xs b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xs new file mode 100644 index 0000000..3b566e7 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xs @@ -0,0 +1,200 @@ +/* Script for ld --shared: link shared library */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. 
+ Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . = SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.init : { *(.rel.init) } + .rel.text : { *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) } + .rel.fini : { *(.rel.fini) } + .rel.rodata : { *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) } + .rel.data.rel.ro : { *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) } + .rel.data : { *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) } + .rel.tdata : { *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) } + .rel.tbss : { *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) } + .rel.ctors : { *(.rel.ctors) } + .rel.dtors : { *(.rel.dtors) } + .rel.got : { *(.rel.got) } + .rel.bss : { *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) } + .rel.ifunc : { *(.rel.ifunc) } + .rel.plt : + { + *(.rel.plt) + *(.rel.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
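/* [Editor's note -- annotation added for this review, not part of the committed
   file.]  This "ld --shared" variant differs from the executable scripts in a
   few ways visible in the text: there is no .interp output section (a shared
   object does not name a program interpreter), .rel.iplt is folded into
   .rel.plt without the hidden __rel_iplt_start/__rel_iplt_end markers, and
   the .init_array/.fini_array rules below omit the __init_array_start style
   symbols, since a DSO's constructors are run through the DT_INIT_ARRAY and
   DT_FINI_ARRAY dynamic tags rather than through those symbols.  Compare the
   two forms (both copied from scripts in this change):

       .rel.plt : { *(.rel.plt) *(.rel.iplt) }

   in this --shared script, versus the executable scripts' form:

       .rel.plt :
       {
         *(.rel.plt)
         PROVIDE_HIDDEN (__rel_iplt_start = .);
         *(.rel.iplt)
         PROVIDE_HIDDEN (__rel_iplt_end = .);
       }
*/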
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.xsc b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xsc new file mode 100644 index 0000000..b3e2b17 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xsc @@ -0,0 +1,203 @@ +/* Script for --shared -z combreloc: shared library, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . = SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.dyn : + { + *(.rel.init) + *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) + *(.rel.fini) + *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) + *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) + *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) + *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) + *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) + *(.rel.ctors) + *(.rel.dtors) + *(.rel.got) + *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) + *(.rel.ifunc) + } + .rel.plt : + { + *(.rel.plt) + *(.rel.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. 
*/ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . 
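/* [Editor's note -- annotation added for this review, not part of the committed
   file.]  The expression ALIGN(. != 0 ? 32 / 8 : 1) inside .bss aligns the
   section end to the 4-byte word size of this 32-bit target (the elf_k1om
   scripts later in this change use 64 / 8 for the same purpose), while the
   ". != 0" guard keeps a completely empty .bss from forcing any alignment.
   The _edata/edata and _end/end pairs that follow are the traditional Unix
   layout symbols; PROVIDE defines the unprefixed name only if some input
   object references it without defining it, so a program that defines its
   own "end" still links.  A trimmed, illustrative outline of the idiom (not
   a verbatim quote):

       __bss_start = .;
       .bss : { *(.bss .bss.*) *(COMMON) . = ALIGN(. != 0 ? 32 / 8 : 1); }
       _end = .; PROVIDE (end = .);
*/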
= ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.xsw b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xsw new file mode 100644 index 0000000..66d9f2a --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xsw @@ -0,0 +1,202 @@ +/* Script for --shared -z combreloc -z now -z relro: shared library, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . 
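/* [Editor's note -- annotation added for this review, not part of the committed
   file.]  Two option effects are visible in this "--shared -z combreloc -z now
   -z relro" variant.  First, -z combreloc is what produces the single .rel.dyn
   rule just below: the per-section dynamic relocation inputs (.rel.init,
   .rel.text, .rel.data and so on) are combined into one output section and
   sorted so the dynamic linker can process them more efficiently.  Second,
   -z now -z relro places the entire GOT inside the region that is remapped
   read-only after relocation:

       .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) }
       . = DATA_SEGMENT_RELRO_END (0, .);

   whereas the lazy-binding variants in this same change keep .got.plt
   writable, after the relro boundary:

       .got : { *(.got) *(.igot) }
       . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 12 ? 12 : 0, .);
       .got.plt : { *(.got.plt) *(.igot.plt) }
*/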
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.dyn : + { + *(.rel.init) + *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) + *(.rel.fini) + *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) + *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) + *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) + *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) + *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) + *(.rel.ctors) + *(.rel.dtors) + *(.rel.got) + *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) + *(.rel.ifunc) + } + .rel.plt : + { + *(.rel.plt) + *(.rel.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. 
+ The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.xu b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xu new file mode 100644 index 0000000..4259493 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xu @@ -0,0 +1,143 @@ +/* Script for ld -Ur: link w/out relocation, do create constructors */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. 
*/ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) + /* For some reason, the Solaris linker makes bad executables + if gld -r is used and the intermediate file has sections starting + at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld + bug. But for now assigning the zero vmas works. */ +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + .interp 0 : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash 0 : { *(.hash) } + .gnu.hash 0 : { *(.gnu.hash) } + .dynsym 0 : { *(.dynsym) } + .dynstr 0 : { *(.dynstr) } + .gnu.version 0 : { *(.gnu.version) } + .gnu.version_d 0: { *(.gnu.version_d) } + .gnu.version_r 0: { *(.gnu.version_r) } + .rel.init 0 : { *(.rel.init) } + .rel.text 0 : { *(.rel.text) } + .rel.fini 0 : { *(.rel.fini) } + .rel.rodata 0 : { *(.rel.rodata) } + .rel.data.rel.ro 0 : { *(.rel.data.rel.ro) } + .rel.data 0 : { *(.rel.data) } + .rel.tdata 0 : { *(.rel.tdata) } + .rel.tbss 0 : { *(.rel.tbss) } + .rel.ctors 0 : { *(.rel.ctors) } + .rel.dtors 0 : { *(.rel.dtors) } + .rel.got 0 : { *(.rel.got) } + .rel.bss 0 : { *(.rel.bss) } + .rel.ifunc 0 : { *(.rel.ifunc) } + .rel.plt 0 : + { + *(.rel.plt) + } + .init 0 : + { + KEEP (*(SORT_NONE(.init))) + } + .plt 0 : { *(.plt) *(.iplt) } +.plt.got 0 : { *(.plt.got) } + .text 0 : + { + *(.text .stub) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini 0 : + { + KEEP (*(SORT_NONE(.fini))) + } + .rodata 0 : { *(.rodata) } + .rodata1 0 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } + .eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) } + .gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + /* Exception handling */ + .eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) } + .gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata 0 : { *(.tdata) } + .tbss 0 : { *(.tbss) } + .preinit_array 0 : + { + KEEP (*(.preinit_array)) + } + .jcr 0 : { KEEP (*(.jcr)) } + .dynamic 0 : { *(.dynamic) } + .got 0 : { *(.got) *(.igot) } + .got.plt 0 : { *(.got.plt) *(.igot.plt) } + .data 0 : + { + *(.data) + SORT(CONSTRUCTORS) + } + .data1 0 : { *(.data1) } + .bss 0 : + { + *(.dynbss) + *(.bss) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + } + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_iamcu.xw b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xw new file mode 100644 index 0000000..53ac972 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_iamcu.xw @@ -0,0 +1,211 @@ +/* Script for -z combreloc -z now -z relro: combine and sort reloc sections */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf32-iamcu", "elf32-iamcu", + "elf32-iamcu") +OUTPUT_ARCH(iamcu) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x08048000)); . = SEGMENT_START("text-segment", 0x08048000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rel.dyn : + { + *(.rel.init) + *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) + *(.rel.fini) + *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) + *(.rel.data.rel.ro .rel.data.rel.ro.* .rel.gnu.linkonce.d.rel.ro.*) + *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) + *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) + *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) + *(.rel.ctors) + *(.rel.dtors) + *(.rel.got) + *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) + *(.rel.ifunc) + } + .rel.plt : + { + *(.rel.plt) + PROVIDE_HIDDEN (__rel_iplt_start = .); + *(.rel.iplt) + PROVIDE_HIDDEN (__rel_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. 
*/ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. 
Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 32 / 8 : 1); + } + . = ALIGN(32 / 8); + . = SEGMENT_START("ldata-segment", .); + . = ALIGN(32 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.x b/x86_64-linux-android/lib/ldscripts/elf_k1om.x new file mode 100644 index 0000000..7f491c0 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.x @@ -0,0 +1,226 @@ +/* Default linker script, for normal executables */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
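/* [Editor's note -- annotation added for this review, not part of the committed
   file.]  From here the change switches from the 32-bit iamcu scripts to the
   elf64-k1om family; k1om is the Knights Corner (first-generation Xeon Phi)
   flavour of x86-64.  The differences track the word size and ABI: a fixed
   0x400000 link-time base for ordinary executables, RELA-style relocation
   sections (.rela.*) just below instead of .rel.*, 64 / 8 = 8-byte alignment,
   and additional SEARCH_DIR entries.  As a usage note, and an assumption
   about the installed toolchain rather than something stated in this change,
   the script a given link actually uses can be confirmed with ld's --verbose
   flag, for example:

       x86_64-linux-android-ld -m elf_k1om --verbose
*/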
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . 
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.xbn b/x86_64-linux-android/lib/ldscripts/elf_k1om.xbn new file mode 100644 index 0000000..a774873 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.xbn @@ -0,0 +1,224 @@ +/* Script for -N: mix text and data on same page; don't align data */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= .; + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + /* Stabs debugging sections. 
*/ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.xc b/x86_64-linux-android/lib/ldscripts/elf_k1om.xc new file mode 100644 index 0000000..0ab4f37 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.xc @@ -0,0 +1,228 @@ +/* Script for -z combreloc: combine and sort reloc sections */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.xd b/x86_64-linux-android/lib/ldscripts/elf_k1om.xd new file mode 100644 index 0000000..5271e52 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.xd @@ -0,0 +1,226 @@ +/* Script for ld -pie: link position independent executable */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . 
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.xdc b/x86_64-linux-android/lib/ldscripts/elf_k1om.xdc new file mode 100644 index 0000000..92ab303 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.xdc @@ -0,0 +1,228 @@ +/* Script for -pie -z combreloc: position independent executable, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.xdw b/x86_64-linux-android/lib/ldscripts/elf_k1om.xdw new file mode 100644 index 0000000..c30d2db --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.xdw @@ -0,0 +1,227 @@ +/* Script for -pie -z combreloc -z now -z relro: position independent executable, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.xn b/x86_64-linux-android/lib/ldscripts/elf_k1om.xn new file mode 100644 index 0000000..774e6c7 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.xn @@ -0,0 +1,226 @@ +/* Script for -n: mix text and data on same page */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . 
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.xr b/x86_64-linux-android/lib/ldscripts/elf_k1om.xr new file mode 100644 index 0000000..38e4dee --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.xr @@ -0,0 +1,158 @@ +/* Script for ld -r: link without relocation */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) + /* For some reason, the Solaris linker makes bad executables + if gld -r is used and the intermediate file has sections starting + at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld + bug. But for now assigning the zero vmas works. 
*/ +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + .interp 0 : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash 0 : { *(.hash) } + .gnu.hash 0 : { *(.gnu.hash) } + .dynsym 0 : { *(.dynsym) } + .dynstr 0 : { *(.dynstr) } + .gnu.version 0 : { *(.gnu.version) } + .gnu.version_d 0: { *(.gnu.version_d) } + .gnu.version_r 0: { *(.gnu.version_r) } + .rela.init 0 : { *(.rela.init) } + .rela.text 0 : { *(.rela.text) } + .rela.fini 0 : { *(.rela.fini) } + .rela.rodata 0 : { *(.rela.rodata) } + .rela.data.rel.ro 0 : { *(.rela.data.rel.ro) } + .rela.data 0 : { *(.rela.data) } + .rela.tdata 0 : { *(.rela.tdata) } + .rela.tbss 0 : { *(.rela.tbss) } + .rela.ctors 0 : { *(.rela.ctors) } + .rela.dtors 0 : { *(.rela.dtors) } + .rela.got 0 : { *(.rela.got) } + .rela.bss 0 : { *(.rela.bss) } + .rela.ldata 0 : { *(.rela.ldata) } + .rela.lbss 0 : { *(.rela.lbss) } + .rela.lrodata 0 : { *(.rela.lrodata) } + .rela.ifunc 0 : { *(.rela.ifunc) } + .rela.plt 0 : + { + *(.rela.plt) + } + .init 0 : + { + KEEP (*(SORT_NONE(.init))) + } + .plt 0 : { *(.plt) *(.iplt) } + .text 0 : + { + *(.text .stub) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini 0 : + { + KEEP (*(SORT_NONE(.fini))) + } + .rodata 0 : { *(.rodata) } + .rodata1 0 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } + .eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) } + .gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + /* Exception handling */ + .eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) } + .gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata 0 : { *(.tdata) } + .tbss 0 : { *(.tbss) } + .preinit_array 0 : + { + KEEP (*(.preinit_array)) + } + .jcr 0 : { KEEP (*(.jcr)) } + .dynamic 0 : { *(.dynamic) } + .got 0 : { *(.got) *(.igot) } + .got.plt 0 : { *(.got.plt) *(.igot.plt) } + .data 0 : + { + *(.data) + } + .data1 0 : { *(.data1) } + .bss 0 : + { + *(.dynbss) + *(.bss) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + } + .lbss 0 : + { + *(.dynlbss) + *(.lbss) + *(LARGE_COMMON) + } + .lrodata 0 : + { + *(.lrodata) + } + .ldata 0 : + { + *(.ldata) + } + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.xs b/x86_64-linux-android/lib/ldscripts/elf_k1om.xs new file mode 100644 index 0000000..58964d8 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.xs @@ -0,0 +1,217 @@ +/* Script for ld --shared: link shared library */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + *(.rela.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.xsc b/x86_64-linux-android/lib/ldscripts/elf_k1om.xsc new file mode 100644 index 0000000..f1f3412 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.xsc @@ -0,0 +1,219 @@ +/* Script for --shared -z combreloc: shared library, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . = SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + *(.rela.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. 
*/ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 
64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.xsw b/x86_64-linux-android/lib/ldscripts/elf_k1om.xsw new file mode 100644 index 0000000..ddc8556 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.xsw @@ -0,0 +1,218 @@ +/* Script for --shared -z combreloc -z now -z relro: shared library, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + *(.rela.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. 
*/ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. 
*/ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.xu b/x86_64-linux-android/lib/ldscripts/elf_k1om.xu new file mode 100644 index 0000000..8bded2e --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.xu @@ -0,0 +1,159 @@ +/* Script for ld -Ur: link w/out relocation, do create constructors */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) + /* For some reason, the Solaris linker makes bad executables + if gld -r is used and the intermediate file has sections starting + at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld + bug. But for now assigning the zero vmas works. */ +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + .interp 0 : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash 0 : { *(.hash) } + .gnu.hash 0 : { *(.gnu.hash) } + .dynsym 0 : { *(.dynsym) } + .dynstr 0 : { *(.dynstr) } + .gnu.version 0 : { *(.gnu.version) } + .gnu.version_d 0: { *(.gnu.version_d) } + .gnu.version_r 0: { *(.gnu.version_r) } + .rela.init 0 : { *(.rela.init) } + .rela.text 0 : { *(.rela.text) } + .rela.fini 0 : { *(.rela.fini) } + .rela.rodata 0 : { *(.rela.rodata) } + .rela.data.rel.ro 0 : { *(.rela.data.rel.ro) } + .rela.data 0 : { *(.rela.data) } + .rela.tdata 0 : { *(.rela.tdata) } + .rela.tbss 0 : { *(.rela.tbss) } + .rela.ctors 0 : { *(.rela.ctors) } + .rela.dtors 0 : { *(.rela.dtors) } + .rela.got 0 : { *(.rela.got) } + .rela.bss 0 : { *(.rela.bss) } + .rela.ldata 0 : { *(.rela.ldata) } + .rela.lbss 0 : { *(.rela.lbss) } + .rela.lrodata 0 : { *(.rela.lrodata) } + .rela.ifunc 0 : { *(.rela.ifunc) } + .rela.plt 0 : + { + *(.rela.plt) + } + .init 0 : + { + KEEP (*(SORT_NONE(.init))) + } + .plt 0 : { *(.plt) *(.iplt) } + .text 0 : + { + *(.text .stub) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini 0 : + { + KEEP (*(SORT_NONE(.fini))) + } + .rodata 0 : { *(.rodata) } + .rodata1 0 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } + .eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) } + .gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. 
*/ + /* Exception handling */ + .eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) } + .gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata 0 : { *(.tdata) } + .tbss 0 : { *(.tbss) } + .preinit_array 0 : + { + KEEP (*(.preinit_array)) + } + .jcr 0 : { KEEP (*(.jcr)) } + .dynamic 0 : { *(.dynamic) } + .got 0 : { *(.got) *(.igot) } + .got.plt 0 : { *(.got.plt) *(.igot.plt) } + .data 0 : + { + *(.data) + SORT(CONSTRUCTORS) + } + .data1 0 : { *(.data1) } + .bss 0 : + { + *(.dynbss) + *(.bss) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + } + .lbss 0 : + { + *(.dynlbss) + *(.lbss) + *(LARGE_COMMON) + } + .lrodata 0 : + { + *(.lrodata) + } + .ldata 0 : + { + *(.ldata) + } + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_k1om.xw b/x86_64-linux-android/lib/ldscripts/elf_k1om.xw new file mode 100644 index 0000000..3b58a3f --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_k1om.xw @@ -0,0 +1,227 @@ +/* Script for -z combreloc -z now -z relro: combine and sort reloc sections */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. 
*/ +OUTPUT_FORMAT("elf64-k1om", "elf64-k1om", + "elf64-k1om") +OUTPUT_ARCH(k1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. 
*/ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.x b/x86_64-linux-android/lib/ldscripts/elf_l1om.x new file mode 100644 index 0000000..4c5ed73 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.x @@ -0,0 +1,226 @@ +/* Default linker script, for normal executables */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . 
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.xbn b/x86_64-linux-android/lib/ldscripts/elf_l1om.xbn new file mode 100644 index 0000000..2fb6ae4 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.xbn @@ -0,0 +1,224 @@ +/* Script for -N: mix text and data on same page; don't align data */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= .; + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + /* Stabs debugging sections. 
*/ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.xc b/x86_64-linux-android/lib/ldscripts/elf_l1om.xc new file mode 100644 index 0000000..33a4ad3 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.xc @@ -0,0 +1,228 @@ +/* Script for -z combreloc: combine and sort reloc sections */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.xd b/x86_64-linux-android/lib/ldscripts/elf_l1om.xd new file mode 100644 index 0000000..62d5387 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.xd @@ -0,0 +1,226 @@ +/* Script for ld -pie: link position independent executable */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . 
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.xdc b/x86_64-linux-android/lib/ldscripts/elf_l1om.xdc new file mode 100644 index 0000000..ecc58b6 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.xdc @@ -0,0 +1,228 @@ +/* Script for -pie -z combreloc: position independent executable, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.xdw b/x86_64-linux-android/lib/ldscripts/elf_l1om.xdw new file mode 100644 index 0000000..5a3873f --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.xdw @@ -0,0 +1,227 @@ +/* Script for -pie -z combreloc -z now -z relro: position independent executable, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.xn b/x86_64-linux-android/lib/ldscripts/elf_l1om.xn new file mode 100644 index 0000000..a359d1a --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.xn @@ -0,0 +1,226 @@ +/* Script for -n: mix text and data on same page */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . 
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.xr b/x86_64-linux-android/lib/ldscripts/elf_l1om.xr new file mode 100644 index 0000000..5529b8d --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.xr @@ -0,0 +1,158 @@ +/* Script for ld -r: link without relocation */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) + /* For some reason, the Solaris linker makes bad executables + if gld -r is used and the intermediate file has sections starting + at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld + bug. But for now assigning the zero vmas works. 
*/ +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + .interp 0 : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash 0 : { *(.hash) } + .gnu.hash 0 : { *(.gnu.hash) } + .dynsym 0 : { *(.dynsym) } + .dynstr 0 : { *(.dynstr) } + .gnu.version 0 : { *(.gnu.version) } + .gnu.version_d 0: { *(.gnu.version_d) } + .gnu.version_r 0: { *(.gnu.version_r) } + .rela.init 0 : { *(.rela.init) } + .rela.text 0 : { *(.rela.text) } + .rela.fini 0 : { *(.rela.fini) } + .rela.rodata 0 : { *(.rela.rodata) } + .rela.data.rel.ro 0 : { *(.rela.data.rel.ro) } + .rela.data 0 : { *(.rela.data) } + .rela.tdata 0 : { *(.rela.tdata) } + .rela.tbss 0 : { *(.rela.tbss) } + .rela.ctors 0 : { *(.rela.ctors) } + .rela.dtors 0 : { *(.rela.dtors) } + .rela.got 0 : { *(.rela.got) } + .rela.bss 0 : { *(.rela.bss) } + .rela.ldata 0 : { *(.rela.ldata) } + .rela.lbss 0 : { *(.rela.lbss) } + .rela.lrodata 0 : { *(.rela.lrodata) } + .rela.ifunc 0 : { *(.rela.ifunc) } + .rela.plt 0 : + { + *(.rela.plt) + } + .init 0 : + { + KEEP (*(SORT_NONE(.init))) + } + .plt 0 : { *(.plt) *(.iplt) } + .text 0 : + { + *(.text .stub) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini 0 : + { + KEEP (*(SORT_NONE(.fini))) + } + .rodata 0 : { *(.rodata) } + .rodata1 0 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } + .eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) } + .gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + /* Exception handling */ + .eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) } + .gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata 0 : { *(.tdata) } + .tbss 0 : { *(.tbss) } + .preinit_array 0 : + { + KEEP (*(.preinit_array)) + } + .jcr 0 : { KEEP (*(.jcr)) } + .dynamic 0 : { *(.dynamic) } + .got 0 : { *(.got) *(.igot) } + .got.plt 0 : { *(.got.plt) *(.igot.plt) } + .data 0 : + { + *(.data) + } + .data1 0 : { *(.data1) } + .bss 0 : + { + *(.dynbss) + *(.bss) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + } + .lbss 0 : + { + *(.dynlbss) + *(.lbss) + *(LARGE_COMMON) + } + .lrodata 0 : + { + *(.lrodata) + } + .ldata 0 : + { + *(.ldata) + } + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.xs b/x86_64-linux-android/lib/ldscripts/elf_l1om.xs new file mode 100644 index 0000000..254f4cb --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.xs @@ -0,0 +1,217 @@ +/* Script for ld --shared: link shared library */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + *(.rela.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. 
Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.xsc b/x86_64-linux-android/lib/ldscripts/elf_l1om.xsc new file mode 100644 index 0000000..6f170ff --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.xsc @@ -0,0 +1,219 @@ +/* Script for --shared -z combreloc: shared library, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . = SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + *(.rela.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. 
*/ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 
64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.xsw b/x86_64-linux-android/lib/ldscripts/elf_l1om.xsw new file mode 100644 index 0000000..73ca3c8 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.xsw @@ -0,0 +1,218 @@ +/* Script for --shared -z combreloc -z now -z relro: shared library, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + *(.rela.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. 
*/ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. 
*/ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.xu b/x86_64-linux-android/lib/ldscripts/elf_l1om.xu new file mode 100644 index 0000000..5b3c1c5 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.xu @@ -0,0 +1,159 @@ +/* Script for ld -Ur: link w/out relocation, do create constructors */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) + /* For some reason, the Solaris linker makes bad executables + if gld -r is used and the intermediate file has sections starting + at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld + bug. But for now assigning the zero vmas works. */ +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + .interp 0 : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash 0 : { *(.hash) } + .gnu.hash 0 : { *(.gnu.hash) } + .dynsym 0 : { *(.dynsym) } + .dynstr 0 : { *(.dynstr) } + .gnu.version 0 : { *(.gnu.version) } + .gnu.version_d 0: { *(.gnu.version_d) } + .gnu.version_r 0: { *(.gnu.version_r) } + .rela.init 0 : { *(.rela.init) } + .rela.text 0 : { *(.rela.text) } + .rela.fini 0 : { *(.rela.fini) } + .rela.rodata 0 : { *(.rela.rodata) } + .rela.data.rel.ro 0 : { *(.rela.data.rel.ro) } + .rela.data 0 : { *(.rela.data) } + .rela.tdata 0 : { *(.rela.tdata) } + .rela.tbss 0 : { *(.rela.tbss) } + .rela.ctors 0 : { *(.rela.ctors) } + .rela.dtors 0 : { *(.rela.dtors) } + .rela.got 0 : { *(.rela.got) } + .rela.bss 0 : { *(.rela.bss) } + .rela.ldata 0 : { *(.rela.ldata) } + .rela.lbss 0 : { *(.rela.lbss) } + .rela.lrodata 0 : { *(.rela.lrodata) } + .rela.ifunc 0 : { *(.rela.ifunc) } + .rela.plt 0 : + { + *(.rela.plt) + } + .init 0 : + { + KEEP (*(SORT_NONE(.init))) + } + .plt 0 : { *(.plt) *(.iplt) } + .text 0 : + { + *(.text .stub) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini 0 : + { + KEEP (*(SORT_NONE(.fini))) + } + .rodata 0 : { *(.rodata) } + .rodata1 0 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } + .eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) } + .gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. 
*/ + /* Exception handling */ + .eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) } + .gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata 0 : { *(.tdata) } + .tbss 0 : { *(.tbss) } + .preinit_array 0 : + { + KEEP (*(.preinit_array)) + } + .jcr 0 : { KEEP (*(.jcr)) } + .dynamic 0 : { *(.dynamic) } + .got 0 : { *(.got) *(.igot) } + .got.plt 0 : { *(.got.plt) *(.igot.plt) } + .data 0 : + { + *(.data) + SORT(CONSTRUCTORS) + } + .data1 0 : { *(.data1) } + .bss 0 : + { + *(.dynbss) + *(.bss) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + } + .lbss 0 : + { + *(.dynlbss) + *(.lbss) + *(LARGE_COMMON) + } + .lrodata 0 : + { + *(.lrodata) + } + .ldata 0 : + { + *(.ldata) + } + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_l1om.xw b/x86_64-linux-android/lib/ldscripts/elf_l1om.xw new file mode 100644 index 0000000..8829183 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_l1om.xw @@ -0,0 +1,227 @@ +/* Script for -z combreloc -z now -z relro: combine and sort reloc sections */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. 
*/ +OUTPUT_FORMAT("elf64-l1om", "elf64-l1om", + "elf64-l1om") +OUTPUT_ARCH(l1om) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. 
*/ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.x b/x86_64-linux-android/lib/ldscripts/elf_x86_64.x new file mode 100644 index 0000000..fe3dc9a --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.x @@ -0,0 +1,228 @@ +/* Default linker script, for normal executables */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } +.plt.bnd : { *(.plt.bnd) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . 
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.xbn b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xbn new file mode 100644 index 0000000..969f60e --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xbn @@ -0,0 +1,226 @@ +/* Script for -N: mix text and data on same page; don't align data */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } +.plt.bnd : { *(.plt.bnd) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= .; + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + /* Stabs debugging sections. 
*/ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.xc b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xc new file mode 100644 index 0000000..6362577 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xc @@ -0,0 +1,230 @@ +/* Script for -z combreloc: combine and sort reloc sections */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } +.plt.bnd : { *(.plt.bnd) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . 
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.xd b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xd new file mode 100644 index 0000000..1521d92 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xd @@ -0,0 +1,228 @@ +/* Script for ld -pie: link position independent executable */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } +.plt.bnd : { *(.plt.bnd) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . 
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.xdc b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xdc new file mode 100644 index 0000000..d380ea3 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xdc @@ -0,0 +1,230 @@ +/* Script for -pie -z combreloc: position independent executable, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
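/* A brief sketch of the preamble being formed here: SEGMENT_START
   ("text-segment", 0) evaluates to the address given by
   -Ttext-segment=ADDR when the user passes one, else to the default
   written here -- 0 for this -pie script, since a position independent
   executable is relocated to its real base at load time. Adding
   SIZEOF_HEADERS (the combined size of the ELF header and the program
   headers) lets the first sections share the first page with the
   headers instead of starting on a fresh page. */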
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } +.plt.bnd : { *(.plt.bnd) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . 
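/* A hedged reading of the RELRO boundary above: DATA_SEGMENT_RELRO_END
   marks where the PT_GNU_RELRO segment ends, rounded to a common-page
   boundary. The "SIZEOF (.got.plt) >= 24 ? 24 : 0" term pulls the first
   three reserved 8-byte .got.plt entries (used by the dynamic linker
   itself) inside the read-only region, while the remaining .got.plt
   entries stay writable so lazy PLT binding can patch them at run
   time. */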
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.xdw b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xdw new file mode 100644 index 0000000..228ee5f --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xdw @@ -0,0 +1,229 @@ +/* Script for -pie -z combreloc -z now -z relro: position independent executable, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0)); . 
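/* Why this -z now -z relro variant differs, in sketch form: with -z now
   every PLT entry is resolved at startup, so nothing needs to remain
   writable afterwards. Accordingly the script below folds *(.got.plt)
   and *(.igot.plt) into .got ahead of DATA_SEGMENT_RELRO_END (0, .),
   putting the entire GOT inside PT_GNU_RELRO -- the layout usually
   called "full RELRO". */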
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } +.plt.bnd : { *(.plt.bnd) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. 
*/ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.xn b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xn new file mode 100644 index 0000000..d4fda29 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xn @@ -0,0 +1,228 @@ +/* Script for -n: mix text and data on same page */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . 
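/* Two hedged observations on this -n script's preamble: the default
   text segment base is 0x400000, the conventional x86-64 ET_EXEC load
   address, whereas the -pie and --shared variants in this commit use 0
   because they are relocated at load time. The "mix text and data on
   same page" behavior itself comes from the -n/--nmagic option rather
   than from any visible difference in the SECTIONS body. */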
= SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } +.plt.bnd : { *(.plt.bnd) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . 
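/* A worked example of the SORT_BY_INIT_PRIORITY lines above, under the
   common GCC convention (an assumption, not something this script
   states): __attribute__((constructor(101))) places its pointer in a
   section named .init_array.00101, so numeric sorting runs it before
   .init_array.00202. Legacy .ctors.N names encode 65535 minus the
   priority, which is why both spellings can be sorted into one list. */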
= DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.xr b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xr new file mode 100644 index 0000000..04f3953 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xr @@ -0,0 +1,160 @@ +/* Script for ld -r: link without relocation */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) + /* For some reason, the Solaris linker makes bad executables + if gld -r is used and the intermediate file has sections starting + at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld + bug. But for now assigning the zero vmas works. 
*/ +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + .interp 0 : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash 0 : { *(.hash) } + .gnu.hash 0 : { *(.gnu.hash) } + .dynsym 0 : { *(.dynsym) } + .dynstr 0 : { *(.dynstr) } + .gnu.version 0 : { *(.gnu.version) } + .gnu.version_d 0: { *(.gnu.version_d) } + .gnu.version_r 0: { *(.gnu.version_r) } + .rela.init 0 : { *(.rela.init) } + .rela.text 0 : { *(.rela.text) } + .rela.fini 0 : { *(.rela.fini) } + .rela.rodata 0 : { *(.rela.rodata) } + .rela.data.rel.ro 0 : { *(.rela.data.rel.ro) } + .rela.data 0 : { *(.rela.data) } + .rela.tdata 0 : { *(.rela.tdata) } + .rela.tbss 0 : { *(.rela.tbss) } + .rela.ctors 0 : { *(.rela.ctors) } + .rela.dtors 0 : { *(.rela.dtors) } + .rela.got 0 : { *(.rela.got) } + .rela.bss 0 : { *(.rela.bss) } + .rela.ldata 0 : { *(.rela.ldata) } + .rela.lbss 0 : { *(.rela.lbss) } + .rela.lrodata 0 : { *(.rela.lrodata) } + .rela.ifunc 0 : { *(.rela.ifunc) } + .rela.plt 0 : + { + *(.rela.plt) + } + .init 0 : + { + KEEP (*(SORT_NONE(.init))) + } + .plt 0 : { *(.plt) *(.iplt) } +.plt.got 0 : { *(.plt.got) } +.plt.bnd 0 : { *(.plt.bnd) } + .text 0 : + { + *(.text .stub) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini 0 : + { + KEEP (*(SORT_NONE(.fini))) + } + .rodata 0 : { *(.rodata) } + .rodata1 0 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } + .eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) } + .gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + /* Exception handling */ + .eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) } + .gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata 0 : { *(.tdata) } + .tbss 0 : { *(.tbss) } + .preinit_array 0 : + { + KEEP (*(.preinit_array)) + } + .jcr 0 : { KEEP (*(.jcr)) } + .dynamic 0 : { *(.dynamic) } + .got 0 : { *(.got) *(.igot) } + .got.plt 0 : { *(.got.plt) *(.igot.plt) } + .data 0 : + { + *(.data) + } + .data1 0 : { *(.data1) } + .bss 0 : + { + *(.dynbss) + *(.bss) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + } + .lbss 0 : + { + *(.dynlbss) + *(.lbss) + *(LARGE_COMMON) + } + .lrodata 0 : + { + *(.lrodata) + } + .ldata 0 : + { + *(.ldata) + } + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. 
*/ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.xs b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xs new file mode 100644 index 0000000..048362b --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xs @@ -0,0 +1,219 @@ +/* Script for ld --shared: link shared library */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . 
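/* Hedged note on the --shared script that begins here: compared with
   the executable scripts earlier in this commit it provides no
   __executable_start, defines no .interp output section, and bases its
   text segment at 0 -- a shared object is mapped at whatever base the
   dynamic loader chooses, and it is the consuming executable that
   names the interpreter. */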
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.init : { *(.rela.init) } + .rela.text : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) } + .rela.fini : { *(.rela.fini) } + .rela.rodata : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) } + .rela.data.rel.ro : { *(.rela.data.rel.ro .rela.data.rel.ro.* .rela.gnu.linkonce.d.rel.ro.*) } + .rela.data : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) } + .rela.tdata : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) } + .rela.tbss : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) } + .rela.ctors : { *(.rela.ctors) } + .rela.dtors : { *(.rela.dtors) } + .rela.got : { *(.rela.got) } + .rela.bss : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) } + .rela.ldata : { *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) } + .rela.lbss : { *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) } + .rela.lrodata : { *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) } + .rela.ifunc : { *(.rela.ifunc) } + .rela.plt : + { + *(.rela.plt) + *(.rela.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } +.plt.bnd : { *(.plt.bnd) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. 
*/ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.xsc b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xsc new file mode 100644 index 0000000..6a42898 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xsc @@ -0,0 +1,221 @@ +/* Script for --shared -z combreloc: shared library, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . 
= SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + *(.rela.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } +.plt.bnd : { *(.plt.bnd) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. 
*/ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .); + .got.plt : { *(.got.plt) *(.igot.plt) } + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. 
*/ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.xsw b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xsw new file mode 100644 index 0000000..3402ea0 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xsw @@ -0,0 +1,220 @@ +/* Script for --shared -z combreloc -z now -z relro: shared library, combine & sort relocs */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + . = SEGMENT_START("text-segment", 0) + SIZEOF_HEADERS; + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + *(.rela.iplt) + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } +.plt.bnd : { *(.plt.bnd) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + KEEP (*(.preinit_array)) + } + .init_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + } + .fini_array : + { + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. 
+ Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.xu b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xu new file mode 100644 index 0000000..0015b0e --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xu @@ -0,0 +1,161 @@ +/* Script for ld -Ur: link w/out relocation, do create constructors */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) + /* For some reason, the Solaris linker makes bad executables + if gld -r is used and the intermediate file has sections starting + at non-zero addresses. Could be a Solaris ld bug, could be a GNU ld + bug. But for now assigning the zero vmas works. */ +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + .interp 0 : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash 0 : { *(.hash) } + .gnu.hash 0 : { *(.gnu.hash) } + .dynsym 0 : { *(.dynsym) } + .dynstr 0 : { *(.dynstr) } + .gnu.version 0 : { *(.gnu.version) } + .gnu.version_d 0: { *(.gnu.version_d) } + .gnu.version_r 0: { *(.gnu.version_r) } + .rela.init 0 : { *(.rela.init) } + .rela.text 0 : { *(.rela.text) } + .rela.fini 0 : { *(.rela.fini) } + .rela.rodata 0 : { *(.rela.rodata) } + .rela.data.rel.ro 0 : { *(.rela.data.rel.ro) } + .rela.data 0 : { *(.rela.data) } + .rela.tdata 0 : { *(.rela.tdata) } + .rela.tbss 0 : { *(.rela.tbss) } + .rela.ctors 0 : { *(.rela.ctors) } + .rela.dtors 0 : { *(.rela.dtors) } + .rela.got 0 : { *(.rela.got) } + .rela.bss 0 : { *(.rela.bss) } + .rela.ldata 0 : { *(.rela.ldata) } + .rela.lbss 0 : { *(.rela.lbss) } + .rela.lrodata 0 : { *(.rela.lrodata) } + .rela.ifunc 0 : { *(.rela.ifunc) } + .rela.plt 0 : + { + *(.rela.plt) + } + .init 0 : + { + KEEP (*(SORT_NONE(.init))) + } + .plt 0 : { *(.plt) *(.iplt) } +.plt.got 0 : { *(.plt.got) } +.plt.bnd 0 : { *(.plt.bnd) } + .text 0 : + { + *(.text .stub) + /* .gnu.warning sections are handled specially by elf32.em. 
*/ + *(.gnu.warning) + } + .fini 0 : + { + KEEP (*(SORT_NONE(.fini))) + } + .rodata 0 : { *(.rodata) } + .rodata1 0 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) } + .eh_frame 0 : ONLY_IF_RO { KEEP (*(.eh_frame)) } + .gcc_except_table 0 : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab 0 : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges 0 : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + /* Exception handling */ + .eh_frame 0 : ONLY_IF_RW { KEEP (*(.eh_frame)) } + .gnu_extab 0 : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table 0 : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges 0 : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata 0 : { *(.tdata) } + .tbss 0 : { *(.tbss) } + .preinit_array 0 : + { + KEEP (*(.preinit_array)) + } + .jcr 0 : { KEEP (*(.jcr)) } + .dynamic 0 : { *(.dynamic) } + .got 0 : { *(.got) *(.igot) } + .got.plt 0 : { *(.got.plt) *(.igot.plt) } + .data 0 : + { + *(.data) + SORT(CONSTRUCTORS) + } + .data1 0 : { *(.data1) } + .bss 0 : + { + *(.dynbss) + *(.bss) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + } + .lbss 0 : + { + *(.dynlbss) + *(.lbss) + *(LARGE_COMMON) + } + .lrodata 0 : + { + *(.lrodata) + } + .ldata 0 : + { + *(.ldata) + } + /* Stabs debugging sections. */ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. 
*/ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } +} diff --git a/x86_64-linux-android/lib/ldscripts/elf_x86_64.xw b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xw new file mode 100644 index 0000000..de1ecc2 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/elf_x86_64.xw @@ -0,0 +1,229 @@ +/* Script for -z combreloc -z now -z relro: combine and sort reloc sections */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64", + "elf64-x86-64") +OUTPUT_ARCH(i386:x86-64) +ENTRY(_start) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib64"); SEARCH_DIR("=/usr/local/lib64"); SEARCH_DIR("=/lib64"); SEARCH_DIR("=/usr/lib64"); SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/x86_64-linux-android/lib"); SEARCH_DIR("=/usr/local/lib"); SEARCH_DIR("=/lib"); SEARCH_DIR("=/usr/lib"); +SECTIONS +{ + /* Read-only sections, merged into text segment: */ + PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000)); . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS; + .interp : { *(.interp) } + .note.gnu.build-id : { *(.note.gnu.build-id) } + .hash : { *(.hash) } + .gnu.hash : { *(.gnu.hash) } + .dynsym : { *(.dynsym) } + .dynstr : { *(.dynstr) } + .gnu.version : { *(.gnu.version) } + .gnu.version_d : { *(.gnu.version_d) } + .gnu.version_r : { *(.gnu.version_r) } + .rela.dyn : + { + *(.rela.init) + *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) + *(.rela.fini) + *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) + *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) + *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) + *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) + *(.rela.ctors) + *(.rela.dtors) + *(.rela.got) + *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) + *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*) + *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*) + *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*) + *(.rela.ifunc) + } + .rela.plt : + { + *(.rela.plt) + PROVIDE_HIDDEN (__rela_iplt_start = .); + *(.rela.iplt) + PROVIDE_HIDDEN (__rela_iplt_end = .); + } + .init : + { + KEEP (*(SORT_NONE(.init))) + } + .plt : { *(.plt) *(.iplt) } +.plt.got : { *(.plt.got) } +.plt.bnd : { *(.plt.bnd) } + .text : + { + *(.text.unlikely .text.*_unlikely .text.unlikely.*) + *(.text.exit .text.exit.*) + *(.text.startup .text.startup.*) + *(.text.hot .text.hot.*) + *(.text .stub .text.* .gnu.linkonce.t.*) + /* .gnu.warning sections are handled specially by elf32.em. */ + *(.gnu.warning) + } + .fini : + { + KEEP (*(SORT_NONE(.fini))) + } + PROVIDE (__etext = .); + PROVIDE (_etext = .); + PROVIDE (etext = .); + .rodata : { *(.rodata .rodata.* .gnu.linkonce.r.*) } + .rodata1 : { *(.rodata1) } + .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) } + .eh_frame : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gcc_except_table : ONLY_IF_RO { *(.gcc_except_table + .gcc_except_table.*) } + .gnu_extab : ONLY_IF_RO { *(.gnu_extab*) } + /* These sections are generated by the Sun/Oracle C++ compiler. */ + .exception_ranges : ONLY_IF_RO { *(.exception_ranges + .exception_ranges*) } + /* Adjust the address for the data segment. We want to adjust up to + the same address within the page on the next page up. */ + . 
= DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE)); + /* Exception handling */ + .eh_frame : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) } + .gnu_extab : ONLY_IF_RW { *(.gnu_extab) } + .gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) } + .exception_ranges : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) } + /* Thread Local Storage sections */ + .tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) } + .tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) } + .preinit_array : + { + PROVIDE_HIDDEN (__preinit_array_start = .); + KEEP (*(.preinit_array)) + PROVIDE_HIDDEN (__preinit_array_end = .); + } + .init_array : + { + PROVIDE_HIDDEN (__init_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*))) + KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .ctors)) + PROVIDE_HIDDEN (__init_array_end = .); + } + .fini_array : + { + PROVIDE_HIDDEN (__fini_array_start = .); + KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*) SORT_BY_INIT_PRIORITY(.dtors.*))) + KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o *crtend?.o ) .dtors)) + PROVIDE_HIDDEN (__fini_array_end = .); + } + .ctors : + { + /* gcc uses crtbegin.o to find the start of + the constructors, so we make sure it is + first. Because this is a wildcard, it + doesn't matter if the user does not + actually link against crtbegin.o; the + linker won't look for a file to match a + wildcard. The wildcard also means that it + doesn't matter which directory crtbegin.o + is in. */ + KEEP (*crtbegin.o(.ctors)) + KEEP (*crtbegin?.o(.ctors)) + /* We don't want to include the .ctor section from + the crtend.o file until after the sorted ctors. + The .ctor section from the crtend file contains the + end of ctors marker and it must be last */ + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors)) + KEEP (*(SORT(.ctors.*))) + KEEP (*(.ctors)) + } + .dtors : + { + KEEP (*crtbegin.o(.dtors)) + KEEP (*crtbegin?.o(.dtors)) + KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors)) + KEEP (*(SORT(.dtors.*))) + KEEP (*(.dtors)) + } + .jcr : { KEEP (*(.jcr)) } + .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) } + .dynamic : { *(.dynamic) } + .got : { *(.got.plt) *(.igot.plt) *(.got) *(.igot) } + . = DATA_SEGMENT_RELRO_END (0, .); + .data : + { + *(.data .data.* .gnu.linkonce.d.*) + SORT(CONSTRUCTORS) + } + .data1 : { *(.data1) } + _edata = .; PROVIDE (edata = .); + __bss_start = .; + .bss : + { + *(.dynbss) + *(.bss .bss.* .gnu.linkonce.b.*) + *(COMMON) + /* Align here to ensure that the .bss section occupies space up to + _end. Align after .bss to ensure correct alignment even if the + .bss section disappears because there are no input sections. + FIXME: Why do we need it? When there is no .bss section, we don't + pad the .data section. */ + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + .lbss : + { + *(.dynlbss) + *(.lbss .lbss.* .gnu.linkonce.lb.*) + *(LARGE_COMMON) + } + . = ALIGN(64 / 8); + . = SEGMENT_START("ldata-segment", .); + .lrodata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.lrodata .lrodata.* .gnu.linkonce.lr.*) + } + .ldata ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) : + { + *(.ldata .ldata.* .gnu.linkonce.l.*) + . = ALIGN(. != 0 ? 64 / 8 : 1); + } + . = ALIGN(64 / 8); + _end = .; PROVIDE (end = .); + . = DATA_SEGMENT_END (.); + /* Stabs debugging sections. 
*/ + .stab 0 : { *(.stab) } + .stabstr 0 : { *(.stabstr) } + .stab.excl 0 : { *(.stab.excl) } + .stab.exclstr 0 : { *(.stab.exclstr) } + .stab.index 0 : { *(.stab.index) } + .stab.indexstr 0 : { *(.stab.indexstr) } + .comment 0 : { *(.comment) } + /* DWARF debug sections. + Symbols in the DWARF debugging sections are relative to the beginning + of the section so we begin them at 0. */ + /* DWARF 1 */ + .debug 0 : { *(.debug) } + .line 0 : { *(.line) } + /* GNU DWARF 1 extensions */ + .debug_srcinfo 0 : { *(.debug_srcinfo) } + .debug_sfnames 0 : { *(.debug_sfnames) } + /* DWARF 1.1 and DWARF 2 */ + .debug_aranges 0 : { *(.debug_aranges) } + .debug_pubnames 0 : { *(.debug_pubnames) } + /* DWARF 2 */ + .debug_info 0 : { *(.debug_info .gnu.linkonce.wi.*) } + .debug_abbrev 0 : { *(.debug_abbrev) } + .debug_line 0 : { *(.debug_line .debug_line.* .debug_line_end ) } + .debug_frame 0 : { *(.debug_frame) } + .debug_str 0 : { *(.debug_str) } + .debug_loc 0 : { *(.debug_loc) } + .debug_macinfo 0 : { *(.debug_macinfo) } + /* SGI/MIPS DWARF 2 extensions */ + .debug_weaknames 0 : { *(.debug_weaknames) } + .debug_funcnames 0 : { *(.debug_funcnames) } + .debug_typenames 0 : { *(.debug_typenames) } + .debug_varnames 0 : { *(.debug_varnames) } + /* DWARF 3 */ + .debug_pubtypes 0 : { *(.debug_pubtypes) } + .debug_ranges 0 : { *(.debug_ranges) } + /* DWARF Extension. */ + .debug_macro 0 : { *(.debug_macro) } + .debug_addr 0 : { *(.debug_addr) } + .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) } + /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) } +} diff --git a/x86_64-linux-android/lib/ldscripts/i386linux.x b/x86_64-linux-android/lib/ldscripts/i386linux.x new file mode 100644 index 0000000..433f0d8 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/i386linux.x @@ -0,0 +1,51 @@ +/* Default linker script, for normal executables */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("a.out-i386-linux", "a.out-i386-linux", + "a.out-i386-linux") +OUTPUT_ARCH(i386) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-androidaout/lib"); +PROVIDE (__stack = 0); +SECTIONS +{ + . = 0x1020; + .text : + { + CREATE_OBJECT_SYMBOLS + *(.text) + /* The next six sections are for SunOS dynamic linking. The order + is important. */ + *(.dynrel) + *(.hash) + *(.dynsym) + *(.dynstr) + *(.rules) + *(.need) + _etext = .; + __etext = .; + } + . = ALIGN(0x1000); + .data : + { + /* The first three sections are for SunOS dynamic linking. */ + *(.dynamic) + *(.got) + *(.plt) + *(.data) + *(.linux-dynamic) /* For Linux dynamic linking. */ + CONSTRUCTORS + _edata = .; + __edata = .; + } + .bss : + { + __bss_start = .; + *(.bss) + *(COMMON) + . = ALIGN(4); + _end = . ; + __end = . ; + } +} diff --git a/x86_64-linux-android/lib/ldscripts/i386linux.xbn b/x86_64-linux-android/lib/ldscripts/i386linux.xbn new file mode 100644 index 0000000..1b1f12d --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/i386linux.xbn @@ -0,0 +1,51 @@ +/* Script for -N: mix text and data on same page; don't align data */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. 
*/ +OUTPUT_FORMAT("a.out-i386-linux", "a.out-i386-linux", + "a.out-i386-linux") +OUTPUT_ARCH(i386) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-androidaout/lib"); +PROVIDE (__stack = 0); +SECTIONS +{ + . = 0; + .text : + { + CREATE_OBJECT_SYMBOLS + *(.text) + /* The next six sections are for SunOS dynamic linking. The order + is important. */ + *(.dynrel) + *(.hash) + *(.dynsym) + *(.dynstr) + *(.rules) + *(.need) + _etext = .; + __etext = .; + } + . = .; + .data : + { + /* The first three sections are for SunOS dynamic linking. */ + *(.dynamic) + *(.got) + *(.plt) + *(.data) + *(.linux-dynamic) /* For Linux dynamic linking. */ + CONSTRUCTORS + _edata = .; + __edata = .; + } + .bss : + { + __bss_start = .; + *(.bss) + *(COMMON) + . = ALIGN(4); + _end = . ; + __end = . ; + } +} diff --git a/x86_64-linux-android/lib/ldscripts/i386linux.xn b/x86_64-linux-android/lib/ldscripts/i386linux.xn new file mode 100644 index 0000000..03aa711 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/i386linux.xn @@ -0,0 +1,51 @@ +/* Script for -n: mix text and data on same page */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("a.out-i386-linux", "a.out-i386-linux", + "a.out-i386-linux") +OUTPUT_ARCH(i386) +SEARCH_DIR("=/tmp/b71c5d2bcd13ef366b305ad71071ded6/i386-linux-androidaout/lib"); +PROVIDE (__stack = 0); +SECTIONS +{ + . = 0; + .text : + { + CREATE_OBJECT_SYMBOLS + *(.text) + /* The next six sections are for SunOS dynamic linking. The order + is important. */ + *(.dynrel) + *(.hash) + *(.dynsym) + *(.dynstr) + *(.rules) + *(.need) + _etext = .; + __etext = .; + } + . = ALIGN(0x1000); + .data : + { + /* The first three sections are for SunOS dynamic linking. */ + *(.dynamic) + *(.got) + *(.plt) + *(.data) + *(.linux-dynamic) /* For Linux dynamic linking. */ + CONSTRUCTORS + _edata = .; + __edata = .; + } + .bss : + { + __bss_start = .; + *(.bss) + *(COMMON) + . = ALIGN(4); + _end = . ; + __end = . ; + } +} diff --git a/x86_64-linux-android/lib/ldscripts/i386linux.xr b/x86_64-linux-android/lib/ldscripts/i386linux.xr new file mode 100644 index 0000000..86b8f35 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/i386linux.xr @@ -0,0 +1,41 @@ +/* Script for ld -r: link without relocation */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("a.out-i386-linux", "a.out-i386-linux", + "a.out-i386-linux") +OUTPUT_ARCH(i386) +SECTIONS +{ + .text : + { + CREATE_OBJECT_SYMBOLS + *(.text) + /* The next six sections are for SunOS dynamic linking. The order + is important. */ + *(.dynrel) + *(.hash) + *(.dynsym) + *(.dynstr) + *(.rules) + *(.need) + } + .data : + { + /* The first three sections are for SunOS dynamic linking. */ + *(.dynamic) + *(.got) + *(.plt) + *(.data) + *(.linux-dynamic) /* For Linux dynamic linking. 
*/ + } + .bss : + { + ; + *(.bss) + *(COMMON) + ; + ; + } +} diff --git a/x86_64-linux-android/lib/ldscripts/i386linux.xu b/x86_64-linux-android/lib/ldscripts/i386linux.xu new file mode 100644 index 0000000..4c45db7 --- /dev/null +++ b/x86_64-linux-android/lib/ldscripts/i386linux.xu @@ -0,0 +1,42 @@ +/* Script for ld -Ur: link w/out relocation, do create constructors */ +/* Copyright (C) 2014-2016 Free Software Foundation, Inc. + Copying and distribution of this script, with or without modification, + are permitted in any medium without royalty provided the copyright + notice and this notice are preserved. */ +OUTPUT_FORMAT("a.out-i386-linux", "a.out-i386-linux", + "a.out-i386-linux") +OUTPUT_ARCH(i386) +SECTIONS +{ + .text : + { + CREATE_OBJECT_SYMBOLS + *(.text) + /* The next six sections are for SunOS dynamic linking. The order + is important. */ + *(.dynrel) + *(.hash) + *(.dynsym) + *(.dynstr) + *(.rules) + *(.need) + } + .data : + { + /* The first three sections are for SunOS dynamic linking. */ + *(.dynamic) + *(.got) + *(.plt) + *(.data) + *(.linux-dynamic) /* For Linux dynamic linking. */ + CONSTRUCTORS + } + .bss : + { + ; + *(.bss) + *(COMMON) + ; + ; + } +} diff --git a/x86_64-linux-android/lib/libatomic.a b/x86_64-linux-android/lib/libatomic.a new file mode 100644 index 0000000..b566f9f Binary files /dev/null and b/x86_64-linux-android/lib/libatomic.a differ diff --git a/x86_64-linux-android/lib/libgomp.a b/x86_64-linux-android/lib/libgomp.a new file mode 100644 index 0000000..dd23ca1 Binary files /dev/null and b/x86_64-linux-android/lib/libgomp.a differ diff --git a/x86_64-linux-android/lib/libgomp.spec b/x86_64-linux-android/lib/libgomp.spec new file mode 100644 index 0000000..2fd7721 --- /dev/null +++ b/x86_64-linux-android/lib/libgomp.spec @@ -0,0 +1,3 @@ +# This spec file is read by gcc when linking. It is used to specify the +# standard libraries we need in order to link with libgomp. +*link_gomp: -lgomp diff --git a/x86_64-linux-android/lib/libquadmath.a b/x86_64-linux-android/lib/libquadmath.a new file mode 100644 index 0000000..f06b6b7 Binary files /dev/null and b/x86_64-linux-android/lib/libquadmath.a differ diff --git a/x86_64-linux-android/lib64/libatomic.a b/x86_64-linux-android/lib64/libatomic.a new file mode 100644 index 0000000..faee8f4 Binary files /dev/null and b/x86_64-linux-android/lib64/libatomic.a differ diff --git a/x86_64-linux-android/lib64/libgomp.a b/x86_64-linux-android/lib64/libgomp.a new file mode 100644 index 0000000..7917755 Binary files /dev/null and b/x86_64-linux-android/lib64/libgomp.a differ diff --git a/x86_64-linux-android/lib64/libgomp.spec b/x86_64-linux-android/lib64/libgomp.spec new file mode 100644 index 0000000..2fd7721 --- /dev/null +++ b/x86_64-linux-android/lib64/libgomp.spec @@ -0,0 +1,3 @@ +# This spec file is read by gcc when linking. It is used to specify the +# standard libraries we need in order to link with libgomp. +*link_gomp: -lgomp diff --git a/x86_64-linux-android/lib64/libquadmath.a b/x86_64-linux-android/lib64/libquadmath.a new file mode 100644 index 0000000..9859053 Binary files /dev/null and b/x86_64-linux-android/lib64/libquadmath.a differ
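The .ctors/.dtors and .init_array/.fini_array rules in the elf_x86_64 scripts above all follow the same pattern: crtbegin.o's entries are placed first, crtend.o's end-of-list marker is forced last, and everything in between is KEEP'd and sorted by init priority, exactly as the in-script comment about crtbegin.o explains. A minimal sketch of the kind of input those output sections collect follows; the file and function names are hypothetical and not part of this commit, and whether a given build emits .init_array or legacy .ctors depends on the GCC configuration.

/* ctor_demo.c -- hypothetical example.  GCC emits a function-pointer entry in
   .init_array (or .ctors on older configurations) for the constructor and in
   .fini_array (or .dtors) for the destructor; the KEEP() directives in the
   scripts above prevent the linker from discarding those entries. */
#include <stdio.h>

__attribute__((constructor))
static void before_main(void) { puts("constructor: runs before main"); }

__attribute__((destructor))
static void after_main(void)  { puts("destructor: runs after main"); }

int main(void) { puts("main"); return 0; }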
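The Thread Local Storage rules (.tdata for initialized data, .tbss for zero-initialized data plus .tcommon) can likewise be illustrated with a short sketch; the variable names are illustrative only.

/* tls_demo.c -- hypothetical example.  An initialized __thread variable is
   normally placed in .tdata and an uninitialized one in .tbss; both are
   matched by the .tdata/.tbss output sections in the scripts above. */
#include <stdio.h>

__thread int initialized_tls = 42;  /* expected to land in .tdata */
__thread int zeroed_tls;            /* expected to land in .tbss  */

int main(void)
{
    printf("%d %d\n", initialized_tls, zeroed_tls);
    return 0;
}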
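libgomp.spec contributes a single driver spec rule, *link_gomp: -lgomp. As the file's own comment says, gcc reads it when linking: with -fopenmp the driver picks up the spec from the library directory and expands the rule, so libgomp is added to the link line without being named by the user. A hedged usage sketch, assuming a hypothetical source file and the usual x86_64-linux-android- tool prefix for this toolchain layout:

/* omp_demo.c -- hypothetical example.  Built with something like
 *   x86_64-linux-android-gcc -fopenmp omp_demo.c -o omp_demo
 * the -fopenmp option makes the driver consult libgomp.spec and append -lgomp
 * via the *link_gomp rule shown in the diff, rather than the user passing it. */
#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    printf("hello from thread %d\n", omp_get_thread_num());
    return 0;
}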