Increasing the speed of the permutation multiplication operation due to use of simd instructions
Abstract
Algorithms for performing permutation multiplication using SIMD (Single Instruction Multiple Data) instructions of modern processors are developed and investigated in the article. The purpose of the article is to increase the performance and reduce the power consumption of the processor during the executing of typical permutation operations by creating algorithms for performing permutation operations using SIMD instructions. The scientific novelty of the article lies in the proposed approach to the hardware implementation of permutation multiplication, which through the use of SIMD instructions can reduce the execution time of this operation and, consequently, increase performance and reduce CPU power consumption. The practical value of the article is determined by the developed algorithms for permutation multiplication, the effectiveness of which is confirmed by comparison with classical multiplication procedures. An analysis of SIMD instructions that can be used to perform permutation operations is performed. The developed algorithms are based on the use of advanced processor instructions that allow you to perform operations on data presented in vector format. The advantages of using SIMD instructions to increase the speed of permutation operations are practically identified and investigated. The analysis and comparison of the speed of permutation operations with and without the use of SIMD instructions are performed. Described implementations of permutation multiplication algorithms using SIMD instructions can be used to accelerate permutation operations, in particular in the three-pass cryptographic protocol based on permutations. The use of SIMD instructions makes it possible to speed up permutation multiplication operations up to 2.6 times (depending on the hardware). The developed algorithms can be used in the implementation of methods based on a large number of permutation multiplication operations, which will significantly increase the speed of their execution
Keywords
processor; algorithm; vector; register; SSSE3; AVX2
References
[1] W. Mula, and D. Lemire, "Faster Base64 encoding and decoding using AVX2 instructions", ACM Transactions on the Web (TWEB), vol. 12, no. 3, pp. 1-26, 2018.
[2] A. Faz-Hernández, and J. López, "Fast implementation of Curve25519 using AVX2", in International Conference on Cryptology and Information Security in Latin America, 2015, pp. 329-345.
[3] A. Lemmetti, A. Koivula, M. Viitanen, J. Vanne, and T. D. Hämäläinen, "AVX2optimized Kvazaar HEVC intra encoder", in 2016 IEEE International Conference on Image Processing (ICIP), 2016, pp. 549-553.
[4] W. Mula, N. Kurz, and D. Lemire, "Faster population counts using AVX2 instructions", The Computer Journal, vol. 61, no. 1, pp. 111-120, 2018.
[5] M. J. Flynn, "Very high-speed computing systems", Proceedings of the IEEE, vol. 54, no. 12, pp. 1901-1909, 1966.
[6] M. J. Flynn, "Some computer organizations and their effectiveness", IEEE Trans. Comput., vol. 21, no. 9, pp. 948-960, 1972.
[7] "Intel® Intrinsics Guide". [Online]. Available at: https://www.intel.com/content/ www/us/en/docs/intrinsics-guide/index.html.
[8] "GeForce RTX 3090 Graphics Card | NVID-IA". [Online]. Available at: https://www.nvidia.com/en-eu/geforce/ graphics-cards/30-series/rtx-3090/.
[9] A. Shcherba, E. Faure, and O. Lavdanska, "Three-pass cryptographic protocol based on permutations", in 2020 IEEE 2nd International Conference on Advanced Trends in Information Theory (ATIT), 2020, pp. 281-284
[10] E. V. Faure, O. O. Kharin, and A. O. Lavdanskyi, "Evaluation of properties of signalcode structures synthesized on the basis of lattice theory for inseparable factorial codes", Visnyk Cherkaskogo derzhavnogo tehnologichnogo universitetu, no. 3, pp. 4047, 2020 [in Ukrainian].
[11] E. V. Faure, V. V. Shvydkyi, A. I. Shcherba, O. O. Kharin, and B. A. Stupka, "Method of cyclic synchronization based on permutations", Visnyk Cherkaskogo derzhavnogo tehnologichnogo universitetu, no. 4, pp. 6776, 2020 [in Ukrainian].
[12] E. V. Faure, V. V. Shvydkyi, A. O. Lavdanskyi, and O. O. Kharin, "Methods of factorial coding of speech signals", Radio Electronics, Computer Science, Control, no. 4, pp. 186-198, Nov. 2019.
[13] E. Faure, A. Shcherba, Y. Vasiliu, and A. Fesenko, "Cryptographic key exchange method for data factorial coding", vol. 2654, p. 643, Aug. 2020.
[14] "Programming using AVX2. Permutations". [Online]. Available at: https://software.intel.com/content/www/us/ en/develop/blogs/programming-using-avx2permutations.html.
[15] "google/benchmark: A microbenchmark support library". [Online]. Available at: https://github.com/google/benchmark