Post History
Answer
#1: Initial revision
The point of VCL is to allow you to work with SIMD operations *explicitly*. OpenMP `simd` is more or less just a way to provide hints to the auto-vectorization the compiler is doing, so to some degree it is still subject to "the compiler is unable to vectorize the code automatically in an optimal way". It is actually worse than that, since these "hints" can be wrong in subtle ways. Because the whole point of these "hints" is to provide information that the compiler was unable to figure out on its own, they are trusted, and a wrong hint can lead to incorrect code. As an example, an incorrect or omitted `safelen` clause can lead to incorrect code being generated.

VCL is effectively just a library of SIMD implementations of vector operations that uses compile-time metaprogramming to select appropriate implementations based on hardware support. It will also use other compile-time constant data, e.g. whether the exponent of a power operation is a compile-time constant, to choose better implementations. The key thing here is that there is no uncertainty about whether vectorization will happen, and the operations have well-defined behavior, so there's no risk of vectorization leading to incorrect code.

The programming models are also very different. The programming model for OpenMP SIMD is that you write scalar code and then annotate it to get SIMD execution. This is great when you already have existing scalar code or want to transcribe some textbook algorithm. It's much less great when the SIMD speedup is an important goal. In that case, you have to constantly keep in mind how to write code in a way that the auto-vectorization will succeed and be correct. This is an annoying and difficult exercise that requires specialized knowledge and often produces a result where vectorization is uncertain.
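
To make the `safelen` point concrete, here is a minimal sketch of a loop with a loop-carried dependence at distance 4. The function name `add_with_stride4` and the data layout are made up for illustration; only `#pragma omp simd` and the `safelen` clause come from OpenMP. As I understand the clause, `safelen(4)` promises that no two iterations executed concurrently are 4 or more apart, which is what this particular loop needs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not from any library: a[i] depends on a[i - 4],
// so iterations that are 4 or more apart must not run concurrently.
void add_with_stride4(std::vector<int>& a, const std::vector<int>& b) {
    assert(b.size() >= a.size());  // assumption: b covers every index of a
    const std::size_t n = a.size();

    // The compiler trusts this hint. With a larger safelen value, or with the
    // clause omitted, it may pick a vector width that reads a[i - 4] before
    // that element has been updated.
    #pragma omp simd safelen(4)
    for (std::size_t i = 4; i < n; ++i) {
        a[i] = a[i - 4] + b[i];
    }
}
```

If the clause were dropped or given a larger value, the compiler would be entitled to use wider vectors and nothing would warn you that the result had changed. When the code is compiled without OpenMP support, the pragma is typically ignored and the loop just runs as scalar code.
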
You'll be looking at assembly dumps to see if it vectorized the way you thought it would, and even then you won't be certain that slightly different optimizer choices in a different context won't lead to much worse vectorization.

VCL's programming model is straightforward. You have a bunch of operations that have vectorized implementations. Vectorization of those operations will always happen and will always be good. There's no looking at assembly to make certain an optimization triggered as expected. The cost is that (some of) the details of the SIMD hardware are in your face and *you* have to deal with them. If your vectors are length 25, it's up to you to decide how you want to deal with that: do you use a `Vec32` and eat the wasted space, do you unroll it into a `Vec16`, a `Vec8`, and a scalar operation, do you unroll it into three `Vec8` operations and a scalar operation, do you do something more clever to combine this with other operations, or do you change your whole algorithm to avoid this situation?

This less magical programming model also means you're free to just write code like normal. There are no restrictions against breaking out of loops, nor are you restricted to using loops.

Finally, as a practical matter, OpenMP is just a much more extensive and involved framework whose primary focus is not SIMD execution. It's a very big and complex dependency to pull in just to get access to SIMD operations.
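
Coming back to the length-25 case, one of the options (a 16-wide step, an 8-wide step, and a scalar step) can be sketched as follows. To keep the example self-contained, `Lanes<N>` is a hypothetical stand-in for a fixed-width vector type. It is not VCL's API, and its loops only model what a real library would map onto SIMD instructions. The function name `add25` is likewise made up.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fixed-width SIMD vector type (NOT VCL's API).
// A real library would implement these operations with hardware instructions.
template <std::size_t N>
struct Lanes {
    std::array<float, N> v{};

    // Read N consecutive floats starting at p.
    static Lanes load(const float* p) {
        Lanes r;
        for (std::size_t i = 0; i < N; ++i) r.v[i] = p[i];
        return r;
    }

    // Write all N lanes to consecutive floats starting at p.
    void store(float* p) const {
        for (std::size_t i = 0; i < N; ++i) p[i] = v[i];
    }

    // Lane-wise addition.
    friend Lanes operator+(const Lanes& a, const Lanes& b) {
        Lanes r;
        for (std::size_t i = 0; i < N; ++i) r.v[i] = a.v[i] + b.v[i];
        return r;
    }
};

// Adds two length-25 arrays using the explicit split 25 = 16 + 8 + 1.
void add25(const float* x, const float* y, float* out) {
    assert(x != nullptr && y != nullptr && out != nullptr);
    (Lanes<16>::load(x) + Lanes<16>::load(y)).store(out);               // elements 0..15
    (Lanes<8>::load(x + 16) + Lanes<8>::load(y + 16)).store(out + 16);  // elements 16..23
    out[24] = x[24] + y[24];                                            // scalar tail
}
```

The split is written out by hand rather than discovered by the compiler, which is the trade-off described above: you decide how to handle the awkward length, and in exchange there is nothing left to guess about how the work is divided.
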