

Post History

77%
+5 −0
Q&A What advantages does Agner Fog's VCL have over OpenMP?


posted 2y ago by Derek Elkins‭

Answer
#1: Initial revision by Derek Elkins · 2022-01-31T03:25:34Z (about 2 years ago)
The point of VCL is to allow you to work with SIMD operations *explicitly*. OpenMP `simd` is more or less just a way to provide hints to the compiler's auto-vectorization, so to some degree it is still subject to "the compiler is unable to vectorize the code automatically in an optimal way". It is actually worse than that, because these "hints" can be wrong in subtle ways. Since the whole point of the hints is to provide information that the compiler was unable to figure out on its own, the compiler trusts them, and a wrong hint can lead to incorrect code. As an example, an incorrect or omitted `safelen` clause can lead to incorrect code being generated.
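
To make the `safelen` point concrete, here is a minimal sketch of a loop with a loop-carried dependency. The function name `shift_accumulate` and the distance of 8 are made up for illustration; the pragma only has an effect when the compiler is invoked with OpenMP SIMD support (e.g. `-fopenmp-simd` on GCC or Clang), and is otherwise ignored.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: each element depends on the element 8 positions
// earlier, so iterations i and i + 8 must never run in the same SIMD chunk.
void shift_accumulate(std::vector<float>& a) {
    const std::size_t n = a.size();
    float* p = a.data();

    // safelen(4) limits SIMD chunks to iterations that are close together,
    // well within the dependency distance of 8 (chosen conservatively).
    // Omitting the clause, or writing a value larger than 8, would tell the
    // compiler that iterations 8 apart may run together. The compiler
    // trusts that claim, and the result could silently be wrong.
    #pragma omp simd safelen(4)
    for (std::size_t i = 8; i < n; ++i) {
        p[i] = p[i - 8] + 1.0f;
    }
}
```

Nothing in the source code itself flags a wrong `safelen` value; the loop compiles either way, which is what makes this kind of hint easy to get wrong.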

VCL is effectively just a library of SIMD implementations of vector operations leveraging compile-time metaprogramming to select appropriate implementations based on hardware support. It will also leverage some other compile-time constant data, e.g. if the exponent of a power operation is a compile-time constant, to choose better implementations. The key thing here is there is no uncertainty about whether vectorization will happen and the operations have well-defined behavior so there's no risk of vectorization leading to incorrect code.
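
As a rough illustration of what "leveraging a compile-time constant exponent" can mean, the sketch below picks the multiplication sequence at compile time. `pow_const` here is a hypothetical helper written for this answer, not VCL's actual API or implementation, and it assumes C++17 for `if constexpr`.

```cpp
// Hypothetical helper, not VCL's API: raises x to a non-negative exponent N
// that is known at compile time, using exponentiation by squaring.
// The branches are resolved during compilation, so the generated code is
// just a fixed sequence of multiplications with no loop or runtime test.
template <unsigned N, typename T>
constexpr T pow_const(T x) {
    if constexpr (N == 0) {
        return T(1);
    } else if constexpr (N % 2 == 0) {
        const T half = pow_const<N / 2>(x);
        return half * half;
    } else {
        return x * pow_const<N - 1>(x);
    }
}
```

Because `T` only needs `operator*` and construction from `1`, the same template would work for any type that provides them, which is the general shape of the trick the paragraph above describes.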

The programming models are also very different. The programming model for OpenMP SIMD is that you write scalar code and then annotate it to get SIMD execution. This is great when you already have existing scalar code or want to transcribe some textbook algorithm. It's much less great when the SIMD speedup is an important goal. In that case, you have to constantly keep in mind how to write code in a way that the auto-vectorization will succeed and be correct. This is an annoying and difficult exercise requiring specialized knowledge that often produces a result where vectorization is uncertain. You'll be looking at assembly dumps to see if it vectorized the way you thought it would, and even then won't be certain that slightly different optimizer choices in a different context won't lead to much worse vectorization.

VCL's programming model is straightforward. You have a bunch of operations that have vectorized implementations. Vectorization of those operations will always happen and will always be good. There's no looking at assembly to make certain an optimization triggered as expected. The cost is that (some of) the details of the SIMD hardware are in your face and *you* have to deal with them. If your vectors are length 25, it's up to you to decide how you want to deal with that: do you use a `Vec32` and eat the wasted space, do you unroll it into a `Vec16`, a `Vec8`, and a scalar operation, do you unroll it into three `Vec8` operations and a scalar operation, do you do something more clever to combine this with other operations, or do you change your whole algorithm to avoid this situation? This less magical programming model also means you're free to just write code like normal. There is no rule against breaking out of loops, and you are not restricted to using loops at all.
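
The length-25 case can be sketched in plain C++. `Lanes<W>` below is a hypothetical stand-in for a fixed-width vector type like the `Vec16` and `Vec8` mentioned above; it is not VCL's API, and a real SIMD type would map to a hardware register rather than a `std::array`. The sketch shows the "`Vec16`, `Vec8`, and a scalar operation" option.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a fixed-width SIMD type: exactly W floats,
// added lane by lane. NOT VCL's API.
template <std::size_t W>
struct Lanes {
    std::array<float, W> v;

    static Lanes load(const float* p) {
        Lanes r;
        for (std::size_t i = 0; i < W; ++i) r.v[i] = p[i];
        return r;
    }
    void store(float* p) const {
        for (std::size_t i = 0; i < W; ++i) p[i] = v[i];
    }
    friend Lanes operator+(const Lanes& a, const Lanes& b) {
        Lanes r;
        for (std::size_t i = 0; i < W; ++i) r.v[i] = a.v[i] + b.v[i];
        return r;
    }
};

// Adds two length-25 arrays as one 16-wide step, one 8-wide step,
// and one scalar step: 16 + 8 + 1 = 25.
void add25(const float* a, const float* b, float* out) {
    (Lanes<16>::load(a) + Lanes<16>::load(b)).store(out);
    (Lanes<8>::load(a + 16) + Lanes<8>::load(b + 16)).store(out + 16);
    out[24] = a[24] + b[24];
}
```

The decomposition is written out by hand, which is exactly the cost described above: the vector widths are visible in the code, and switching to the "three `Vec8` operations and a scalar" option means rewriting `add25` rather than changing an annotation.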

Finally, as a practical matter, OpenMP is just a much more extensive and involved framework whose primary focus is not SIMD execution. It's a very big and complex dependency to pull in just to get access to SIMD operations.