What advantages does Agner Fog's VCL have over OpenMP?

+4
−0

Agner Fog has this C++ Vector Class Library, which is

... useful for improving code performance where speed is critical and where the compiler is unable to vectorize the code automatically in an optimal way.

However, it looks like the OpenMP simd construct offers exactly this, and much more. OpenMP has had it since version 4.0 (2013), i.e. for a long time.

Does the VCL have any advantages when used with a compiler which supports OpenMP 4.0?


1 answer

+5
−0

The point of VCL is to allow you to work with SIMD operations explicitly. OpenMP simd is more or less just a way to provide hints to the auto-vectorization the compiler is doing, so to some degree it is still subject to "the compiler is unable to vectorize the code automatically in an optimal way". In fact it is worse than that, since these "hints" can be wrong in subtle ways. The whole point of the hints is to provide information that the compiler was unable to figure out on its own, so the compiler trusts them, and a wrong hint can lead to incorrect code. As an example, an incorrect or omitted safelen clause can lead to incorrect code being generated.
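To make the safelen point concrete, here is a minimal sketch. The function name, the constant kOffset, and the loop itself are made up for illustration. Each element depends on the element 8 positions earlier, so the hint is only valid as long as the safelen value does not exceed that distance. Without OpenMP flags the pragma is simply ignored and the loop runs as ordinary scalar code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example: a[i] depends on a[i - kOffset], a loop-carried
// dependency with distance 8.
constexpr std::size_t kOffset = 8;

// Assumes a and b have the same length.
void shifted_accumulate(std::vector<float>& a, const std::vector<float>& b) {
    float* pa = a.data();
    const float* pb = b.data();
    const std::size_t n = a.size();

    // safelen(8) promises the compiler that executing up to 8 consecutive
    // iterations together is safe. That matches the dependency distance
    // here. A larger value (or no safelen at all) would make the same
    // promise for wider chunks, which is false for this loop, and the
    // compiler would not check it.
    #pragma omp simd safelen(8)
    for (std::size_t i = kOffset; i < n; ++i) {
        pa[i] = pa[i - kOffset] + pb[i];
    }
}
```

If kOffset were later changed to 4 without touching the pragma, the code would still compile, but the hint would no longer be true.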

VCL is effectively just a library of SIMD implementations of vector operations leveraging compile-time metaprogramming to select appropriate implementations based on hardware support. It will also leverage some other compile-time constant data, e.g. if the exponent of a power operation is a compile-time constant, to choose better implementations. The key thing here is there is no uncertainty about whether vectorization will happen and the operations have well-defined behavior so there's no risk of vectorization leading to incorrect code.
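As a rough illustration of how a compile-time constant exponent lets a library pick a better implementation, here is a scalar sketch. PowFixed and pow_fixed are hypothetical names, not VCL's API, and VCL's actual implementation may differ. The exponent is a template parameter, so the sequence of multiplications is fixed at compile time instead of calling a general pow routine.

```cpp
// Hypothetical helper, not part of VCL: raises x to a compile-time
// exponent N by repeated squaring. The recursion is resolved by the
// compiler, so the generated code is a short chain of multiplications.
template <unsigned N>
struct PowFixed {
    static double apply(double x) {
        const double half = PowFixed<N / 2>::apply(x);
        return (N % 2 == 0) ? half * half : half * half * x;
    }
};

// Base case: x^0 == 1.
template <>
struct PowFixed<0> {
    static double apply(double) { return 1.0; }
};

template <unsigned N>
double pow_fixed(double x) {
    return PowFixed<N>::apply(x);
}
```

A call such as pow_fixed<5>(x) has its multiplication chain chosen at compile time. A library like VCL can apply the same idea to whole SIMD vectors.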

The programming models are also very different. The programming model for OpenMP SIMD is that you write scalar code and then annotate it to get SIMD execution. This is great when you already have existing scalar code or want to transcribe some textbook algorithm. It's much less great when the SIMD speedup is an important goal. In that case, you have to constantly keep in mind how to write code in a way that the auto-vectorization will succeed and be correct. This is an annoying and difficult exercise requiring specialized knowledge that often produces a result where vectorization is uncertain. You'll be looking at assembly dumps to see if it vectorized the way you thought it would, and even then won't be certain that slightly different optimizer choices in a different context won't lead to much worse vectorization.

VCL's programming model is straightforward. You have a bunch of operations that have vectorized implementations. Vectorization of those operations will always happen and will always be good. There's no looking at assembly to make certain an optimization triggered as expected. The cost is that (some of) the details of the SIMD hardware are in your face and you have to deal with them. If your vectors are length 25, it's up to you to decide how you want to deal with that: do you use a Vec32 and eat the wasted space, do you unroll it into a Vec16, a Vec8, and a scalar operation, do you unroll it into three Vec8 operations and a scalar operation, do you do something more clever to combine this with other operations, or do you change your whole algorithm to avoid this situation? This less magical programming model also means you're free to just write code like normal. There are no restrictions about not breaking out of loops, and you aren't restricted to using loops.
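To show what "in your face" looks like, here is a sketch of the "three Vec8 operations and a scalar operation" option for length-25 vectors. Float8 is a hypothetical stand-in written with plain loops so the example is self-contained. It is not the VCL API; a real SIMD library would map such a type onto hardware registers.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for an 8-lane SIMD type; NOT the VCL API.
struct Float8 {
    std::array<float, 8> lane;

    // Loads 8 consecutive floats starting at p.
    static Float8 load(const float* p) {
        Float8 v;
        for (std::size_t i = 0; i < 8; ++i) v.lane[i] = p[i];
        return v;
    }

    // Stores the 8 lanes to consecutive floats starting at p.
    void store(float* p) const {
        for (std::size_t i = 0; i < 8; ++i) p[i] = lane[i];
    }

    // Lane-wise addition.
    friend Float8 operator+(const Float8& x, const Float8& y) {
        Float8 r;
        for (std::size_t i = 0; i < 8; ++i) r.lane[i] = x.lane[i] + y.lane[i];
        return r;
    }
};

// Adds two 25-element arrays: three 8-lane operations cover elements
// 0..23, and one scalar operation handles element 24.
void add25(const float* x, const float* y, float* out) {
    for (std::size_t i = 0; i < 24; i += 8) {
        (Float8::load(x + i) + Float8::load(y + i)).store(out + i);
    }
    out[24] = x[24] + y[24];
}
```

The split into 8 + 8 + 8 + 1 is written out by hand. That is the cost being described, but it also means nothing depends on whether an optimizer recognizes the loop.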

Finally, as a practical matter, OpenMP is just a much more extensive and involved framework whose primary focus is not SIMD execution. It's a very big and complex dependency to pull in just to get access to SIMD operations.
