Vector swizzling in C++
21/Oct 2009
Everyone who’s done at least some vertex/pixel shader/HLSL programming has probably encountered a mechanism called “swizzling”. It’s an operation where we create a new vector using arbitrarily selected components of another vector (it’s also a little bit similar to SSE shuffling). A code snippet is worth 100 words, so here are some examples:
a = b.zyzx; // a.x = b.z, a.y = b.y, a.z = b.z, a.w = b.x
a = b.wy; // a.x = b.w, a.y = b.y, a.z = b.y, a.w = b.y
a = b.z; // a.x = a.y = a.z = a.w = b.z
Vector swizzling may come in handy in C++ as well. Recently, I saw a discussion about it on a programming forum and thought it could be an interesting experiment to implement it. The most straightforward, brute-force way would be to simply generate a method for every possible component combination, but that doesn’t sound very interesting. I started with another simple approach, where you pass 4 component indices to a function and hope that the compiler will be able to figure out they’re all constant and optimize it nicely. Code:
enum EVecCoord
{
    X, Y, Z, W
};
struct Vec4
{
    Vec4(float x, float y, float z, float w)
    {
        m_v[X] = x;
        m_v[Y] = y;
        m_v[Z] = z;
        m_v[W] = w;
    }
    // Builds a new vector from the four selected components of this one.
    Vec4 Swizzle(EVecCoord c0, EVecCoord c1, EVecCoord c2, EVecCoord c3) const
    {
        return Vec4(m_v[c0], m_v[c1], m_v[c2], m_v[c3]);
    }
    float m_v[4];
};
// Defined elsewhere; consumes the result.
void Foo(const Vec4& v);
// Test function
void SwizzleTest_3(const Vec4& v)
{
    Vec4 v2 = v.Swizzle(X, X, W, Y);
    Foo(v2);
    Vec4 v3 = v.Swizzle(X, Y, Y, Y);
    Foo(v3);
}
Generated code (MSVC, 32-bit x86). The constant indices are folded into fixed offsets:
; Vec4 v2 = v.Swizzle(X, X, W, Y);
    mov esi, DWORD PTR _v$[ebp]
    movss xmm0, DWORD PTR [esi]
    movss DWORD PTR _v2$[ebp], xmm0
    movss DWORD PTR _v2$[ebp+4], xmm0
    movss xmm0, DWORD PTR [esi+12]
; Foo(v2);
    lea eax, DWORD PTR _v2$[ebp]
    movss DWORD PTR _v2$[ebp+8], xmm0
    movss xmm0, DWORD PTR [esi+4]
    push eax
    movss DWORD PTR _v2$[ebp+12], xmm0
    call ?Foo@@YAXABUVec4@@@Z ; Foo
; Vec4 v3 = v.Swizzle(X, Y, Y, Y);
    movss xmm0, DWORD PTR [esi]
; Foo(v3);
    lea eax, DWORD PTR _v3$[ebp]
    movss DWORD PTR _v3$[ebp], xmm0
    movss xmm0, DWORD PTR [esi+4]
    push eax
    movss DWORD PTR _v3$[ebp+4], xmm0
    movss DWORD PTR _v3$[ebp+8], xmm0
    movss DWORD PTR _v3$[ebp+12], xmm0
    call ?Foo@@YAXABUVec4@@@Z ; Foo
The same thing with the component indices passed as template arguments (these are Vec4 members):
    // One index: broadcast a single component.
    template<EVecCoord c0>
    Vec4 Swizzle() const
    {
        return Vec4(m_v[c0], m_v[c0], m_v[c0], m_v[c0]);
    }
    // Two or three indices: the last one is repeated in the remaining slots.
    template<EVecCoord c0, EVecCoord c1>
    Vec4 Swizzle() const
    {
        return Vec4(m_v[c0], m_v[c1], m_v[c1], m_v[c1]);
    }
    template<EVecCoord c0, EVecCoord c1, EVecCoord c2>
    Vec4 Swizzle() const
    {
        return Vec4(m_v[c0], m_v[c1], m_v[c2], m_v[c2]);
    }
    template<EVecCoord c0, EVecCoord c1, EVecCoord c2, EVecCoord c3>
    Vec4 Swizzle() const
    {
        return Vec4(m_v[c0], m_v[c1], m_v[c2], m_v[c3]);
    }
// Test
Vec4 v2 = v.Swizzle<X, X, W, Y>();
Foo(v2);
Vec4 v3 = v.Swizzle<X, Y>();
Foo(v3);
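For reference, the template fragments above can be assembled into a small self-contained sketch that compiles on its own. The `SameVec` helper and the `TemplateSwizzleDemo` function are hypothetical names, not part of the original code; they are only there to check the results. The three-index overload is left out for brevity.

```cpp
#include <cassert>

enum EVecCoord { X, Y, Z, W };

struct Vec4
{
    Vec4(float x, float y, float z, float w)
    {
        m_v[X] = x; m_v[Y] = y; m_v[Z] = z; m_v[W] = w;
    }

    // One index: the component is broadcast to all four slots.
    template<EVecCoord c0>
    Vec4 Swizzle() const
    {
        return Vec4(m_v[c0], m_v[c0], m_v[c0], m_v[c0]);
    }

    // Two indices: the last one fills the remaining slots.
    template<EVecCoord c0, EVecCoord c1>
    Vec4 Swizzle() const
    {
        return Vec4(m_v[c0], m_v[c1], m_v[c1], m_v[c1]);
    }

    // Four indices: full swizzle.
    template<EVecCoord c0, EVecCoord c1, EVecCoord c2, EVecCoord c3>
    Vec4 Swizzle() const
    {
        return Vec4(m_v[c0], m_v[c1], m_v[c2], m_v[c3]);
    }

    float m_v[4];
};

// Hypothetical helper (not in the original code): exact component-wise compare.
// Exact comparison is fine here because swizzling only copies values.
inline bool SameVec(const Vec4& a, const Vec4& b)
{
    for (int i = 0; i < 4; ++i)
        if (a.m_v[i] != b.m_v[i])
            return false;
    return true;
}

// Hypothetical check (not in the original code) exercising each overload.
inline bool TemplateSwizzleDemo()
{
    const Vec4 v(1.0f, 2.0f, 3.0f, 4.0f);
    return SameVec(v.Swizzle<X, X, W, Y>(), Vec4(1.0f, 1.0f, 4.0f, 2.0f))
        && SameVec(v.Swizzle<X, Y>(),       Vec4(1.0f, 2.0f, 2.0f, 2.0f))
        && SameVec(v.Swizzle<Z>(),          Vec4(3.0f, 3.0f, 3.0f, 3.0f));
}
```

The overloads differ only in the number of template parameters, so the explicit argument list at the call site is what selects between them.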
Are we done? Normally we would be, but this is an experiment, remember? I thought it’d be cool to have a version where you don’t have to use commas, so you could write a = b.Swizzle(WYZY), for example. First, I needed all the possible combinations of components. It was simple to generate them using a Python script (you can get it here; it requires Python 2.6 for the itertools module… I think Python may have libraries for just about everything. I fully expect version 3.0 to come with life.findMeaning). [I had a Scala version as well, as I wanted to learn the language, but somehow it didn’t click for me, so I must find another one. Scala looks like it may be really efficient, but it doesn’t lend itself that well to home fun, IMHO.]
The idea was to pass a single ‘mask’ and then somehow extract the components from it. At first, I thought about having an offset table, but that would put even more pressure on the compiler, plus I’d have to obtain an index into this table (from the mask) anyway. Then again, if I was going to generate an index, I could just as well compute the offset directly. That’s what I did. Every component combination is an 8-bit mask with 2 bits per component; the masks are generated by the same Python script. Here’s the non-template solution:
    Vec4 Swizzle(EVecCoord c) const
    {
        return Vec4(m_v[c], m_v[c], m_v[c], m_v[c]);
    }
    Vec4 Swizzle(EVecSwizzle2 swizzle) const
    {
        return Vec4(m_v[swizzle & 0x3], m_v[swizzle >> 2], m_v[swizzle >> 2], m_v[swizzle >> 2]);
    }
    Vec4 Swizzle(EVecSwizzle3 swizzle) const
    {
        return Vec4(m_v[swizzle & 0x3], m_v[(swizzle >> 2) & 0x3],
            m_v[(swizzle >> 4) & 0x3], m_v[(swizzle >> 4) & 0x3]);
    }
    __forceinline Vec4 Swizzle(EVecSwizzle4 swizzle) const
    {
        return Vec4(m_v[swizzle & 0x3], m_v[(swizzle >> 2) & 0x3],
            m_v[(swizzle >> 4) & 0x3], m_v[swizzle >> 6]);
    }
// Test
Vec4 v2 = v.Swizzle(XXWY);
Foo(v2);
Vec4 v3 = v.Swizzle(XY);
Foo(v3);
Or, with operator() overloaded to do the swizzle:
Vec4 v2 = v(XYWW);
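The article does not show the generated enums, so here is a minimal sketch of what the "8-bit mask with 2 bits per component" layout could look like. It assumes the first component sits in the lowest two bits, which matches how the Swizzle(EVecSwizzle4) overload decodes the mask. `PackSwizzle4`, `UnpackComponent`, `ApplySwizzle4` and `MaskDemo` are hypothetical names, and the three enumerators are hand-written examples rather than the script's actual output.

```cpp
#include <cassert>

enum EVecCoord { X, Y, Z, W };

// Hypothetical helper: packs four component indices into one 8-bit mask,
// 2 bits each, first component in the lowest bits.
inline unsigned PackSwizzle4(EVecCoord c0, EVecCoord c1, EVecCoord c2, EVecCoord c3)
{
    return unsigned(c0) | (unsigned(c1) << 2) | (unsigned(c2) << 4) | (unsigned(c3) << 6);
}

// Hypothetical helper: extracts the i-th (0..3) component index from a mask.
inline EVecCoord UnpackComponent(unsigned mask, int i)
{
    return static_cast<EVecCoord>((mask >> (2 * i)) & 0x3);
}

// A few entries of what a generated enum might contain (assumed values).
enum EVecSwizzle4
{
    XXWY = X | (X << 2) | (W << 4) | (Y << 6),  // 0x70
    XYWW = X | (Y << 2) | (W << 4) | (W << 6),  // 0xF4
    WYZY = W | (Y << 2) | (Z << 4) | (Y << 6)   // 0x67
};

// Applies a 4-component mask to a plain float[4], mirroring Swizzle(EVecSwizzle4).
inline void ApplySwizzle4(const float in[4], unsigned mask, float out[4])
{
    for (int i = 0; i < 4; ++i)
        out[i] = in[UnpackComponent(mask, i)];
}

// Hypothetical check: packing agrees with the enum, and decoding round-trips.
inline bool MaskDemo()
{
    const float in[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float out[4];
    ApplySwizzle4(in, XXWY, out);
    return PackSwizzle4(X, X, W, Y) == unsigned(XXWY)
        && out[0] == 1.0f && out[1] == 1.0f && out[2] == 4.0f && out[3] == 2.0f;
}
```

With this layout a 2- or 3-component mask cannot be told apart from a 4-component one by value alone, which is presumably why the original code uses separate enum types (EVecSwizzle2, EVecSwizzle3, EVecSwizzle4) to pick the overload.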
Old comments
Jonathan 2009-10-22 00:54:35
Very nice.
I’ve played with something not dissimilar for Cell BE SPU, but due to the way the shufb instruction works, I used the preprocessor - http://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/drivers/cell/spu/spu_shuffle.h
Arseny Kapoulkine 2009-10-22 05:18:53
I have a macro for that kind of stuff; on Altivec/SPU it expands to shufb (nothing non-obvious here); for PC I had to write, uhm, this:
http://www.everfall.com/paste/id.php?89p3wsk7gft0
and the test suite
http://www.everfall.com/paste/id.php?3qzz76p339ld
blackpawn 2009-10-24 00:31:40
cool! i still like the simple syntax in HLSL though. how have float4’s not made it in as native C/C++ types by now? the CPUs have had these registers for what 15 years now?? :P
C++ Swizzling | Dwight Design 2010-04-03 00:14:54
[…] libraries and forum posts where people had tried to implement it. I was able to find quite a few implementation attempts. But none of these did what I wanted. Only one had true write swizzling, and try as I […]
fries 2010-11-29 13:35:28
I’m pretty sure that if you wrote your vector classes using proper vector instructions, you would get swizzles, splats, masks, permutes, etc. from almost any vector instruction set you ported to… Even the Wii’s paired singles can do some of this kind of thing in only 1 or 2 instructions.
admin 2010-11-30 01:13:48
We do use ‘proper’ vector instructions (ie. SSE/Altivec/SPU in our case, no Wii). How you implement the shuffling itself is one thing, you still need to expose this functionality somehow and ideally - generate masks automatically.
gwiazdorrr 2013-11-12 22:58:41
Hi Maciej,
With C++ there are far more idiomatic ways of implementing swizzling. Hell, you can even replicate whole GLSL/HLSL swizzling syntax.
My take on this is CxxSwizzle (https://github.com/gwiazdorrr/CxxSwizzle). Bottom line, you can take a GLSL fragment shader and run it as C++ code, without any changes. With all the goodies (and baggage) of it. From your favourite IDE, using your favourite compiler.
It can be adapted to simulate HLSL as well.
I’d love to read your opinion on it.
admin 2013-11-18 05:59:43
Sorry, busy week. I didn’t have a chance to actually run it (no C++11 compliant compiler at home), but it looks impressive. I have to say, though, while it’s nice for experiments like C++ shaders (and as a proof of concept), it’s probably a little bit of overkill for a general purpose multi-platform vector library.
gwiazdorrr 2013-12-04 09:16:15
Thanks for taking a look.
Regarding your concerns, given naive math implementation and poor set of support functions I have to agree with you. This was not the goal of this project. However, these problems are not rocket science, so maybe in the future…
As a side note, swizzling in D can be done in just few lines, putting C++ to shame: https://github.com/Dav1dde/gl3n/blob/master/gl3n/linalg.d#L361