Rust should pass vectors by vector register #93490

Miksel12 · 2022-01-30T21:18:37Z

Rust currently doesn't pass vectors of floats by vector register.

This should be able to be passed by vector registers:

pub struct Stats { x: f32, y: f32, z: f32, q: f32  }


pub fn sum_rust(a: &Stats, b: &Stats) -> Stats {
    return Stats {x: a.x + b.x, y: a.y + b.y, z: a.z + b.z, q: a.q + b.q };
}

But in Rust 1.47 it uses the stack:

example::sum_rust:
        mov     rax, rdi
        movups  xmm0, xmmword ptr [rsi]
        movups  xmm1, xmmword ptr [rdx]
        addps   xmm1, xmm0
        movups  xmmword ptr [rdi], xmm1
        ret

Post 1.47, it is packed into integer registers (see this issue: #85265):

example::sum_rust:
        movss   xmm0, dword ptr [rdi]
        movss   xmm1, dword ptr [rdi + 4]
        addss   xmm0, dword ptr [rsi]
        addss   xmm1, dword ptr [rsi + 4]
        movsd   xmm2, qword ptr [rdi + 8]
        movsd   xmm3, qword ptr [rsi + 8]
        addps   xmm3, xmm2
        movd    eax, xmm0
        movd    ecx, xmm1
        movd    esi, xmm3
        shufps  xmm3, xmm3, 229
        movd    edx, xmm3
        shl     rdx, 32
        or      rdx, rsi
        shl     rcx, 32
        or      rax, rcx
        ret

This issue should be fixed by #93405 and should bring it back to pre 1.48.
But ideally it should be optimized to:

example::sum_rust:
        addps   xmm1, xmm0
        ret

@dotdash mentions in #85265 that this is due to Rust not using the proper types on the LLVM IR level: #85265 (comment)

EDIT:
Clang is able to use this optimization in a similar case:

struct Foo
{
    float bar1;
    float bar2;
    float bar3;
    float bar4;
};

Foo sum_cpp(Foo foo1, Foo foo2)
{
    Foo foo3;
    foo3.bar1 = foo1.bar1 + foo2.bar1;
    foo3.bar2 = foo1.bar2 + foo2.bar2;
    foo3.bar3 = foo1.bar3 + foo2.bar3;
    foo3.bar4 = foo1.bar4 + foo2.bar4;
    return foo3;
}

Gets turned into:

sum_cpp(Foo, Foo):                       # @sum_cpp(Foo, Foo)
        addps   xmm0, xmm2
        addps   xmm1, xmm3
        ret

nikic · 2022-01-30T21:32:35Z

This doesn't make sense for integers. Yes, you've picked out a very lucky example where this allows vectorization, but generally this just means that data will have to be moved back and forth between vector and GPR registers, for the majority case where no vectorization is possible. Passing these by pointer (as in 1.47) is the right thing to do.

Urgau · 2022-01-30T21:45:16Z

I actually tried this but this is complicated because this register are influenced by #[target_feature], which could lead to unsound code when called from a code that doesn't use this register.
https://github.com/rust-lang/rust/blob/803e19c6ebc647d4e600967c255fccea838bce9f/compiler/rustc_middle/src/ty/layout.rs#L3197-L3221

Miksel12 · 2022-01-30T22:04:30Z

This doesn't make sense for integers. Yes, you've picked out a very lucky example where this allows vectorization, but generally this just means that data will have to be moved back and forth between vector and GPR registers, for the majority case where no vectorization is possible. Passing these by pointer (as in 1.47) is the right thing to do.

That makes sense, but the exact same happens with floats. It is beneficial for floats to use vector registers, isn't it?
In that case I should probably use floats for my example.
I checked Clang to see how C++ is optimized and integers are indeed not passed in vector registers. Floats are.

struct Foo
{
    float bar1;
    float bar2;
    float bar3;
    float bar4;
};

Foo sum_cpp(Foo foo1, Foo foo2)
{
    Foo foo3;
    foo3.bar1 = foo1.bar1 + foo2.bar1;
    foo3.bar2 = foo1.bar2 + foo2.bar2;
    foo3.bar3 = foo1.bar3 + foo2.bar3;
    foo3.bar4 = foo1.bar4 + foo2.bar4;
    return foo3;
}

Gets turned into:

sum_cpp(Foo, Foo):                       # @sum_cpp(Foo, Foo)
        addps   xmm0, xmm2
        addps   xmm1, xmm3
        ret

Miksel12 · 2022-01-30T22:50:26Z

I actually tried this but this is complicated because this register are influenced by #[target_feature], which could lead to unsound code when called from a code that doesn't use this register.

https://github.com/rust-lang/rust/blob/803e19c6ebc647d4e600967c255fccea838bce9f/compiler/rustc_middle/src/ty/layout.rs#L3197-L3221

It seems like that issue is discussed in: #79865. It also seems like that issue will be fixed by upgrading to LLVM 14: #79865 (comment)
Though I'm not entirely sure it is the same issue.

Urgau · 2022-01-30T22:55:35Z

I checked Clang to see how C++ is optimized and integers are indeed not passed in vector registers. Floats are.

This is because clang generate a more optimized layout dso_local { <2 x float>, <2 x float> } instead of %Foo = type { float, float, float, float }. This is something I also tried to do is my PR but this way more complicated (it also required the abi compatibility to be fixed) so I decided to not include it.

Urgau · 2022-02-01T23:24:36Z

@Miksel12 I've open #93564 to fix the general issue related to the aggregation of types and I manage to also fix this issue.

With my PR your example code would now be compiled to:

sum_rust:
        addps	xmm0, xmm1
	ret

Even better than clang !

EDIT: It's no longer the case, due to the abi + target_features unsoundness.

scottmcm · 2022-02-06T01:28:05Z

Rust should pass vectors by vector register

Note that if you add repr(simd) then it does pass deal in vector types: https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=1645fca36d467d49bc9bd4f299f8243f

So arguably this more about "repr(rust) should use vector representations sometimes". (And of course when we eventually get stdsimd then people could do this as a repr(transparent) wrapper around Simd<f32, 4>.)

bjorn3 · 2022-02-06T11:10:58Z

Note that if you add repr(simd) then it does pass deal in vector types: https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=1645fca36d467d49bc9bd4f299f8243f

It doesn't. It passes them by reference both when using &Stats as arguments and when using Stats as arguments (llvm ir is identical). There would be no load or store if it was passed in vector registers.

define void @_ZN10playground8sum_rust17hf4e374897bed05a9E(<4 x float>* noalias nocapture sret(<4 x float>) dereferenceable(16) %0, <4 x float>* noalias nocapture readonly align 16 dereferenceable(16) %a, <4 x float>* noalias nocapture readonly align 16 dereferenceable(16) %b) unnamed_addr #0 {
start:
  %1 = load <4 x float>, <4 x float>* %a, align 16
  %2 = load <4 x float>, <4 x float>* %b, align 16
  %3 = fadd <4 x float> %1, %2
  store <4 x float> %3, <4 x float>* %0, align 16
  ret void
}

workingjubilee · 2022-02-13T18:36:50Z

Having read this issue, I believe this is functionally a duplicate of #64609, #85265, and #91447, so I am closing this.

Urgau mentioned this issue Feb 1, 2022

Don't aggregate homogeneous floats in the Rust ABI #93564

Closed

workingjubilee closed this as completed Feb 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rust should pass vectors by vector register #93490

Rust should pass vectors by vector register #93490

Miksel12 commented Jan 30, 2022 •

edited

Loading

nikic commented Jan 30, 2022

Urgau commented Jan 30, 2022

Miksel12 commented Jan 30, 2022 •

edited

Loading

Miksel12 commented Jan 30, 2022

Urgau commented Jan 30, 2022

Urgau commented Feb 1, 2022 •

edited

Loading

scottmcm commented Feb 6, 2022

bjorn3 commented Feb 6, 2022

workingjubilee commented Feb 13, 2022 •

edited

Loading

Rust should pass vectors by vector register #93490

Rust should pass vectors by vector register #93490

Comments

Miksel12 commented Jan 30, 2022 • edited Loading

nikic commented Jan 30, 2022

Urgau commented Jan 30, 2022

Miksel12 commented Jan 30, 2022 • edited Loading

Miksel12 commented Jan 30, 2022

Urgau commented Jan 30, 2022

Urgau commented Feb 1, 2022 • edited Loading

scottmcm commented Feb 6, 2022

bjorn3 commented Feb 6, 2022

workingjubilee commented Feb 13, 2022 • edited Loading

Miksel12 commented Jan 30, 2022 •

edited

Loading

Miksel12 commented Jan 30, 2022 •

edited

Loading

Urgau commented Feb 1, 2022 •

edited

Loading

workingjubilee commented Feb 13, 2022 •

edited

Loading