Use runtime feature detection for fma routines on x86 #896
base: master
3a6e471 to 7cb49a7
might be nitpicking, but this adds more branches and levels of indirection than strictly needed. this is the implementation i had in mind, as pseudocode (i think data ptr -> fn ptr requires a transmute):

use core::sync::atomic::{AtomicPtr, Ordering::Relaxed};

fn fma_select() {
    let fn_ptr: fn() = todo!();
    // on x86/x86_64, function pointers and data pointers are the same
    FMA_DYNAMIC.store(fn_ptr as *mut (), Relaxed);
    fn_ptr();
}

static FMA_DYNAMIC: AtomicPtr<()> = AtomicPtr::new(fma_select as fn() as *mut ());

pub fn fma() {
    // SAFETY: FMA_DYNAMIC only ever holds pointers that were cast from a `fn()`
    let f: fn() = unsafe { core::mem::transmute(FMA_DYNAMIC.load(Relaxed)) };
    f();
}
@tgross35 you might want to edit the magic "Fixes" keyword in your PR description. Since you have write access to r-l/r, I think it will also close the r-l/r issue if this PR merges.
Not at all, thanks for pointing this out. Updated, that saved another four instructions per invocation (CI runs 500 samples).
/// Stores a pointer that is immediately jumped to upon entering $name. By default it
/// is an init function that sets FUNC to something else.
static FUNC: AtomicPtr<Func> = AtomicPtr::new({
Rather than holding a pointer to a function pointer, you should instead use an `AtomicPtr<()>` and hold the function pointer directly.
Thanks for the catch, I have updated this. Saves a `mov` per iteration, as expected.
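For readers following this thread, here is a minimal sketch of the suggested shape: an `AtomicPtr<()>` that holds the function pointer itself, initially pointing at a selector that replaces itself on the first call. `FUNC` and `Func` echo the names in the diff excerpt; `select`, `fma_impl`, and `INIT_CALLS` are hypothetical names for illustration, and the sketch is not the code that was merged.

```rust
use std::mem;
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering::Relaxed};

type Func = fn(f64, f64, f64) -> f64;

/// Counts how often selection ran. Illustration only, not part of the technique.
pub static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Holds the function pointer directly, type-erased to `*mut ()`.
/// It starts out pointing at `select`.
static FUNC: AtomicPtr<()> = AtomicPtr::new(select as Func as *mut ());

/// Hypothetical stand-in for whichever implementation gets chosen.
fn fma_impl(x: f64, y: f64, z: f64) -> f64 {
    x.mul_add(y, z)
}

/// First-call path: choose an implementation, publish it, then forward the call.
fn select(x: f64, y: f64, z: f64) -> f64 {
    INIT_CALLS.fetch_add(1, Relaxed);
    // Assumption: real code would branch on detected CPU features here.
    let chosen: Func = fma_impl;
    FUNC.store(chosen as *mut (), Relaxed);
    chosen(x, y, z)
}

pub fn fma(x: f64, y: f64, z: f64) -> f64 {
    // SAFETY: FUNC only ever holds values that were cast from a `Func`.
    let f: Func = unsafe { mem::transmute::<*mut (), Func>(FUNC.load(Relaxed)) };
    f(x, y, z)
}
```

With `Relaxed` ordering, two threads racing on the first call may both run `select`. That is harmless as long as selection always produces the same result, which is the assumption made here.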
e16aa60 to 34af030
This module is used for both i686 and x86-64.
Get performance closer to the glibc implementations by adding assembly fma routines, with runtime feature detection so they are used even if not compiled with `+fma` (as the distributed standard library is often not). Glibc uses ifuncs, this implementation stores a function pointer in an atomic.

Results of CPU flags are also cached in order to avoid repeating the startup time in calls to different functions. The feature detection code is a slightly simplified version of `std-detect`.

Musl sources were used as a reference [1].

Fixes: rust-lang/rust#140452 once synced

[1]: https://github.com/bminor/musl/blob/c47ad25ea3b484e10326f933e927c0bc8cded3da/src/math/x32/fma.c
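The flag caching described above could look roughly like the following sketch. It is an assumption-laden illustration rather than the PR's code: the constants `INITIALIZED` and `FMA`, the `CACHE` static, and the helper names are all hypothetical, and it leans on std's `is_x86_feature_detected!` macro where the PR uses its own simplified detection.

```rust
use std::sync::atomic::{AtomicU32, Ordering::Relaxed};

/// Hypothetical bit layout for this sketch.
pub const INITIALIZED: u32 = 1 << 0;
pub const FMA: u32 = 1 << 1;

/// Zero means "not yet queried"; any cached value has INITIALIZED set.
static CACHE: AtomicU32 = AtomicU32::new(0);

/// Queries the CPU. On non-x86 targets no feature bits are reported.
fn detect() -> u32 {
    #[allow(unused_mut)]
    let mut flags = INITIALIZED;
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::arch::is_x86_feature_detected!("fma") {
            flags |= FMA;
        }
    }
    flags
}

/// Returns the cached flags, running detection only if nothing is cached yet.
pub fn cpu_flags() -> u32 {
    let cached = CACHE.load(Relaxed);
    if cached & INITIALIZED != 0 {
        return cached;
    }
    let flags = detect();
    CACHE.store(flags, Relaxed);
    flags
}

pub fn has_fma() -> bool {
    cpu_flags() & FMA != 0
}
```

Because every routine reads the same cache, only the first caller pays for detection, and later selectors for other functions can reuse the stored bits.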