[benchmark] Add ReplaceSubrange benchmark #25310


Merged
merged 10 commits into from
Aug 2, 2019

Conversation

keitaito
Contributor

@keitaito keitaito commented Jun 7, 2019

🔨 Changes

Add ReplaceSubrange benchmark. It is still a work in progress, but it would be great if I could get feedback to make sure I'm heading in the right direction 🙇‍♂️

Resolves a part of SR-8905 Gaps in String benchmarking.

Shout-out to Rob and Michael for mentoring at try! Swift San Jose!

To-do list

  • Arguments of types String
  • Substring
  • Array
  • Repeated

@beccadax
Contributor

beccadax commented Jun 7, 2019

@milseman I'm thinking this is in your wheelhouse.

@beccadax
Contributor

beccadax commented Jun 8, 2019

@swift-ci please smoke test

@beccadax
Contributor

beccadax commented Jun 8, 2019

@swift-ci please benchmark

@swift-ci
Contributor

swift-ci commented Jun 8, 2019

Performance: -O

Regression OLD NEW DELTA RATIO
CharacterLiteralsLarge 97 108 +11.3% 0.90x
 
Improvement OLD NEW DELTA RATIO
ArrayAppendGenericStructs 2290 1340 -41.5% 1.71x (?)
ObjectiveCBridgeStubFromNSStringRef 175 158 -9.7% 1.11x (?)
 
Added MIN MAX MEAN MAX_RSS
ReplaceSubrangeWithLargeLiteralString 1899 1941 1921
ReplaceSubrangeWithLargeManagedString 1820 1865 1835
ReplaceSubrangeWithSmallLiteralString 1994 2070 2024

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
CharacterLiteralsLarge 100 111 +11.0% 0.90x (?)
Set.subtracting.Seq.Empty.Box 211 227 +7.6% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
FlattenListLoop 5276 4673 -11.4% 1.13x (?)
 
Added MIN MAX MEAN MAX_RSS
ReplaceSubrangeWithLargeLiteralString 1839 1936 1880
ReplaceSubrangeWithLargeManagedString 1809 1868 1830
ReplaceSubrangeWithSmallLiteralString 1988 2102 2026

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
ArrayAppendGenericStructs 1380 2220 +60.9% 0.62x (?)
 
Added MIN MAX MEAN MAX_RSS
ReplaceSubrangeWithLargeLiteralString 4321 4387 4343
ReplaceSubrangeWithLargeManagedString 3748 3805 3769
ReplaceSubrangeWithSmallLiteralString 3655 3720 3690

Code size: -swiftlibs

Benchmark Check Report
⚠️🔤 ReplaceSubrangeWithLargeLiteralString name is composed of 6 words.
Split ReplaceSubrangeWithLargeLiteralString name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⚠️ ReplaceSubrangeWithLargeLiteralString execution took at least 1811 μs.
Decrease the workload of ReplaceSubrangeWithLargeLiteralString by a factor of 2 (10), to be less than 1000 μs.
⚠️🔤 ReplaceSubrangeWithLargeManagedString name is composed of 6 words.
Split ReplaceSubrangeWithLargeManagedString name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⚠️ ReplaceSubrangeWithLargeManagedString execution took at least 1798 μs.
Decrease the workload of ReplaceSubrangeWithLargeManagedString by a factor of 2 (10), to be less than 1000 μs.
⚠️🔤 ReplaceSubrangeWithSmallLiteralString name is composed of 6 words.
Split ReplaceSubrangeWithSmallLiteralString name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⚠️ ReplaceSubrangeWithSmallLiteralString execution took at least 1994 μs.
Decrease the workload of ReplaceSubrangeWithSmallLiteralString by a factor of 2 (10), to be less than 1000 μs.
How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@beccadax
Contributor

beccadax commented Jun 8, 2019

@keitaito This is a great start, but can you address the issues listed at the bottom of the benchmark comment, please?

@palimondo
Contributor

@keitaito I suggest naming these benchmarks per naming convention as

  • String.replaceSubrange.SmallLiteral
  • String.replaceSubrange.LargeLiteral
  • String.replaceSubrange.LargeManaged

and the file as StringReplaceSubrange.swift.

@palimondo
Contributor

palimondo commented Jun 11, 2019

You need to halve the loop multiplier. Mind the 80 character line limit. I would also just create a single testing function and pass the string as parameter. Since it'll be marked as @inline(never), you don't need to use getString to prevent the optimizations. Then you'd use inline closure to define runFunctions. You can initialize the large managed string in setup also with inline closure and blackHole function:

let t: [BenchmarkCategory] = [.validation, .api, .String]

public let StringReplaceSubrange = [
  BenchmarkInfo(name: "String.replaceSubrange.SmallLiteral",
    runFunction: { replaceSubrange($0, "coffee") }, tags: t),
  BenchmarkInfo(name: "String.replaceSubrange.LargeLiteral",
    runFunction: { replaceSubrange($0, "coffee\u{301}coffeecoffeecoffee") },
    tags: t),
  BenchmarkInfo(name: "String.replaceSubrange.LargeManaged",
    runFunction: { replaceSubrange($0, largeManaged) }, tags: t,
    setUpFunction: { blackHole(largeManaged) }),
]

@inline(never)
public func replaceSubrange(_ N: Int, _ string: String) {
  var copy = string
  let range = string.startIndex..<string.index(after: string.startIndex)
  for _ in 0 ..< 5_000*N {
    copy.replaceSubrange(range, with: "t")
  }
}

@milseman Does it make sense to test range replacement with a single character that doesn't change the size? How about something like this?

@inline(never)
public func replaceSubrange(_ N: Int, _ string: String) {
  var s = string
  for _ in 0 ..< 2_500*N {
    let ff =
      s.index(s.startIndex, offsetBy: 2)..<s.index(s.startIndex, offsetBy: 4)
    s.replaceSubrange(ff, with: "☕️")
    let cup =
      s.index(s.startIndex, offsetBy: 2)..<s.index(s.startIndex, offsetBy: 3)
    s.replaceSubrange(cup, with: "ff")
  }
  CheckResults(s == string)
}

Also, since replaceSubrange is a mutating operation, how much sense does it make to vary these across literals and managed strings? After the initial modification, wouldn't they differ only in the StringGuts representation (_SmallString or not)? I mean, aren't LargeLiteral and LargeManaged effectively the same, minus the one-character difference in length?
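For readers who want to try the round-trip idea outside the benchmark suite, here is a minimal standalone sketch. The function name `roundTrip` and the `iterations` parameter are made up for illustration; the harness pieces (`BenchmarkInfo`, `CheckResults`) are left out, and the function returns the final string so the caller can compare it with the input.

```swift
/// Hypothetical standalone version of the round-trip workload:
/// replaces "ff" with a multi-scalar emoji and then restores it.
func roundTrip(_ string: String, iterations: Int) -> String {
  var s = string
  for _ in 0 ..< iterations {
    // Indices are recomputed after every mutation, because mutating a
    // String invalidates indices obtained earlier.
    let ffStart = s.index(s.startIndex, offsetBy: 2)
    let ff = ffStart ..< s.index(ffStart, offsetBy: 2)
    // U+2615 followed by U+FE0F forms a single Character.
    s.replaceSubrange(ff, with: "\u{2615}\u{FE0F}")

    let cupStart = s.index(s.startIndex, offsetBy: 2)
    let cup = cupStart ..< s.index(after: cupStart)
    s.replaceSubrange(cup, with: "ff")
  }
  return s
}

let result = roundTrip("coffee", iterations: 3)
print(result == "coffee")
```

Because each iteration undoes its own change, the string length stays bounded and the final comparison against the input doubles as a correctness check.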

@palimondo
Contributor

palimondo commented Jun 11, 2019

@milseman Could you also please verify my reasoning above regarding the getString and shared run function? I'm starting to doubt myself looking at the reported runtimes…

Also, would it make sense to do a variant that would cross the boundary between the small and large representations? Or vary the benchmarks based on the position of the range, i.e. replacing from start, in the middle or just at the end, to stress the reallocations?

@keitaito
Contributor Author

@brentdax Thanks Brent for running tests 😄

@palimondo Hi Pavol, thanks for the great feedback! I will start working on easy fixes like re-naming the file and benchmarks, and fixing line length 🔨

@milseman
Member

milseman commented Jun 11, 2019

@palimondo

I would also just create a single testing function and pass the string as parameter. Since it'll be marked as @inline(never), you don't need to use getString to prevent the optimizations.

I don't think @inline(never) is sufficient here, but I agree with having a single function that runs over different workloads. You could say:

@inline(never)
public func replaceSubrange(_ N: Int, _ string: String) {
  var copy = getString(string)
  let range = string.startIndex..<string.index(after: string.startIndex)
  for _ in 0 ..< 5_000*N {
    copy.replaceSubrange(range, with: "t")
  }
}

So there's no measurable overhead in the getString call, but it serves as an optimization barrier, should the optimizer ever do more inter-procedural optimizations that are not contingent on prior inlining.

Also, would it make sense to do a variant that would cross the boundary between the small and large representations? Or vary the benchmarks based on the position of the range, i.e. replacing from start, in the middle or just at the end, to stress the reallocations?

I think a benchmark that grows a small string to a large one could be interesting. We don't often go the other way, and we don't shrink the string under normal operations (but keepingCapacity: false, making a brand new one, etc., will form the small string). If a string is already allocated with excess capacity, it's likely the user requested that capacity or is otherwise going to be filling it soon, so there's more harm in deallocating it right away only to re-allocate.

I don't think it matters what that operation is, and append is more straightforward for a micro-benchmark than replaceSubrange.

edit: As for varying the location and triggering reallocations, I think that also would be better done as a separate dedicated benchmark. This benchmark stresses the overhead of the 2-way introspection and dispatch pattern that eventually results in splicing in the underlying bytes.

Member

@milseman milseman left a comment

This looks good if you can incorporate @palimondo 's feedback.

After thinking about this more, the largeLiteral is likely redundant with largeManaged, because after the very first replaceSubrange() call, it will be managed (due to mutation) for the subsequent 10_000 operations. The large vs small split is nice, though.

Contributor Author

@keitaito keitaito left a comment

Sorry for taking a while to update this PR, but now this PR adds String.replaceSubrange(_:with:) benchmarks with arguments of types String, Substring, Array, and Repeated. Would you mind taking a look at this PR again when you have time?

Also, while I was working on this PR, I've got a few questions related to Swift.String. Would you mind giving me some explanations on them if you have time? 🙇‍♂️


private var largeManagedString: String = {
  return getString("coffee\u{301}coffeecoffeecoffeecoffee")
}()
Contributor Author

\u{301} was suggested to be added to the test string when I paired with Michael. @milseman would you mind reminding me the reason for this? I don't remember it now 😅

Contributor

I'll guess it was in order for the string to be in particular normalization form. @milseman Do you want to vary the benchmarks also for different normalization forms? SR-8905 doesn't mention that…

Contributor Author

I was wondering if the result of testing with the string "coffeécoffeecoffeecoffeecoffee" would be different from the one with "coffeecoffeecoffeecoffeecoffee" 🤔 If there is distinct difference, maybe we could add two benchmarks for the one with the acute accent character and the other one without it?

Member

Grapheme segmentation is relevant to this benchmark, but not normalization (since there's no comparison). The difference is that the precomposed representation is a single scalar per grapheme cluster, while the decomposed (multi-scalar) form is not. The single-scalar one will hit our grapheme breaking fast-paths while the multi-scalar one will call out to ICU. Alternatively, you could use other kinds of multi-scalar graphemes clusters, such as complex emoji. I just mentioned "\u{301}" because you can just stick it after an "e" to get the same effect.
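To make the precomposed versus decomposed distinction concrete, here is a small sketch using only the standard library. The two literals are illustrative and are not the benchmark's test strings; the snippet only inspects scalar counts and does not show which code path the standard library takes internally.

```swift
// U+00E9 is the precomposed "é"; "e" followed by U+0301 is the decomposed form.
let precomposed = "coffe\u{E9}"
let decomposed = "coffee\u{301}"

// Both strings have the same number of grapheme clusters (Characters)...
let sameLength = precomposed.count == decomposed.count

// ...but the decomposed one carries an extra scalar in its last cluster.
let precomposedScalars = precomposed.unicodeScalars.count
let decomposedScalars = decomposed.unicodeScalars.count
let lastClusterScalars = decomposed.last!.unicodeScalars.count

print(sameLength, precomposedScalars, decomposedScalars, lastClusterScalars)
// prints "true 6 7 2"
```

The last cluster of the decomposed string is the multi-scalar grapheme that the comment above describes as leaving the fast path.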

BenchmarkInfo(
name: "Str.replaceSubrange.SmallLiteral.RepeatedChar",
runFunction: { replaceSubrange($0, "coffee", with: getRepeatedCharacter(repeatedCharacter)) },
tags: tags
Contributor Author

The benchmark name "Str.replaceSubrange.SmallLiteral.RepeatedChar" is longer than 40 characters, but I couldn't think of a better name that fits in 40. Maybe it could be something like "Str.replaceSubrange.LargeManagedRepChar", but I was concerned that "RepChar" makes it a little hard to see that it means Repeated<Character>. @palimondo What do you think about this naming? Please let me know if you have any suggestions 🙂

Contributor

Looking at what @milseman writes in SR-8905:

replaceSubrange<C: Collection>(_:C)

  • Arguments of types String, Substring, Array<Character>, Repeated<Character>, etc

I'd say the naming convention calls for base name of String.replaceSubrange which varies across the argument type (String, Substring, ArrChar, RepChar) for the general case of large strings. Then we'll denote the special optimization for small strings with a simple .Small suffix and we'll get these benchmarks:

  • String.replaceSubrange.String
  • String.replaceSubrange.Substring
  • String.replaceSubrange.ArrChar
  • String.replaceSubrange.RepChar
  • String.replaceSubrange.String.Small
  • String.replaceSubrange.Substring.Small
  • String.replaceSubrange.ArrChar.Small
  • String.replaceSubrange.RepChar.Small

The longest one is String.replaceSubrange.Substring.Small at 39 characters, just under the 40 chars limit.

Contributor Author

That's an awesome naming idea. I will use them. Thanks for your suggestion!


private func setupLargeManagedString() {
  _ = largeManagedString
}
Contributor Author

Should largeManagedString be called from setupFunction closure before it is used for benchmarks?

Contributor

Given that in replaceSubrange(_:_:with:) you already do var copy = getString(string), this whole dance here is unnecessary. You should declare this as a simple let largeString = "coffee\u{301}coffeecoffeecoffeecoffee" and drop all the setUpFunctions.

Contributor Author

That makes sense. Thanks for explaining!

let range = string.startIndex..<string.index(after: string.startIndex)
for _ in 0 ..< 5_000 * N {
  copy.replaceSubrange(range, with: replacingString)
}
Contributor Author

What are the criteria for choosing a multiplier like 5000? Does it depend on the benchmark's runtime?

Contributor

Correct. We are just trying to size the workload to run in 20–1000 μs, so that it is in a measurement sweet spot for our system.
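The "factor of 2 (10)" style suggestions in the Benchmark Check Report can be reproduced with a small calculation. This is only a sketch that matches the numbers shown in this thread; `workloadFactor` is a hypothetical helper, not the benchmark driver's actual code, and the 1000 μs limit is taken from the report text.

```swift
/// Hypothetical helper: the smallest power of `base` that brings a measured
/// runtime (in μs) under `limit`. Not taken from the benchmark driver.
func workloadFactor(runtime: Int, base: Int, limit: Int = 1000) -> Int {
  precondition(base > 1 && limit > 0)
  var factor = 1
  // Multiply instead of dividing so everything stays in integer arithmetic.
  while runtime >= limit * factor {
    factor *= base
  }
  return factor
}

// Runtimes quoted in the check reports of this PR.
for runtime in [1811, 1047, 2589, 17589] {
  let byTwo = workloadFactor(runtime: runtime, base: 2)
  let byTen = workloadFactor(runtime: runtime, base: 10)
  print("\(runtime) μs: factor of \(byTwo) (\(byTen))")
}
```

Dividing the loop multiplier by the suggested factor is what "halve the loop multiplier" amounted to for the first version of the benchmarks.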

Contributor Author

Thanks for explaining!

BenchmarkInfo(
name: "Str.replaceSubrange.SmallLiteral.String",
runFunction: { replaceSubrange($0, "coffee", with: "t") },
tags: tags
Contributor Author

Would you mind explaining what is the difference between the small literal string vs the large managed string? Is this something related to this small string optimization? If the string fits 15 ASCII characters length, it won't be allocated in the heap memory?

Contributor

Correct. See: _SmallString and _StringGuts for implementation details if you're interested.

Contributor Author

@keitaito keitaito Jul 4, 2019

Thanks for the links! I will take a look at them 🔍

Member

Small form accommodates 15 UTF-8 code units in length (not just ASCII)
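A quick way to see why the limit is about code units rather than characters is to count UTF-8 code units directly. This sketch assumes the 15 code unit limit quoted above; `fitsSmallForm` is a hypothetical helper that only checks the length criterion, since the actual storage representation is an implementation detail that public API does not expose.

```swift
/// Hypothetical helper: true when `s` is short enough for the small form,
/// assuming the limit of 15 UTF-8 code units mentioned in this thread.
func fitsSmallForm(_ s: String) -> Bool {
  return s.utf8.count <= 15
}

let smallString = "coffee"                                 // 6 code units
let largeString = "coffee\u{301}coffeecoffeecoffeecoffee"  // 32 code units
let accented = String(repeating: "\u{E9}", count: 8)       // 8 characters, 16 code units

print(fitsSmallForm(smallString), fitsSmallForm(largeString), fitsSmallForm(accented))
// prints "true false false"
```

The third string has only eight characters, yet it exceeds the limit because each "é" takes two UTF-8 code units.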

Contributor Author

Thank you for clarifying the length of the small string, Michael 😄

@keitaito keitaito marked this pull request as ready for review July 2, 2019 06:43
@palimondo
Contributor

@swift-ci please benchmark

Contributor

@palimondo palimondo left a comment

I think one generic test method should be able to cover all of our test cases. See inline comments.


),
BenchmarkInfo(
name: "Str.replaceSubrange.SmallLiteral.Substr",
runFunction: { replaceSubrange($0, "coffee", with: getSubstring("t")) },
Contributor

We don't need to put an optimization barrier (by calling the getSubstring from TestUtils) here.
Shorter way to get a Substring is to get a full subrange like this: "t"[...].

For an implementation symmetry, I'd also extract the "coffee" into smallString constant, so that our benchmark definitions would vary only like this:

  • runFunction: { replaceSubrange($0, largeString, with: "t") }
  • runFunction: { replaceSubrange($0, smallString, with: "t") }
  • runFunction: { replaceSubrange($0, largeString, with: "t"[...]) }
  • runFunction: { replaceSubrange($0, smallString, with: "t"[...]) }
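As a quick illustration of the `[...]` suggestion, the unbounded range subscript on a String yields a Substring covering the whole string, so no helper function is needed to build one. The snippet below is a standalone sketch; the constant names are illustrative.

```swift
let smallString = "coffee"

// Subscripting with the unbounded range `...` gives a Substring of the
// entire string.
let whole: Substring = smallString[...]
let t: Substring = "t"[...]

// A Substring is a Collection of Character, so it can be passed to
// String.replaceSubrange(_:with:) directly.
var copy = smallString
let first = copy.startIndex ..< copy.index(after: copy.startIndex)
copy.replaceSubrange(first, with: t)

print(whole, copy)
```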

@@ -322,3 +322,11 @@ public func getString(_ s: String) -> String { return s }
// The same for Substring.
@inline(never)
public func getSubstring(_ s: Substring) -> Substring { return s }

Contributor

I don't think we need to define any new optimization barrier functions...

Contributor

@palimondo palimondo left a comment

This review comment got somehow lost… Let me try again.

}

@inline(never)
private func replaceSubrange(_ N: Int, _ string: String, with replacingSubstring: Substring) {
Contributor

If I understand @milseman's intention from SR-8905 correctly, we are designing a benchmark for the generic replaceSubrange method that varies across the String, Substring, Array<Character> and Repeated<Character> specializations.

Therefore we should be able to define single shared generic test function and vary the parameter in runFunction closure in BenchmarkInfo declarations.

For an example of such benchmarks, see append variants in DataBenchmarks

@milseman Any thoughts on keeping or dropping the @inline(never) annotation?

Member

You could refactor this to a generic @inline(__always) implementation function that takes a scale factor (because the String version should be much faster than the Repeat version) and does the loop and replaceSubrange calls. You'd likely want to make a @inline(never) top-level function for each individual benchmark. It doesn't save you a whole lot.
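A rough sketch of that layout follows. All names (`replaceSubrangeImpl`, `runString`, `runRepChar`) and the scale values are hypothetical, and `getString` here is a local stand-in for the barrier of the same name in the suite's TestUtils. Unlike the snippets in this PR, the range is recomputed from the mutated copy on each iteration, and the result is returned so it can be checked.

```swift
// Local stand-in for the suite's optimization barrier of the same name.
@inline(never)
func getString(_ s: String) -> String { return s }

/// Shared generic loop. `scale` lets slower argument types run fewer
/// iterations than faster ones.
@inline(__always)
func replaceSubrangeImpl<C: Collection>(
  _ n: Int, scale: Int, _ string: String, with replacement: C
) -> String where C.Element == Character {
  var copy = getString(string)
  for _ in 0 ..< scale * n {
    let range = copy.startIndex ..< copy.index(after: copy.startIndex)
    copy.replaceSubrange(range, with: replacement)
  }
  return copy
}

// One non-inlined entry point per benchmark; the scale values are
// illustrative only.
@inline(never)
func runString(_ n: Int) -> String {
  let replacement: String = "t"
  return replaceSubrangeImpl(n, scale: 1_000, "coffee", with: replacement)
}

@inline(never)
func runRepChar(_ n: Int) -> String {
  let replacement = repeatElement(Character("t"), count: 1)
  return replaceSubrangeImpl(n, scale: 100, "coffee", with: replacement)
}

print(runString(1), runRepChar(1))
```

Each entry point would then be referenced from its own BenchmarkInfo, which is why the comment above notes that the refactoring does not save much code.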

@keitaito
Contributor Author

keitaito commented Jul 4, 2019

@palimondo Thank you so much for another review, Pavol! I will address your comments and fix warnings and errors from the swift-ci benchmark check report 🔨

@keitaito
Contributor Author

keitaito commented Jul 7, 2019

Several build failure errors are suddenly thrown when running ninja swift-benchmark-macosx-x86_64 😢 Have you seen these errors? Do you have any ideas how to fix them and re-build swift-benchmark?

/[my-own-path]/swift-source/build/Ninja-RelWithDebInfoAssert/swift-macosx-x86_64 $ ninja swift-benchmark-macosx-x86_64

swift-source/swift/stdlib/public/stubs/MathStubs.cpp:21:10: fatal error: 'climits' file not found
swift-source/swift/include/swift/Runtime/Debug.h:20:10: fatal error: 'cstdarg' file not found
swift-source/swift/stdlib/public/stubs/CommandLine.cpp:17:10: fatal error: 'vector' file not found
swift-source/swift/stdlib/public/stubs/LibcShims.cpp:30:10: fatal error: 'type_traits' file not found
swift-source/swift/include/swift/Runtime/Debug.h:20:10: fatal error: 'cstdarg' file not found
swift-source/swift/stdlib/public/stubs/ThreadLocalStorage.cpp:13:10: fatal error: 'cstring' file not found
swift-source/swift/stdlib/public/stubs/Stubs.cpp:37:10: fatal error: 'climits' file not found
swift-source/swift/stdlib/public/stubs/../SwiftShims/RefCount.h:30:10: fatal error: 'type_traits' file not found
swift-source/swift/include/swift/Basic/Lazy.h:16:10: fatal error: 'memory' file not found

swift-source/swift/stdlib/public/runtime/ImageInspection.h:25:10: fatal error: 'cstdint' file not found
swift-source/swift/include/swift/ABI/Metadata.h:20:10: fatal error: 'atomic' file not found
swift-source/llvm/include/llvm/Support/Compiler.h:20:10: fatal error: 'new' file not found
swift-source/swift/include/swift/Runtime/HeapObject.h:20:10: fatal error: 'cstddef' file not found
swift-source/swift/include/swift/Basic/Range.h:38:10: fatal error: 'algorithm' file not found
swift-source/swift/include/swift/Demangling/Demangle.h:22:10: fatal error: 'memory' file not found
swift-source/swift/include/swift/Basic/Lazy.h:16:10: fatal error: 'memory' file not found
swift-source/swift/include/swift/Basic/Lazy.h:16:10: fatal error: 'memory' file not found
swift-source/swift/include/swift/ABI/Metadata.h:20:10: fatal error: 'atomic' file not found
swift-source/swift/include/swift/ABI/Metadata.h:20:10: fatal error: 'atomic' file not found

A lot of error messages are displayed on Terminal. Here is an example of a build failure error.

[1/743] Building CXX object stdlib/public/runtime/CMakeFiles/swiftRuntime-macosx-x86_64.dir/Enum.cpp.o
FAILED: stdlib/public/runtime/CMakeFiles/swiftRuntime-macosx-x86_64.dir/Enum.cpp.o 
/[my-own-path]/swift-source/build/Ninja-RelWithDebInfoAssert/llvm-macosx-x86_64/./bin/clang++  -DCMARK_STATIC_DEFINE -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Istdlib/public/runtime -I/[my-own-path]/swift-source/swift/stdlib/public/runtime -Iinclude -I/[my-own-path]/swift-source/swift/include -I/[my-own-path]/swift-source/llvm/include -I/[my-own-path]/swift-source/build/Ninja-RelWithDebInfoAssert/llvm-macosx-x86_64/include -I/[my-own-path]/swift-source/llvm/tools/clang/include -I/[my-own-path]/swift-source/build/Ninja-RelWithDebInfoAssert/llvm-macosx-x86_64/tools/clang/include -I/[my-own-path]/swift-source/cmark/src -I/[my-own-path]/swift-source/build/Ninja-RelWithDebInfoAssert/cmark-macosx-x86_64/src -Wno-unknown-warning-option -Werror=unguarded-availability-new -fno-stack-protector -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-class-memaccess -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -Werror=switch -Wdocumentation -Wimplicit-fallthrough -Wunreachable-code -Woverloaded-virtual -DOBJC_OLD_DISPATCH_PROTOTYPES=0 -fno-sanitize=all -DLLVM_DISABLE_ABI_BREAKING_CHECKS_ENFORCING=1 -O2  -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk   -UNDEBUG  -fno-exceptions -fno-rtti -Wall -Wglobal-constructors -Wexit-time-destructors -fvisibility=hidden -DSWIFT_RUNTIME_CLOBBER_FREED_OBJECTS=1 -DswiftCore_EXPORTS -I/[my-own-path]/swift-source/swift/include -DSWIFT_TARGET_LIBRARY_NAME=swiftRuntime -target x86_64-apple-macosx10.9 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -arch x86_64 -F 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/../../../Developer/Library/Frameworks -mmacosx-version-min=10.9 -O2 -g -UNDEBUG -DSWIFT_ENABLE_RUNTIME_FUNCTION_COUNTERS -MD -MT stdlib/public/runtime/CMakeFiles/swiftRuntime-macosx-x86_64.dir/Enum.cpp.o -MF stdlib/public/runtime/CMakeFiles/swiftRuntime-macosx-x86_64.dir/Enum.cpp.o.d -o stdlib/public/runtime/CMakeFiles/swiftRuntime-macosx-x86_64.dir/Enum.cpp.o -c /[my-own-path]/swift-source/swift/stdlib/public/runtime/Enum.cpp
In file included from /[my-own-path]/swift-source/swift/stdlib/public/runtime/Enum.cpp:17:
In file included from /[my-own-path]/swift-source/llvm/include/llvm/Support/ErrorHandling.h:18:
/[my_path]swift-source/llvm/include/llvm/Support/Compiler.h:20:10: fatal error: 'new' file not found
#include <new>
         ^~~~~
1 error generated.

I'm not sure if it's related, but these errors started being thrown after Xcode 11 beta 3 was installed on my machine, FWIW.

@palimondo
Contributor

I'd guess you need to do a clean build after Xcode update.

- Re-name benchmark names
- Remove setupLargeManagedString()
- Remove unnecessary optimization barrier functions
- Update benchmark argument string by using smallString and largeString
- Remove unnecessary optimization barrier function calls
- Update replaceSubrange(_:_:with) to be generic function
@keitaito
Contributor Author

@palimondo Thanks for the reply on the build issue, Pavol! I ended up re-building everything from scratch 😅

@keitaito
Contributor Author

The benchmark with Repeated<Character> takes much longer time than others. Do you have any idea why this type is slower than others? Is there any way to improve this benchmark? The benchmark result on my machine is the following:

// When the for-in loop runs with `0 ..< 5_000 * N`
$ ./Benchmark_O 772 773 774 775 776 777 778 779
#,TEST,SAMPLES,MIN(μs),MAX(μs),MEAN(μs),SD(μs),MEDIAN(μs)
772,String.replaceSubrange.ArrChar,1,2049,2049,2049,0,2049
773,String.replaceSubrange.ArrChar.Small,1,1240,1240,1240,0,1240
774,String.replaceSubrange.RepChar,1,31690,31690,31690,0,31690
775,String.replaceSubrange.RepChar.Small,1,4670,4670,4670,0,4670
776,String.replaceSubrange.String,1,1105,1105,1105,0,1105
777,String.replaceSubrange.String.Small,1,1233,1233,1233,0,1233
778,String.replaceSubrange.Substring,1,2771,2771,2771,0,2771
779,String.replaceSubrange.Substring.Small,1,1477,1477,1477,0,1477

Total performance tests executed: 8
// When the for-in loop runs with `0 ..< 250 * N`
$ ./Benchmark_O 772 773 774 775 776 777 778 779
#,TEST,SAMPLES,MIN(μs),MAX(μs),MEAN(μs),SD(μs),MEDIAN(μs)
772,String.replaceSubrange.ArrChar,1,92,92,92,0,92
773,String.replaceSubrange.ArrChar.Small,1,54,54,54,0,54
774,String.replaceSubrange.RepChar,1,1209,1209,1209,0,1209
775,String.replaceSubrange.RepChar.Small,1,186,186,186,0,186
776,String.replaceSubrange.String,1,54,54,54,0,54
777,String.replaceSubrange.String.Small,1,52,52,52,0,52
778,String.replaceSubrange.Substring,1,122,122,122,0,122
779,String.replaceSubrange.Substring.Small,1,63,63,63,0,63

Total performance tests executed: 8

@milseman
Member

@keitaito there are explicit pre-specializations for the others: https://github.com/apple/swift/blob/master/stdlib/public/core/StringRangeReplaceableCollection.swift#L197

Benchmarking a non-pre-specialized argument type helps identify and track changes in the fully-generic implementation.

@palimondo
Contributor

@swift-ci please benchmark

@palimondo
Contributor

I’m running the benchmarks on CI with your current multiplier (5000), to get a relation to your local results.

@swift-ci
Contributor

Performance: -O

Improvement OLD NEW DELTA RATIO
FlattenListFlatMap 5939 4087 -31.2% 1.45x (?)
SuffixSequenceLazy 319 289 -9.4% 1.10x (?)
SuffixSequence 317 291 -8.2% 1.09x (?)
 
Added MIN MAX MEAN MAX_RSS
String.replaceSubrange.ArrChar 714 740 729
String.replaceSubrange.ArrChar.Small 568 580 574
String.replaceSubrange.RepChar 18516 18770 18664
String.replaceSubrange.RepChar.Small 2657 2779 2701
String.replaceSubrange.String 480 494 487
String.replaceSubrange.String.Small 564 580 572
String.replaceSubrange.Substring 1132 1147 1141
String.replaceSubrange.Substring.Small 644 658 652

Code size: -O

Performance: -Osize

Improvement OLD NEW DELTA RATIO
FlattenListLoop 2736 2167 -20.8% 1.26x (?)
Dictionary4 207 165 -20.3% 1.25x
CharacterLiteralsLarge 67 58 -13.4% 1.16x (?)
CharacterLiteralsSmall 220 202 -8.2% 1.09x (?)
 
Added MIN MAX MEAN MAX_RSS
String.replaceSubrange.ArrChar 707 708 708
String.replaceSubrange.ArrChar.Small 555 555 555
String.replaceSubrange.RepChar 17925 18117 18022
String.replaceSubrange.RepChar.Small 2611 2660 2628
String.replaceSubrange.String 488 488 488
String.replaceSubrange.String.Small 575 587 579
String.replaceSubrange.Substring 1134 1151 1140
String.replaceSubrange.Substring.Small 659 671 663

Code size: -Osize

Performance: -Onone

Added MIN MAX MEAN MAX_RSS
String.replaceSubrange.ArrChar 877 931 895
String.replaceSubrange.ArrChar.Small 677 677 677
String.replaceSubrange.RepChar 18854 18953 18909
String.replaceSubrange.RepChar.Small 2857 2949 2916
String.replaceSubrange.String 594 596 595
String.replaceSubrange.String.Small 670 682 674
String.replaceSubrange.Substring 1209 1269 1229
String.replaceSubrange.Substring.Small 727 728 727

Code size: -swiftlibs

Benchmark Check Report
⛔️⏱ String.replaceSubrange.RepChar execution took at least 17589 μs.
Decrease the workload of String.replaceSubrange.RepChar by a factor of 32 (100), to be less than 1000 μs.
⚠️ String.replaceSubrange.RepChar.Small execution took at least 2589 μs.
Decrease the workload of String.replaceSubrange.RepChar.Small by a factor of 4 (10), to be less than 1000 μs.
⚠️ String.replaceSubrange.Substring execution took at least 1047 μs.
Decrease the workload of String.replaceSubrange.Substring by a factor of 2 (10), to be less than 1000 μs.
How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@milseman
Member

@palimondo if @keitaito fixes the workload size as recommended in the report, are we good to go?

⛔️⏱ String.replaceSubrange.RepChar execution took at least 17589 μs. Decrease the workload of String.replaceSubrange.RepChar by a factor of 32 (100), to be less than 1000 μs.
⚠️ String.replaceSubrange.RepChar.Small execution took at least 2589 μs. Decrease the workload of String.replaceSubrange.RepChar.Small by a factor of 4 (10), to be less than 1000 μs.
⚠️ String.replaceSubrange.Substring execution took at least 1047 μs. Decrease the workload of String.replaceSubrange.Substring by a factor of 2 (10), to be less than 1000 μs.
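
The recommended factors in the report appear to be the smallest power of 2 (and, in parentheses, the smallest power of 10) that brings the measured runtime under the 1000 μs threshold. Below is a minimal Swift sketch of that arithmetic, assuming this reading of the report is right; `reductionFactor` is a hypothetical helper and not part of the benchmark driver.

```swift
/// Smallest power of `base` by which `runtime` must be divided
/// to fall below `limit` (both in microseconds).
/// Hypothetical helper that mirrors the numbers in the report above.
func reductionFactor(runtime: Int, base: Int, limit: Int = 1000) -> Int {
    var factor = 1
    // Same condition as runtime / factor >= limit, kept in integers.
    while runtime >= limit * factor {
        factor *= base
    }
    return factor
}

// Minimum runtimes (μs) quoted in the check report.
let reported = [
    ("RepChar", 17589),
    ("RepChar.Small", 2589),
    ("Substring", 1047),
]
for (name, runtime) in reported {
    let byTwo = reductionFactor(runtime: runtime, base: 2)
    let byTen = reductionFactor(runtime: runtime, base: 10)
    print("\(name): \(byTwo) (\(byTen))")
}
```

For the three reported runtimes this reproduces the pairs 32 (100), 4 (10), and 2 (10) from the report.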

@palimondo
Contributor

palimondo commented Jul 30, 2019

@milseman If you don't foresee an order-of-magnitude improvement in the performance of replaceSubrange for the fast cases (String, Substring, …), I'd say we should go with the same loop multiplier for the whole family. Based on the report above, something in the 500 to 1000 range would work fine. RepChar would still jump above 1000 μs, but that is acceptable, because it lets us compare all the cases directly.

Otherwise he should go with your recommendation from before to:

… refactor this to a generic @inline(__always) implementation function that takes a scale factor (because the String version should be much faster than the Repeat version) and does the loop and replaceSubrange calls.

If I understand the point correctly, an inlined function will allow the compiler to unroll the loop, which would not be the case if the loop multiplier were passed as a parameter to a non-inlined generic function? (I don't think we'd need to create per-case functions, as your next sentence said, because the inline closures play the same role and there is nowhere they can get inlined into… I think the ritualistic way we sprinkle @inline(never) is obsolete there.)

In other words, I'd prefer to have the same multiplier across all variants. When the variants differ by several orders of magnitude and that is not possible, we should have a properly tailored multiplier for each individual case (DataBenchmarks are done that way).
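
The quoted recommendation (one generic, always-inlined implementation that takes a scale factor and runs the loop) could look roughly like the sketch below. This is an illustration under stated assumptions, not the code merged in this PR: `replaceMiddle`, its parameters, and the multiplier value are all hypothetical names chosen for the example.

```swift
// Hypothetical helper, not the code merged in this PR.
// Assumes `text` has at least two characters.
@inline(__always)
func replaceMiddle<C: Collection>(
    of text: String, with replacement: C, scale: Int
) -> String where C.Element == Character {
    // Replace everything except the first and last character.
    let range = text.index(after: text.startIndex)..<text.index(before: text.endIndex)
    var result = text
    for _ in 0..<scale {
        result = text  // start every iteration from the unmodified input
        result.replaceSubrange(range, with: replacement)
    }
    return result
}

let input = "abcde"
let multiplier = 500  // illustrative value, shared by every variant

// One argument of each type covered by the benchmark family.
let string: String = "XY"
let source = "uvwxyz"
let substring = source.dropFirst(2)
let array: [Character] = ["1", "2", "3"]
let repeated = repeatElement(Character("r"), count: 4)

print(replaceMiddle(of: input, with: string, scale: multiplier))
print(replaceMiddle(of: input, with: substring, scale: multiplier))
print(replaceMiddle(of: input, with: array, scale: multiplier))
print(replaceMiddle(of: input, with: repeated, scale: multiplier))
```

Passing the same `multiplier` to every call is what keeps the reported times directly comparable across argument types; a per-case scale would only be needed if one variant could not fit the time window.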

@milseman
Member

milseman commented Aug 2, 2019

The same multiplier sounds good to me if we can make all the benchmarks fit in our time windows.

+[Gardening] minor formatting (80 chars)
@palimondo
Contributor

@swift-ci please benchmark

@palimondo
Contributor

@swift-ci please smoke test

@swift-ci
Contributor

swift-ci commented Aug 2, 2019

Performance: -O

Regression OLD NEW DELTA RATIO
SuffixSequence 293 326 +11.3% 0.90x (?)
SuffixSequenceLazy 293 326 +11.3% 0.90x (?)
Chars2 3100 3350 +8.1% 0.93x (?)
 
Improvement OLD NEW DELTA RATIO
FlattenListLoop 2829 2170 -23.3% 1.30x (?)
FlattenListFlatMap 5301 4407 -16.9% 1.20x (?)
 
Added MIN MAX MEAN MAX_RSS
String.replaceSubrange.ArrChar 72 72 72
String.replaceSubrange.ArrChar.Small 57 57 57
String.replaceSubrange.RepChar 1775 2091 1881
String.replaceSubrange.RepChar.Small 267 267 267
String.replaceSubrange.String 53 53 53
String.replaceSubrange.String.Small 60 60 60
String.replaceSubrange.Substring 118 118 118
String.replaceSubrange.Substring.Small 70 70 70

Code size: -O

Performance: -Osize

Improvement OLD NEW DELTA RATIO
Dictionary4 204 166 -18.6% 1.23x
CharacterLiteralsLarge 67 58 -13.4% 1.16x (?)
Dictionary4OfObjects 351 328 -6.6% 1.07x (?)
 
Added MIN MAX MEAN MAX_RSS
String.replaceSubrange.ArrChar 71 73 72
String.replaceSubrange.ArrChar.Small 56 56 56
String.replaceSubrange.RepChar 1787 1998 1862
String.replaceSubrange.RepChar.Small 262 263 263
String.replaceSubrange.String 49 49 49
String.replaceSubrange.String.Small 57 57 57
String.replaceSubrange.Substring 112 112 112
String.replaceSubrange.Substring.Small 65 66 65

Code size: -Osize

Performance: -Onone

Added MIN MAX MEAN MAX_RSS
String.replaceSubrange.ArrChar 99 101 100
String.replaceSubrange.ArrChar.Small 79 81 80
String.replaceSubrange.RepChar 1793 1969 1853
String.replaceSubrange.RepChar.Small 275 275 275
String.replaceSubrange.String 51 51 51
String.replaceSubrange.String.Small 59 61 60
String.replaceSubrange.Substring 117 117 117
String.replaceSubrange.Substring.Small 70 72 71

Code size: -swiftlibs

Benchmark Check Report
⚠️ String.replaceSubrange.RepChar execution took at least 1736 μs.
Decrease the workload of String.replaceSubrange.RepChar by a factor of 2 (10), to be less than 1000 μs.

@palimondo palimondo merged commit 22d7c28 into swiftlang:master Aug 2, 2019
@palimondo
Contributor

@keitaito Thank you for your patience and your contribution!

@keitaito
Contributor Author

keitaito commented Aug 4, 2019

Woot! Thank you Pavol for updating the multiplier for me, and thank you for merging the PR! I really appreciate that both Pavol and Michael patiently and carefully reviewed my PR and gave me feedback. Sorry for taking a long time to get it merged, but I'm happy and excited it's finally merged 🙂 I learned very interesting things about what goes on under the hood of the Swift standard library and the String APIs while working on it. Thank you so much!

@keitaito
Contributor Author

keitaito commented Aug 4, 2019

I have a question for @milseman about pre-specializations.

Benchmarking a non-pre-specialized argument type helps identify and track changes in the fully-generic implementation.

Does this mean that if @_specialize(where C == Repeated<Character>) were added to replaceSubrange(_:with:), we could see a performance improvement when the argument is of type Repeated<Character>? Is there any specific reason why the pre-specialization annotation is currently present for String, Substring, and Array<Character>, but not for Repeated<Character>?

@keitaito
Contributor Author

keitaito commented Aug 4, 2019

Another question: would you mind explaining what the criteria are for choosing between the @inline(never) and @inline(__always) annotations? According to Bruno's blog post and Slava's explanation on the Swift mailing list, they seem to work as follows:

  • If @inline(never) is used for the given function, the function won't be inlined. In other words, the function body won't be copied to the call site, so there could be function call overhead.
  • If @inline(__always) is used for the given function, the function will be inlined at the call site. There is no function call overhead, but if the function is called from multiple call sites, the code will be duplicated and the binary size will increase.

Given these, could inlining affect benchmark performance? Looking at the benchmark source files, it seems @inline(__always) is used much less often than @inline(never). What kind of function could be annotated with @inline(__always)?
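
To make the two annotations in the question concrete, here is a minimal sketch. The function names are hypothetical and chosen only for illustration; both functions compute the same value, and the annotations only change how the optimizer treats the call sites.

```swift
// Never copied into callers: every use remains a real function call.
@inline(never)
func squareOutOfLine(_ x: Int) -> Int {
    return x * x
}

// Always copied into callers: no call overhead, but the body is
// duplicated at each call site.
@inline(__always)
func squareInlined(_ x: Int) -> Int {
    return x * x
}

var total = 0
for i in 1...4 {
    total += squareOutOfLine(i)
    total += squareInlined(i)
}
print(total)  // 60
```

The observable result is identical either way, which is why any difference between the two can only show up in timing or code size measurements.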

@palimondo
Contributor

The ritualistic placement of @inline(never) on the tested function predates the public history of the Swift Benchmark Suite. I can only guess that it stemmed from some very early internal version of the benchmark where it could have been inlined into the measurement loop? AFAIK it has no practical effect, because the individual benchmark functions are always executed indirectly through a function reference stored in a dictionary, so there’s nowhere these functions can be inlined into as far as the compiler can see.

For a more targeted use of @inline(never), see DataBenchmarks, where it is used to explicitly control the inlining into the runFunction closures to create 2 different variants of benchmarks to showcase the effect of their (Data) inlining. (But frankly I’m not 100% positive I got all that correctly. AFAIK the Data family of benchmarks is the first one to deal explicitly with that aspect.)

For an example of technically correct @inline(__always) use, see InsertCharacter, where it ensures that the multiple benchmark functions are testing the same thing. But that pattern is not worth emulating, because the same effect can be achieved by leaving out all the annotations and calling the benchmark function with varying parameters from the runFunction closure.

So I don’t see a very good reason to use @inline(__always) in the context of benchmark definitions. If you were asking generally, see all occurrences of this annotation in stdlib.
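
The annotation-free pattern described above (one shared function, called with varying parameters from closures that are looked up through a dictionary) can be sketched as follows. `BenchmarkCase`, `appendRepeatedly`, and the benchmark names are hypothetical stand-ins for this example and are not the types or functions used by the Swift Benchmark Suite; the closure here also returns a value purely so the result is easy to inspect.

```swift
// Hypothetical stand-in for a benchmark registration record.
struct BenchmarkCase {
    let name: String
    let run: (Int) -> Int
}

// Shared workload with no inlining annotations at all.
func appendRepeatedly(_ character: Character, count: Int) -> Int {
    var text = ""
    for _ in 0..<count {
        text.append(character)
    }
    return text.count
}

// Each variant is just a closure calling the shared function
// with a different parameter.
let cases = [
    BenchmarkCase(name: "Append.ASCII", run: { n in appendRepeatedly("a", count: n) }),
    BenchmarkCase(name: "Append.NonASCII", run: { n in appendRepeatedly("\u{E9}", count: n) }),
]

// Benchmarks are looked up by name and invoked through the stored
// function reference, so the caller never sees a direct call.
let registry = Dictionary(uniqueKeysWithValues: cases.map { ($0.name, $0) })
if let benchmark = registry["Append.NonASCII"] {
    print(benchmark.name, benchmark.run(10))
}
```

Because every variant goes through the same function body, the variants measure the same code path and differ only in their arguments.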

@palimondo
Contributor

@keitaito If you’re interested in playing some more with this, I think it should be possible, and even beneficial for preventing further confusion, to remove most of the @inline(never) annotations from the Swift Benchmark Suite. Another starter task?
