[benchmark] Add ReplaceSubrange benchmark #25310

keitaito · 2019-06-07T23:44:17Z

🔨 Changes

Add ReplaceSubrange benchmark. It is still a work in progress, but it would be great to get feedback to make sure I'm heading in the right direction 🙇‍♂️

Resolves part of SR-8905: Gaps in String benchmarking.

Shout-out to Rob and Michael for mentoring at try! Swift San Jose!

To-do list

Arguments of types:
String
Substring
Array
Repeated

beccadax · 2019-06-07T23:48:25Z

@milseman I'm thinking this is in your wheelhouse.

beccadax · 2019-06-08T04:49:14Z

@swift-ci please smoke test

beccadax · 2019-06-08T04:49:37Z

@swift-ci please benchmark

swift-ci · 2019-06-08T05:15:32Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
CharacterLiteralsLarge	97	108	+11.3%	0.90x

Improvement	OLD	NEW	DELTA	RATIO
ArrayAppendGenericStructs	2290	1340	-41.5%	1.71x (?)
ObjectiveCBridgeStubFromNSStringRef	175	158	-9.7%	1.11x (?)

Added	MIN	MAX	MEAN	MAX_RSS
ReplaceSubrangeWithLargeLiteralString	1899	1941	1921	—
ReplaceSubrangeWithLargeManagedString	1820	1865	1835	—
ReplaceSubrangeWithSmallLiteralString	1994	2070	2024	—

Code size: -O

Performance: -Osize

Regression	OLD	NEW	DELTA	RATIO
CharacterLiteralsLarge	100	111	+11.0%	0.90x (?)
Set.subtracting.Seq.Empty.Box	211	227	+7.6%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
FlattenListLoop	5276	4673	-11.4%	1.13x (?)

Added	MIN	MAX	MEAN	MAX_RSS
ReplaceSubrangeWithLargeLiteralString	1839	1936	1880	—
ReplaceSubrangeWithLargeManagedString	1809	1868	1830	—
ReplaceSubrangeWithSmallLiteralString	1988	2102	2026	—

Code size: -Osize

Performance: -Onone

Regression	OLD	NEW	DELTA	RATIO
ArrayAppendGenericStructs	1380	2220	+60.9%	0.62x (?)

Added	MIN	MAX	MEAN	MAX_RSS
ReplaceSubrangeWithLargeLiteralString	4321	4387	4343	—
ReplaceSubrangeWithLargeManagedString	3748	3805	3769	—
ReplaceSubrangeWithSmallLiteralString	3655	3720	3690	—

Code size: -swiftlibs

✅	Benchmark Check Report
⚠️🔤	`ReplaceSubrangeWithLargeLiteralString` name is composed of 6 words. Split ReplaceSubrangeWithLargeLiteralString name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⚠️⏱	`ReplaceSubrangeWithLargeLiteralString` execution took at least 1811 μs. Decrease the workload of ReplaceSubrangeWithLargeLiteralString by a factor of 2 (10), to be less than 1000 μs.
⚠️🔤	`ReplaceSubrangeWithLargeManagedString` name is composed of 6 words. Split ReplaceSubrangeWithLargeManagedString name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⚠️⏱	`ReplaceSubrangeWithLargeManagedString` execution took at least 1798 μs. Decrease the workload of ReplaceSubrangeWithLargeManagedString by a factor of 2 (10), to be less than 1000 μs.
⚠️🔤	`ReplaceSubrangeWithSmallLiteralString` name is composed of 6 words. Split ReplaceSubrangeWithSmallLiteralString name into dot-separated groups and variants. See http://bit.ly/BenchmarkNaming
⚠️⏱	`ReplaceSubrangeWithSmallLiteralString` execution took at least 1994 μs. Decrease the workload of ReplaceSubrangeWithSmallLiteralString by a factor of 2 (10), to be less than 1000 μs.

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

beccadax · 2019-06-08T06:52:14Z

@keitaito This is a great start, but can you address the issues listed at the bottom of the benchmark comment, please?

palimondo · 2019-06-10T23:56:47Z

@keitaito I suggest naming these benchmarks per naming convention as

String.replaceSubrange.SmallLiteral
String.replaceSubrange.LargeLiteral
String.replaceSubrange.LargeManaged

and the file as StringReplaceSubrange.swift.

palimondo · 2019-06-11T00:59:37Z

You need to halve the loop multiplier. Mind the 80-character line limit. I would also just create a single testing function and pass the string as a parameter. Since it'll be marked @inline(never), you don't need getString to prevent optimizations. Then you'd use inline closures to define the runFunctions. You can also initialize the large managed string in setUpFunction with an inline closure and the blackHole function:

let t: [BenchmarkCategory] = [.validation, .api, .String]

// Assumes largeManaged is a global String declared elsewhere in the file.
public let StringReplaceSubrange = [
  BenchmarkInfo(name: "String.replaceSubrange.SmallLiteral",
    runFunction: { replaceSubrange($0, "coffee") }, tags: t),
  BenchmarkInfo(name: "String.replaceSubrange.LargeLiteral",
    runFunction: { replaceSubrange($0, "coffee\u{301}coffeecoffeecoffee") },
    tags: t),
  BenchmarkInfo(name: "String.replaceSubrange.LargeManaged",
    runFunction: { replaceSubrange($0, largeManaged) }, tags: t,
    setUpFunction: { blackHole(largeManaged) }),
]

@inline(never)
public func replaceSubrange(_ N: Int, _ string: String) {
  var copy = string
  let range = string.startIndex..<string.index(after: string.startIndex)
  for _ in 0 ..< 5_000*N {
    copy.replaceSubrange(range, with: "t")
  }
}

@milseman Does it make sense to test range replacement with a single character that doesn't change the size? How about something like this?

@inline(never)
public func replaceSubrange(_ N: Int, _ string: String) {
  var s = string
  for _ in 0 ..< 2_500*N {
    let ff =
      s.index(s.startIndex, offsetBy:2)..<s.index(s.startIndex, offsetBy:4)
    s.replaceSubrange(ff, with: "☕️")
    let cup =
      s.index(s.startIndex, offsetBy:2)..<s.index(s.startIndex, offsetBy:3)
    s.replaceSubrange(cup, with: "ff")
  }
  CheckResults(s == string)
}

Also, since replaceSubrange is a mutating operation, how much sense does it make to vary these across literals and managed strings? After the initial modification, wouldn't they differ only in _StringGuts (small string or not)? I mean, aren't LargeLiteral and LargeManaged effectively the same, minus the one-character difference in length?

palimondo · 2019-06-11T01:05:13Z

@milseman Could you also please verify my reasoning above regarding the getString and shared run function? I'm starting to doubt myself looking at the reported runtimes…

Also, would it make sense to do a variant that would cross the boundary between the small and large representations? Or vary the benchmarks based on the position of the range, i.e. replacing from start, in the middle or just at the end, to stress the reallocations?

keitaito · 2019-06-11T06:05:06Z

@brentdax Thanks Brent for running tests 😄

@palimondo Hi Pavol, thanks for the great feedback! I will start working on easy fixes like re-naming the file and benchmarks, and fixing line length 🔨

Reference: https://github.com/apple/swift/blob/master/benchmark/Naming.md

milseman · 2019-06-11T22:15:08Z

@palimondo

I would also just create a single testing function and pass the string as parameter. Since it'll be marked as @inline(never), you don't need to use getString to prevent the optimizations.

I don't think @inline(never) is sufficient here, but I agree with having a single function that runs over different workloads. You could say:

@inline(never)
public func replaceSubrange(_ N: Int, _ string: String) {
  var copy = getString(string)
  let range = string.startIndex..<string.index(after: string.startIndex)
  for _ in 0 ..< 5_000*N {
    copy.replaceSubrange(range, with: "t")
  }
}

So there's no measurable overhead in the getString call, but it serves as an optimization barrier, should the optimizer ever do more inter-procedural optimizations that are not contingent on prior inlining.

Also, would it make sense to do a variant that would cross the boundary between the small and large representations? Or vary the benchmarks based on the position of the range, i.e. replacing from start, in the middle or just at the end, to stress the reallocations?

I think a benchmark that grows a small string to a large one could be interesting. We don't often go the other way, and we don't shrink the string under normal operations (though removeAll(keepingCapacity: false), creating a brand-new string, etc., will form a small string). If a string is already allocated with excess capacity, it's likely the user requested that capacity or is otherwise going to fill it soon, so there's more harm in deallocating it right away only to reallocate.

I don't think it matters what that operation is, and append is more straightforward for a micro-benchmark than replaceSubrange.
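A minimal sketch of that append-based variant, assuming the 15-UTF-8-code-unit small-string capacity of 64-bit platforms. `growSmallToLarge` and `runGrowSmallToLarge` are hypothetical names, not part of the benchmark suite; each iteration starts from a fresh small string so the loop crosses the small/large boundary once per pass.

```swift
// Hypothetical helper: appends `piece` to a copy of `seed` until the result
// needs more than 15 UTF-8 code units, which (assuming the 64-bit
// small-string capacity) forces a switch to a heap-allocated representation.
func growSmallToLarge(_ seed: String, appending piece: String) -> String {
  precondition(!piece.isEmpty, "piece must be non-empty to make progress")
  var s = seed
  while s.utf8.count <= 15 {
    s.append(piece)
  }
  return s
}

// Hypothetical benchmark body: cross the small/large boundary `n` times and
// return a checksum so the optimizer cannot discard the work.
@inline(never)
func runGrowSmallToLarge(_ n: Int) -> Int {
  var total = 0
  for _ in 0 ..< n {
    total &+= growSmallToLarge("coffee", appending: "tea").utf8.count
  }
  return total
}

print(growSmallToLarge("coffee", appending: "tea"))  // coffeeteateateatea
```

Starting from "coffee" (6 code units), four appends of "tea" reach 18 code units, so every iteration performs exactly one small-to-large transition.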

edit: As for varying the location and triggering reallocations, I think that also would be better done as a separate dedicated benchmark. This benchmark stresses the overhead of the 2-way introspection and dispatch pattern that eventually results in splicing in the underlying bytes.

milseman

This looks good if you can incorporate @palimondo's feedback.

After thinking about this more, the largeLiteral is likely redundant with largeManaged, because after the very first replaceSubrange() call, it will be managed (due to mutation) for the subsequent 10_000 operations. The large vs small split is nice, though.

…er> arguments

…acter>

Per Michael's feedback (swiftlang#25310 (review)), largeLiteral is likely redundant with largeManaged.

keitaito

Sorry for taking a while to update this PR. It now adds String.replaceSubrange(_:with:) benchmarks with arguments of types String, Substring, Array, and Repeated. Would you mind taking another look when you have time?

Also, while I was working on this PR, I've got a few questions related to Swift.String. Would you mind giving me some explanations on them if you have time? 🙇‍♂️

keitaito · 2019-07-02T06:24:44Z

benchmark/single-source/StringReplaceSubrange.swift

+
+private var largeManagedString: String = {
+    return getString("coffee\u{301}coffeecoffeecoffeecoffee")
+}()


Michael suggested adding \u{301} to the test string when we paired. @milseman, would you mind reminding me of the reason? I don't remember it now 😅

I'd guess it was to put the string in a particular normalization form. @milseman Do you want to also vary the benchmarks across normalization forms? SR-8905 doesn't mention that…

I was wondering whether testing with the string "coffeécoffeecoffeecoffeecoffee" would give different results from "coffeecoffeecoffeecoffeecoffee" 🤔 If there is a distinct difference, maybe we could add two benchmarks: one with the acute accent and one without it?

Grapheme segmentation is relevant to this benchmark, but normalization is not (since there's no comparison). The difference is that the precomposed representation is a single scalar per grapheme cluster, while the decomposed (multi-scalar) form is not. The single-scalar one will hit our grapheme-breaking fast paths, while the multi-scalar one will call out to ICU. Alternatively, you could use other kinds of multi-scalar grapheme clusters, such as complex emoji. I only mentioned "\u{301}" because you can stick it after an "e" to get the same effect.
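A small illustration of the precomposed vs. decomposed distinction, using the public String APIs only; the fast-path vs. ICU behavior itself is internal and is not observable from this snippet.

```swift
// The same final "é" spelled two ways (the benchmark's "coffee\u{301}" trick):
let precomposed = "coffe\u{E9}"   // U+00E9: a single precomposed scalar
let decomposed = "coffee\u{301}"  // "e" followed by U+0301 COMBINING ACUTE ACCENT

// Both strings have six Characters (extended grapheme clusters)...
print(precomposed.count, decomposed.count)  // 6 6

// ...but the last Character is one scalar in one case and two in the other.
print(precomposed.last!.unicodeScalars.count,
      decomposed.last!.unicodeScalars.count)  // 1 2

// String == uses canonical equivalence, so the two still compare equal.
print(precomposed == decomposed)  // true
```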

keitaito · 2019-07-02T06:28:15Z

benchmark/single-source/StringReplaceSubrange.swift

+  BenchmarkInfo(
+    name: "Str.replaceSubrange.SmallLiteral.RepeatedChar",
+    runFunction: { replaceSubrange($0, "coffee", with: getRepeatedCharacter(repeatedCharacter)) },
+    tags: tags


The benchmark name "Str.replaceSubrange.SmallLiteral.RepeatedChar" is longer than 40 characters, but I couldn't think of a better name that fits. It could be something like "Str.replaceSubrange.LargeManagedRepChar", but I was concerned that "RepChar" doesn't make it obvious that it means Repeated<Character>. @palimondo What do you think about this naming? Please let me know if you have any suggestions 🙂

Looking at what @milseman writes in SR-8905:

replaceSubrange<C: Collection>(_:C)

Arguments of types String, Substring, Array<Character>, Repeated<Character>, etc

I'd say the naming convention calls for a base name of String.replaceSubrange that varies across the argument type (String, Substring, ArrChar, RepChar) for the general case of large strings. Then we'll denote the special optimization for small strings with a simple .Small suffix, which gives us these benchmarks:

String.replaceSubrange.String

String.replaceSubrange.Substring

String.replaceSubrange.ArrChar

String.replaceSubrange.RepChar

String.replaceSubrange.String.Small

String.replaceSubrange.Substring.Small

String.replaceSubrange.ArrChar.Small

String.replaceSubrange.RepChar.Small

The longest one is String.replaceSubrange.Substring.Small at 38 characters, just under the 40-character limit.
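A quick way to check the proposed names against the 40-character limit discussed here. This is only an illustration, not the suite's own naming check.

```swift
// The eight names proposed above.
let names = [
  "String.replaceSubrange.String",
  "String.replaceSubrange.Substring",
  "String.replaceSubrange.ArrChar",
  "String.replaceSubrange.RepChar",
  "String.replaceSubrange.String.Small",
  "String.replaceSubrange.Substring.Small",
  "String.replaceSubrange.ArrChar.Small",
  "String.replaceSubrange.RepChar.Small",
]
let limit = 40  // length limit mentioned in this thread

let longest = names.max { $0.count < $1.count }!
print(longest, longest.count)  // String.replaceSubrange.Substring.Small 38
print(names.allSatisfy { $0.count <= limit })  // true
```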

That's an awesome naming idea. I will use them. Thanks for your suggestion!

keitaito · 2019-07-02T06:30:07Z

benchmark/single-source/StringReplaceSubrange.swift

+
+private func setupLargeManagedString() {
+    _ = largeManagedString
+}


Should largeManagedString be accessed in the setUpFunction closure before the benchmarks use it?

Given that replaceSubrange(_:_:with:) already does var copy = getString(string), this whole dance is unnecessary. You should declare this as a simple let largeString = "coffee\u{301}coffeecoffeecoffeecoffee" and drop all the setUpFunctions.

That makes sense. Thanks for explaining!

keitaito · 2019-07-02T06:33:36Z

benchmark/single-source/StringReplaceSubrange.swift

+  let range = string.startIndex..<string.index(after: string.startIndex)
+  for _ in 0 ..< 5_000 * N {
+    copy.replaceSubrange(range, with: replacingString)
+  }


What are the criteria for choosing a multiplier like 5000? Does it depend on the benchmark's run time?

Correct. We are just trying to size the workload to run in 20–1000 μs, so that it is in a measurement sweet spot for our system.

Thanks for explaining!

keitaito · 2019-07-02T06:40:00Z

benchmark/single-source/StringReplaceSubrange.swift

+  BenchmarkInfo(
+    name: "Str.replaceSubrange.SmallLiteral.String",
+    runFunction: { replaceSubrange($0, "coffee", with: "t") },
+    tags: tags


Would you mind explaining the difference between the small literal string and the large managed string? Is it related to the small string optimization? If the string fits within 15 ASCII characters, it won't be allocated on the heap?

Correct. See: _SmallString and _StringGuts for implementation details if you're interested.

Thanks for the links! I will take a look at them 🔍

Small form accommodates 15 UTF-8 code units in length (not just ASCII)

Thank you for clarifying the length of the small string, Michael 😄
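To make the "15 UTF-8 code units, not just ASCII" point concrete, here is a sketch with a hypothetical helper that only counts code units. The 15-unit threshold on 64-bit platforms is an implementation detail, not a documented API guarantee, so the helper does not (and cannot) inspect the actual representation.

```swift
// Hypothetical helper (not a stdlib API): on 64-bit platforms the small-string
// form stores up to 15 UTF-8 code units inline. Treat this threshold as an
// implementation detail that could change.
func likelyFitsSmallForm(_ s: String) -> Bool {
  return s.utf8.count <= 15
}

let smallString = "coffee"                                 // 6 code units
let fiveCups = "\u{2615}\u{2615}\u{2615}\u{2615}\u{2615}"  // 5 × 3 = 15 code units
let largeString = "coffee\u{301}coffeecoffeecoffeecoffee"  // 32 code units

print(likelyFitsSmallForm(smallString))  // true
print(likelyFitsSmallForm(fiveCups))     // true: non-ASCII still fits
print(likelyFitsSmallForm(largeString))  // false
```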

palimondo · 2019-07-02T16:03:40Z

@swift-ci please benchmark

palimondo

I think one generic test method should be able to cover all of our test cases. See inline comments.

palimondo · 2019-07-02T17:08:56Z

benchmark/single-source/StringReplaceSubrange.swift

+
+private func setupLargeManagedString() {
+    _ = largeManagedString
+}


Given that replaceSubrange(_:_:with:) already does var copy = getString(string), this whole dance is unnecessary. You should declare this as a simple let largeString = "coffee\u{301}coffeecoffeecoffeecoffee" and drop all the setUpFunctions.

palimondo · 2019-07-02T17:30:14Z

benchmark/single-source/StringReplaceSubrange.swift

+  ),
+  BenchmarkInfo(
+    name: "Str.replaceSubrange.SmallLiteral.Substr",
+    runFunction: { replaceSubrange($0, "coffee", with: getSubstring("t")) },


We don't need an optimization barrier here (by calling getSubstring from TestsUtils).
A shorter way to get a Substring is to take the full subrange, like this: "t"[...].

For implementation symmetry, I'd also extract "coffee" into a smallString constant, so that our benchmark definitions would vary only like this:

runFunction: { replaceSubrange($0, largeString, with: "t") }

runFunction: { replaceSubrange($0, smallString, with: "t") }

runFunction: { replaceSubrange($0, largeString, with: "t"[...]) }

runFunction: { replaceSubrange($0, smallString, with: "t"[...]) }
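For reference, a standalone snippet (illustrative, not from the PR) showing that `"t"[...]` slices the whole string with an unbounded range and yields a `Substring`, and passing each of the four argument types from SR-8905 to `replaceSubrange(_:with:)`.

```swift
var s = "coffee"
// Replace the first Character; "t" has the same UTF-8 length as "c", so the
// range computed once stays in bounds across the replacements below.
let first = s.startIndex..<s.index(after: s.startIndex)

let asString: String = "t"
let asSubstring: Substring = "t"[...]  // unbounded range: whole-string slice
let asArray: [Character] = ["t"]
let asRepeated: Repeated<Character> = repeatElement("t", count: 1)

print(type(of: asSubstring))  // Substring

s.replaceSubrange(first, with: asString)
s.replaceSubrange(first, with: asSubstring)
s.replaceSubrange(first, with: asArray)
s.replaceSubrange(first, with: asRepeated)
print(s)  // toffee
```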

palimondo · 2019-07-02T17:35:09Z

benchmark/utils/TestsUtils.swift

@@ -322,3 +322,11 @@ public func getString(_ s: String) -> String { return s }
 // The same for Substring.
 @inline(never)
 public func getSubstring(_ s: Substring) -> Substring { return s }
+


I don't think we need to define any new optimization barrier functions...

palimondo

This review comment somehow got lost… Let me try again.

palimondo · 2019-07-02T17:53:38Z

benchmark/single-source/StringReplaceSubrange.swift

+}
+
+@inline(never)
+private func replaceSubrange(_ N: Int, _ string: String, with replacingSubstring: Substring) {


If I understand @milseman's intention from SR-8905 correctly, we are designing a benchmark for the generic replaceSubrange method, which varies across the String, Substring, Array<Character>, and Repeated<Character> specializations.

Therefore we should be able to define single shared generic test function and vary the parameter in runFunction closure in BenchmarkInfo declarations.

For an example of such benchmarks, see append variants in DataBenchmarks

@milseman Any thoughts on keeping or dropping the @inline(never) annotation?

You could refactor this to a generic @inline(__always) implementation function that takes a scale factor (because the String version should be much faster than the Repeat version) and does the loop and replaceSubrange calls. You'd likely want to make a @inline(never) top-level function for each individual benchmark. It doesn't save you a whole lot.
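A rough sketch of that structure, with hypothetical names (`opaque`, `replaceFirst`, `runString`, `runRepChar`) and placeholder scale factors that are not tuned numbers. In the real suite the barrier would be `getString` from TestsUtils and each entry point would be wired into a `BenchmarkInfo` declaration.

```swift
// Hypothetical stand-in for the suite's getString(_:) optimization barrier.
@inline(never)
func opaque(_ s: String) -> String { return s }

// Shared generic body: replaces the first Character `scale * n` times.
// The replacement should be one Character with the same UTF-8 length as the
// original first Character, so that `range` stays valid across iterations.
@inline(__always)
func replaceFirst<C: Collection>(
  _ n: Int, scale: Int, in string: String, with replacement: C
) -> String where C.Element == Character {
  var copy = opaque(string)
  let range = copy.startIndex..<copy.index(after: copy.startIndex)
  for _ in 0 ..< scale * n {
    copy.replaceSubrange(range, with: replacement)
  }
  return copy
}

// One @inline(never) entry point per benchmark, each with its own scale
// (placeholder values; the slower generic path gets a smaller scale).
@inline(never)
func runString(_ n: Int) -> String {
  return replaceFirst(n, scale: 500, in: "coffee", with: "t")
}

@inline(never)
func runRepChar(_ n: Int) -> String {
  return replaceFirst(n, scale: 50, in: "coffee",
                      with: repeatElement(Character("t"), count: 1))
}
```

The scale parameter is what lets the fast pre-specialized types and the fully generic Repeated path each land in the measurement window without separate loop bodies.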

keitaito · 2019-07-04T18:40:28Z

@palimondo Thank you so much for another review, Pavol! I will address your comments and fix warnings and errors from the swift-ci benchmark check report 🔨

keitaito · 2019-07-07T19:43:03Z

I'm suddenly getting several build errors when running ninja swift-benchmark-macosx-x86_64 😢 Have you seen these errors? Do you have any ideas how to fix them and rebuild swift-benchmark?

/[my-own-path]/swift-source/build/Ninja-RelWithDebInfoAssert/swift-macosx-x86_64 $ ninja swift-benchmark-macosx-x86_64

swift-source/swift/stdlib/public/stubs/MathStubs.cpp:21:10: fatal error: 'climits' file not found
swift-source/swift/include/swift/Runtime/Debug.h:20:10: fatal error: 'cstdarg' file not found
swift-source/swift/stdlib/public/stubs/CommandLine.cpp:17:10: fatal error: 'vector' file not found
swift-source/swift/stdlib/public/stubs/LibcShims.cpp:30:10: fatal error: 'type_traits' file not found
swift-source/swift/include/swift/Runtime/Debug.h:20:10: fatal error: 'cstdarg' file not found
swift-source/swift/stdlib/public/stubs/ThreadLocalStorage.cpp:13:10: fatal error: 'cstring' file not found
swift-source/swift/stdlib/public/stubs/Stubs.cpp:37:10: fatal error: 'climits' file not found
swift-source/swift/stdlib/public/stubs/../SwiftShims/RefCount.h:30:10: fatal error: 'type_traits' file not found
swift-source/swift/include/swift/Basic/Lazy.h:16:10: fatal error: 'memory' file not found

swift-source/swift/stdlib/public/runtime/ImageInspection.h:25:10: fatal error: 'cstdint' file not found
swift-source/swift/include/swift/ABI/Metadata.h:20:10: fatal error: 'atomic' file not found
swift-source/llvm/include/llvm/Support/Compiler.h:20:10: fatal error: 'new' file not found
swift-source/swift/include/swift/Runtime/HeapObject.h:20:10: fatal error: 'cstddef' file not found
swift-source/swift/include/swift/Basic/Range.h:38:10: fatal error: 'algorithm' file not found
swift-source/swift/include/swift/Demangling/Demangle.h:22:10: fatal error: 'memory' file not found
swift-source/swift/include/swift/Basic/Lazy.h:16:10: fatal error: 'memory' file not found
swift-source/swift/include/swift/Basic/Lazy.h:16:10: fatal error: 'memory' file not found
swift-source/swift/include/swift/ABI/Metadata.h:20:10: fatal error: 'atomic' file not found
swift-source/swift/include/swift/ABI/Metadata.h:20:10: fatal error: 'atomic' file not found

Many error messages are printed in Terminal. Here is one example of a build failure:

[1/743] Building CXX object stdlib/public/runtime/CMakeFiles/swiftRuntime-macosx-x86_64.dir/Enum.cpp.o
FAILED: stdlib/public/runtime/CMakeFiles/swiftRuntime-macosx-x86_64.dir/Enum.cpp.o 
/[my-own-path]/swift-source/build/Ninja-RelWithDebInfoAssert/llvm-macosx-x86_64/./bin/clang++  -DCMARK_STATIC_DEFINE -DGTEST_HAS_RTTI=0 -D_DEBUG -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Istdlib/public/runtime -I/[my-own-path]/swift-source/swift/stdlib/public/runtime -Iinclude -I/[my-own-path]/swift-source/swift/include -I/[my-own-path]/swift-source/llvm/include -I/[my-own-path]/swift-source/build/Ninja-RelWithDebInfoAssert/llvm-macosx-x86_64/include -I/[my-own-path]/swift-source/llvm/tools/clang/include -I/[my-own-path]/swift-source/build/Ninja-RelWithDebInfoAssert/llvm-macosx-x86_64/tools/clang/include -I/[my-own-path]/swift-source/cmark/src -I/[my-own-path]/swift-source/build/Ninja-RelWithDebInfoAssert/cmark-macosx-x86_64/src -Wno-unknown-warning-option -Werror=unguarded-availability-new -fno-stack-protector -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wimplicit-fallthrough -Wcovered-switch-default -Wno-class-memaccess -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -Werror=switch -Wdocumentation -Wimplicit-fallthrough -Wunreachable-code -Woverloaded-virtual -DOBJC_OLD_DISPATCH_PROTOTYPES=0 -fno-sanitize=all -DLLVM_DISABLE_ABI_BREAKING_CHECKS_ENFORCING=1 -O2  -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk   -UNDEBUG  -fno-exceptions -fno-rtti -Wall -Wglobal-constructors -Wexit-time-destructors -fvisibility=hidden -DSWIFT_RUNTIME_CLOBBER_FREED_OBJECTS=1 -DswiftCore_EXPORTS -I/[my-own-path]/swift-source/swift/include -DSWIFT_TARGET_LIBRARY_NAME=swiftRuntime -target x86_64-apple-macosx10.9 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk -arch x86_64 -F 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/../../../Developer/Library/Frameworks -mmacosx-version-min=10.9 -O2 -g -UNDEBUG -DSWIFT_ENABLE_RUNTIME_FUNCTION_COUNTERS -MD -MT stdlib/public/runtime/CMakeFiles/swiftRuntime-macosx-x86_64.dir/Enum.cpp.o -MF stdlib/public/runtime/CMakeFiles/swiftRuntime-macosx-x86_64.dir/Enum.cpp.o.d -o stdlib/public/runtime/CMakeFiles/swiftRuntime-macosx-x86_64.dir/Enum.cpp.o -c /[my-own-path]/swift-source/swift/stdlib/public/runtime/Enum.cpp
In file included from /[my-own-path]/swift-source/swift/stdlib/public/runtime/Enum.cpp:17:
In file included from /[my-own-path]/swift-source/llvm/include/llvm/Support/ErrorHandling.h:18:
/[my_path]swift-source/llvm/include/llvm/Support/Compiler.h:20:10: fatal error: 'new' file not found
#include <new>
         ^~~~~
1 error generated.

I'm not sure if it's related, but these errors started appearing after I installed Xcode 11 beta 3, FWIW.

palimondo · 2019-07-08T10:09:24Z

I'd guess you need to do a clean build after Xcode update.

- Rename benchmarks
- Remove setupLargeManagedString()
- Remove unnecessary optimization barrier functions
- Update benchmark argument strings to use smallString and largeString
- Remove unnecessary optimization barrier function calls
- Make replaceSubrange(_:_:with:) a generic function

keitaito · 2019-07-28T18:44:38Z

@palimondo Thanks for the reply on the build issue, Pavol! I ended up re-building everything from scratch 😅

keitaito · 2019-07-28T19:38:37Z

The benchmark with Repeated<Character> takes much longer than the others. Do you have any idea why this type is slower? Is there any way to improve this benchmark? Here are the results on my machine:

// When the for-in loop runs with `0 ..< 5_000 * N`
$ ./Benchmark_O 772 773 774 775 776 777 778 779
#,TEST,SAMPLES,MIN(μs),MAX(μs),MEAN(μs),SD(μs),MEDIAN(μs)
772,String.replaceSubrange.ArrChar,1,2049,2049,2049,0,2049
773,String.replaceSubrange.ArrChar.Small,1,1240,1240,1240,0,1240
774,String.replaceSubrange.RepChar,1,31690,31690,31690,0,31690
775,String.replaceSubrange.RepChar.Small,1,4670,4670,4670,0,4670
776,String.replaceSubrange.String,1,1105,1105,1105,0,1105
777,String.replaceSubrange.String.Small,1,1233,1233,1233,0,1233
778,String.replaceSubrange.Substring,1,2771,2771,2771,0,2771
779,String.replaceSubrange.Substring.Small,1,1477,1477,1477,0,1477

Total performance tests executed: 8

// When the for-in loop runs with `0 ..< 250 * N`
$ ./Benchmark_O 772 773 774 775 776 777 778 779
#,TEST,SAMPLES,MIN(μs),MAX(μs),MEAN(μs),SD(μs),MEDIAN(μs)
772,String.replaceSubrange.ArrChar,1,92,92,92,0,92
773,String.replaceSubrange.ArrChar.Small,1,54,54,54,0,54
774,String.replaceSubrange.RepChar,1,1209,1209,1209,0,1209
775,String.replaceSubrange.RepChar.Small,1,186,186,186,0,186
776,String.replaceSubrange.String,1,54,54,54,0,54
777,String.replaceSubrange.String.Small,1,52,52,52,0,52
778,String.replaceSubrange.Substring,1,122,122,122,0,122
779,String.replaceSubrange.Substring.Small,1,63,63,63,0,63

Total performance tests executed: 8

milseman · 2019-07-30T00:09:33Z

@keitaito there are explicit pre-specializations for the others: https://github.com/apple/swift/blob/master/stdlib/public/core/StringRangeReplaceableCollection.swift#L197

Benchmarking a non-pre-specialized argument type helps identify and track changes in the fully-generic implementation.

palimondo · 2019-07-30T11:06:00Z

@swift-ci please benchmark

palimondo · 2019-07-30T11:10:01Z

I’m running the benchmarks on CI with your current multiplier (5000) so we can relate them to your local results.

swift-ci · 2019-07-30T11:32:47Z

Performance: -O

Improvement	OLD	NEW	DELTA	RATIO
FlattenListFlatMap	5939	4087	-31.2%	1.45x (?)
SuffixSequenceLazy	319	289	-9.4%	1.10x (?)
SuffixSequence	317	291	-8.2%	1.09x (?)

Added	MIN	MAX	MEAN	MAX_RSS
String.replaceSubrange.ArrChar	714	740	729	—
String.replaceSubrange.ArrChar.Small	568	580	574	—
String.replaceSubrange.RepChar	18516	18770	18664	—
String.replaceSubrange.RepChar.Small	2657	2779	2701	—
String.replaceSubrange.String	480	494	487	—
String.replaceSubrange.String.Small	564	580	572	—
String.replaceSubrange.Substring	1132	1147	1141	—
String.replaceSubrange.Substring.Small	644	658	652	—

Code size: -O

Performance: -Osize

Improvement	OLD	NEW	DELTA	RATIO
FlattenListLoop	2736	2167	-20.8%	1.26x (?)
Dictionary4	207	165	-20.3%	1.25x
CharacterLiteralsLarge	67	58	-13.4%	1.16x (?)
CharacterLiteralsSmall	220	202	-8.2%	1.09x (?)

Added	MIN	MAX	MEAN	MAX_RSS
String.replaceSubrange.ArrChar	707	708	708	—
String.replaceSubrange.ArrChar.Small	555	555	555	—
String.replaceSubrange.RepChar	17925	18117	18022	—
String.replaceSubrange.RepChar.Small	2611	2660	2628	—
String.replaceSubrange.String	488	488	488	—
String.replaceSubrange.String.Small	575	587	579	—
String.replaceSubrange.Substring	1134	1151	1140	—
String.replaceSubrange.Substring.Small	659	671	663	—

Code size: -Osize

Performance: -Onone

Added	MIN	MAX	MEAN	MAX_RSS
String.replaceSubrange.ArrChar	877	931	895	—
String.replaceSubrange.ArrChar.Small	677	677	677	—
String.replaceSubrange.RepChar	18854	18953	18909	—
String.replaceSubrange.RepChar.Small	2857	2949	2916	—
String.replaceSubrange.String	594	596	595	—
String.replaceSubrange.String.Small	670	682	674	—
String.replaceSubrange.Substring	1209	1269	1229	—
String.replaceSubrange.Substring.Small	727	728	727	—

Code size: -swiftlibs

✅	Benchmark Check Report
⛔️⏱	`String.replaceSubrange.RepChar` execution took at least 17589 μs. Decrease the workload of String.replaceSubrange.RepChar by a factor of 32 (100), to be less than 1000 μs.
⚠️⏱	`String.replaceSubrange.RepChar.Small` execution took at least 2589 μs. Decrease the workload of String.replaceSubrange.RepChar.Small by a factor of 4 (10), to be less than 1000 μs.
⚠️⏱	`String.replaceSubrange.Substring` execution took at least 1047 μs. Decrease the workload of String.replaceSubrange.Substring by a factor of 2 (10), to be less than 1000 μs.

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

milseman · 2019-07-30T18:56:35Z

@palimondo if @keitaito fixes the workload size as recommended in the report, are we good to go?

⛔️⏱	String.replaceSubrange.RepChar execution took at least 17589 μs. Decrease the workload of String.replaceSubrange.RepChar by a factor of 32 (100), to be less than 1000 μs.
⚠️⏱	String.replaceSubrange.RepChar.Small execution took at least 2589 μs. Decrease the workload of String.replaceSubrange.RepChar.Small by a factor of 4 (10), to be less than 1000 μs.
⚠️⏱	String.replaceSubrange.Substring execution took at least 1047 μs. Decrease the workload of String.replaceSubrange.Substring by a factor of 2 (10), to be less than 1000 μs.
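For reading these suggestions, here is a hypothetical reconstruction of where the factors come from: measured time divided by the 1000 μs ceiling, rounded up to the next power of 2 (and, in parentheses, the next power of 10). This matches every factor in the reports in this thread, but it is an assumption, not the checker's actual code.

```swift
// Hypothetical reconstruction of the report's suggestion: divide the measured
// time by the 1000 μs ceiling and round the ratio up to the next power of 2
// (and, in parentheses, to the next power of 10).
func nextPower(of base: Int, atLeast ratio: Double) -> Int {
  var p = 1
  while Double(p) < ratio { p *= base }
  return p
}

func suggestedFactors(measuredMicroseconds t: Double, ceiling: Double = 1000)
  -> (powerOf2: Int, powerOf10: Int) {
  let ratio = t / ceiling
  return (nextPower(of: 2, atLeast: ratio), nextPower(of: 10, atLeast: ratio))
}

let measured: [(name: String, microseconds: Double)] = [
  ("String.replaceSubrange.RepChar", 17589),
  ("String.replaceSubrange.RepChar.Small", 2589),
  ("String.replaceSubrange.Substring", 1047),
]
for m in measured {
  let f = suggestedFactors(measuredMicroseconds: m.microseconds)
  print("\(m.name): factor \(f.powerOf2) (\(f.powerOf10))")
}
// String.replaceSubrange.RepChar: factor 32 (100)
// String.replaceSubrange.RepChar.Small: factor 4 (10)
// String.replaceSubrange.Substring: factor 2 (10)
```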

palimondo · 2019-07-30T19:30:39Z

@milseman If you don't foresee an order-of-magnitude improvement in the performance of replaceSubrange for the fast cases (String, Substring, …), I'd say we should use the same loop multiplier for the whole family. Based on the report above, a multiplier in the 500 to 1000 range looks like it would work, with RepChar going above 1000 μs; that's acceptable because it lets us compare all the cases directly.

Otherwise he should go with your recommendation from before to:

… refactor this to a generic @inline(__always) implementation function that takes a scale factor (because the String version should be much faster than the Repeat version) and does the loop and replaceSubrange calls.

If I understand the point correctly, an inlined function allows the compiler to unroll the loop, which would not be possible if the loop multiplier were passed as a parameter to a non-inlined generic function? (I don't think we'd need to create per-case functions, as your next sentence suggests, because the inline closures play the same role and there is nowhere for them to get inlined into… I think the ritualistic way we sprinkle @inline(never) is obsolete there.)

In other words, I'd prefer the same multiplier across all variants, but when the variants differ by several orders of magnitude that isn't possible, and we should tailor the multiplier to each individual case (DataBenchmarks are done that way).

milseman · 2019-08-02T20:23:53Z

Same multiplier sounds good to me if we can make all the benchmarks fit in our time windows

+[Gardening] minor formatting (80 chars)

palimondo · 2019-08-02T20:57:00Z

@swift-ci please benchmark

palimondo · 2019-08-02T20:57:30Z

@swift-ci please smoke test

swift-ci · 2019-08-02T21:17:35Z

Performance: -O

Regression	OLD	NEW	DELTA	RATIO
SuffixSequence	293	326	+11.3%	0.90x (?)
SuffixSequenceLazy	293	326	+11.3%	0.90x (?)
Chars2	3100	3350	+8.1%	0.93x (?)

Improvement	OLD	NEW	DELTA	RATIO
FlattenListLoop	2829	2170	-23.3%	1.30x (?)
FlattenListFlatMap	5301	4407	-16.9%	1.20x (?)

Added	MIN	MAX	MEAN	MAX_RSS
String.replaceSubrange.ArrChar	72	72	72	—
String.replaceSubrange.ArrChar.Small	57	57	57	—
String.replaceSubrange.RepChar	1775	2091	1881	—
String.replaceSubrange.RepChar.Small	267	267	267	—
String.replaceSubrange.String	53	53	53	—
String.replaceSubrange.String.Small	60	60	60	—
String.replaceSubrange.Substring	118	118	118	—
String.replaceSubrange.Substring.Small	70	70	70	—

Code size: -O

Performance: -Osize

Improvement	OLD	NEW	DELTA	RATIO
Dictionary4	204	166	-18.6%	1.23x
CharacterLiteralsLarge	67	58	-13.4%	1.16x (?)
Dictionary4OfObjects	351	328	-6.6%	1.07x (?)

Added	MIN	MAX	MEAN	MAX_RSS
String.replaceSubrange.ArrChar	71	73	72	—
String.replaceSubrange.ArrChar.Small	56	56	56	—
String.replaceSubrange.RepChar	1787	1998	1862	—
String.replaceSubrange.RepChar.Small	262	263	263	—
String.replaceSubrange.String	49	49	49	—
String.replaceSubrange.String.Small	57	57	57	—
String.replaceSubrange.Substring	112	112	112	—
String.replaceSubrange.Substring.Small	65	66	65	—

Code size: -Osize

Performance: -Onone

Added	MIN	MAX	MEAN	MAX_RSS
String.replaceSubrange.ArrChar	99	101	100	—
String.replaceSubrange.ArrChar.Small	79	81	80	—
String.replaceSubrange.RepChar	1793	1969	1853	—
String.replaceSubrange.RepChar.Small	275	275	275	—
String.replaceSubrange.String	51	51	51	—
String.replaceSubrange.String.Small	59	61	60	—
String.replaceSubrange.Substring	117	117	117	—
String.replaceSubrange.Substring.Small	70	72	71	—

Code size: -swiftlibs

✅	Benchmark Check Report
⚠️⏱	`String.replaceSubrange.RepChar` execution took at least 1736 μs. Decrease the workload of String.replaceSubrange.RepChar by a factor of 2 (10), to be less than 1000 μs.

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

palimondo · 2019-08-02T23:04:58Z

@keitaito Thank you for your patience and your contribution!

keitaito · 2019-08-04T18:49:00Z

Woot! Thank you, Pavol, for updating the multiplier for me, and thank you for merging the PR! I really appreciate that both Pavol and Michael patiently and carefully reviewed my PR and gave me feedback. Sorry for taking so long to get it merged, but I'm happy and excited that it's finally in 🙂 Working on it, I learned a lot of interesting things about what goes on under the hood of the Swift standard library and the String APIs. Thank you so much!

keitaito · 2019-08-04T18:54:43Z

I have a question for @milseman about pre-specializations.

Benchmarking a non-pre-specialized argument type helps identify and track changes in the fully-generic implementation.

Does this mean that if @_specialize(where C == Repeated<Character>) were added to replaceSubrange(_:with:), we could see a performance improvement when the argument is of type Repeated<Character>? Is there a specific reason this pre-specialization annotation isn't there right now, unlike for String, Substring, and Array<Character>?
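To make the question concrete, here is a sketch of what such annotations look like on a hypothetical free function shaped like `replaceSubrange(_:with:)`. `@_specialize` is an underscored, unofficial attribute whose syntax may change; the function `replacingPrefix(of:count:with:)` is invented for illustration, and the sketch says nothing about whether a `Repeated<Character>` specialization would actually pay off in the stdlib.

```swift
// Hypothetical helper mirroring the shape of String.replaceSubrange(_:with:).
// Each @_specialize asks the compiler to emit a pre-specialized copy for one
// concrete argument type; any other collection type uses the generic body.
// @_specialize is underscored and unofficial; treat this as a sketch.
@_specialize(where C == String)
@_specialize(where C == Substring)
@_specialize(where C == Array<Character>)
@_specialize(where C == Repeated<Character>)  // the case asked about
func replacingPrefix<C: Collection>(
  of base: String, count: Int, with newElements: C
) -> String where C.Element == Character {
  var copy = base
  let end = copy.index(copy.startIndex, offsetBy: count)
  copy.replaceSubrange(copy.startIndex ..< end, with: newElements)
  return copy
}

let jello = replacingPrefix(
  of: "hello", count: 1, with: repeatElement("J" as Character, count: 1))
print(jello)  // Jello
```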

keitaito · 2019-08-04T20:17:55Z

Another question: would you mind explaining the criteria for choosing between the @inline(never) and @inline(__always) annotations? According to Bruno's blog post and Slava's explanation on the Swift mailing list, they seem to work as follows:

If @inline(never) is used for the given function, the function won't be inlined. In other words, the function won't be copied to the call site. There could be function call overhead.
If @inline(__always) is used for the given function, the function will be inlined at the call site. There is no function call overhead, but if the function is called from multiple call sites, its code will be duplicated and the binary size will increase.
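The two descriptions above can be illustrated with a pair of otherwise identical functions that differ only in their inlining attribute. This is a hypothetical toy example: the attributes change how the code is compiled, not what it computes, so both paths return the same result, and any difference would only show up in the generated code (for example, in the output of `swiftc -O -emit-assembly`) or in timings.

```swift
// Illustrative only: identical bodies, different inlining attributes.

@inline(never)
func squareOutOfLine(_ x: Int) -> Int {
  // The optimizer keeps a real call to this function at each call site.
  return x * x
}

@inline(__always)
func squareInlined(_ x: Int) -> Int {
  // The optimizer is asked to copy this body into each caller,
  // trading code size for the removed call overhead.
  return x * x
}

func sumOfSquares(_ n: Int) -> (Int, Int) {
  var viaCall = 0, viaInline = 0
  for i in 0 ..< n {
    viaCall += squareOutOfLine(i)
    viaInline += squareInlined(i)
  }
  return (viaCall, viaInline)
}

print(sumOfSquares(4))  // (14, 14)
```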

Given this, could inlining affect benchmark performance? Looking at the benchmark source files, @inline(__always) seems to be used much less often than @inline(never). What kinds of functions should be annotated with @inline(__always)?

palimondo · 2019-08-05T06:37:23Z

The ritualistic placement of @inline(never) on the tested function predates the public history of the Swift Benchmark Suite. I can only guess that it stemmed from some very early internal version of the benchmark harness, where the function could have been inlined into the measurement loop. AFAIK it has no practical effect, because the individual benchmark functions are always executed indirectly, through a function reference stored in a dictionary; as far as the compiler can see, there's nowhere these functions can be inlined into.

For a more targeted use of @inline(never), see DataBenchmarks, where it is used to explicitly control inlining into the runFunction closures, creating two variants of each benchmark that showcase the effect of inlining Data's code. (Frankly, I'm not 100% sure I got all of that right; AFAIK the Data family is the first set of benchmarks to deal explicitly with that aspect.)

For an example of technically correct @inline(__always) use, see InsertCharacter, where it ensures that the multiple benchmark functions are testing the same thing. But that pattern is not worth emulating, because the same effect can be achieved by leaving out all the annotations and calling the benchmark function with varying parameters from the runFunction closures.
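A minimal sketch of the structure described above, assuming a stand-in registry that maps benchmark names to run functions taking a loop count. The names (`RunFunction`, `registry`, `replaceFirstCharacter`, `run`) are hypothetical, and the real suite's types differ; the point is that the workload function carries no inlining annotations, the variants are closures calling it with different arguments, and the runner reaches them only through a dictionary lookup.

```swift
// Hypothetical stand-in for the suite's registry: name -> run function
// taking the loop count N. Not the suite's real types.
typealias RunFunction = (Int) -> Void

var checksum = 0  // Accumulates results so the work is observable.

// One unannotated workload function, parameterized by the replacement.
func replaceFirstCharacter<C: Collection>(_ n: Int, with replacement: C)
where C.Element == Character {
  for _ in 0 ..< n {
    var s = "hello"
    s.replaceSubrange(s.startIndex ..< s.index(after: s.startIndex),
                      with: replacement)
    checksum += s.count
  }
}

// Variants are closures calling the same function with different arguments.
let registry: [String: RunFunction] = [
  "String.replaceSubrange.String": {
    replaceFirstCharacter($0, with: "J" as String)
  },
  "String.replaceSubrange.RepChar": {
    replaceFirstCharacter($0, with: repeatElement("J" as Character, count: 3))
  },
]

// The runner fetches an opaque closure by name at runtime, so in practice
// there is no call site the workload could be inlined into.
func run(_ name: String, iterations n: Int) {
  registry[name]?(n)
}

run("String.replaceSubrange.String", iterations: 2)   // 2 x "Jello".count
run("String.replaceSubrange.RepChar", iterations: 1)  // 1 x "JJJello".count
print(checksum)  // 17
```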

So I don't see a very good reason to use @inline(__always) in the context of benchmark definitions. If you were asking in general, look at the occurrences of this annotation in the stdlib.

palimondo · 2019-08-05T06:49:35Z

@keitaito If you're interested in playing some more with this, I think it should be possible, and even beneficial (to prevent further confusion), to remove most of the @inline(never) annotations from the Swift Benchmark Suite. Another starter task?

Add ReplaceSubrange benchmark

0f6d56c

Update ReplaceSubrange benchmark by adding Substring case

e67c8b1

keitaito added 2 commits June 11, 2019 00:03

Rename ReplaceSubrange to StringReplaceSubrange

459861b

Rename benchmarks based on the benchmark naming convention

4d86f3f

Reference: https://github.com/apple/swift/blob/master/benchmark/Naming.md

milseman reviewed Jun 11, 2019


keitaito added 4 commits June 11, 2019 23:33

Reuse replaceSubrange(_:_:with:) function for benchmarks

082285f

Add benchmarks for String.replaceSubrange(_:with:) with Array<Character> arguments

0de61f0

Add benchmarks for String.replaceSubrange(_:with:) with Repeated<Character>

4c0ea56

Remove benchmarks with largeLiteral

b82f6c5

Per Michael's feedback (swiftlang#25310 (review)), largeLiteral is likely redundant with largeManaged.

keitaito commented Jul 2, 2019


keitaito marked this pull request as ready for review July 2, 2019 06:43


palimondo suggested changes Jul 2, 2019


[benchmark] String.replaceSubrange: multiplier fix

a9cbe2b

+[Gardening] minor formatting (80 chars)

palimondo approved these changes Aug 2, 2019


palimondo merged commit 22d7c28 into swiftlang:master Aug 2, 2019

milseman mentioned this pull request Jun 10, 2019

[SR-8905] Gaps in String benchmarking #51411

Open


