C# Performance

Me

Steve Love


@IAmSteveLove@mastodon.social
@IAmSteveLove


Don’t…​

  • Work with children or animals

  • Offer to present on C# performance at a C++ conference

  • Definitely don’t attempt live coding

A New Path

  • Optimization is hard

  • A little learning…​

Drink deep, or taste not the Pierian Spring

There shallow draughts intoxicate the brain, And drinking largely sobers us again.

Talking of misquotes…​

The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.

Computer Programming as an Art: ACM Turing Award Lecture 1974
— Donald Knuth

Some waypoints

  • A few misconceptions

  • Measuring performance in the wild

  • What do equality comparisons have to do with pool anyway?

In the Beginning…​

Well, when things got interesting…​

  • C++ is too complicated

  • Java is too simplistic

C# Criticisms

  • GC

  • Boxing

  • Copying

  • Reflection

Garbage Collection

  • …​ is unpredictable

  • …​ takes a long time

  • …​ interrupts program flow

But

  • deterministic destruction costs

  • value types aren’t (individually) collected

Choices, choices

CONSIDER defining a struct instead of a class if instances of the type are small and commonly short-lived or are commonly embedded in other objects.

— Microsoft documentation: Framework design guidelines

Boxing

C# string

More like std::string_view
than std::string

Reduce boxing

var id = Guid.NewGuid();
//...
Console.WriteLine("Id is {0}", id);

vs.

var id = Guid.NewGuid();
//...
Console.WriteLine("Id is {0}", id.ToString());

Equal Rites

string again…​"value-like"

Copied by-reference:

var original = "A string";
var copy = original;

Assert.That(original, Is.SameAs(copy));

Compared by value:

var buf = new StringBuilder("A string");
copy = buf.ToString();

Assert.That(original, Is.Not.SameAs(copy));
Assert.That(original, Is.EqualTo(copy));

Equal Rites Redux

Equals method

More to equality comparisons than Equals

A Simple Type

public readonly struct Colour
{
    public int Red { get; init; }
    public int Green { get; init; }
    public int Blue { get; init; }
}

A hash of it

[Test, Category("Performance")]
public void CreateUniqueSet()
{
    var rng = new Random(210);
    var unique = Enumerable.Range(0, 10_000)
        .Select(_ => rng.Next())
        .Select(r => new Colour { 
            Red = (byte)(r >> 16), 
            Green = (byte)(r >> 8), 
            Blue = (byte)r })
        .ToHashSet();

    Assert.That(unique, Is.Not.Empty);
}

Testing times

unittest1

Or

dotnet test -v Normal
...
NUnit Adapter 4.5.0.0: Test execution complete
  Passed CreateUniqueSet [17 ms]

Spot the difference

public readonly struct Colour
{
    public byte Red { get; init; }
    public byte Green { get; init; }
    public byte Blue { get; init; }
}

Re-run

unittest2

Where did the time go?

sampling

Overworked

tracing

Revert and re-run

tracing2

The problem isn’t really with Equals…​
…​or with byte fields

ValueType.GetHashCode

(in the worst case with byte fields)

  • considers only the first non-null field

  • means a maximum of 256 unique codes (buckets)

Hashing

1

2

1 comparison

Hashing

12

3

2 more comparions (3 total)

Hashing

123

4

3 more comparions (6 total)

Hashing

1234

Next addition 4 more comparisons
(10 now)

Triangles

1 n = 1
23 n = 2
456 n = 3
78910 n = 4
…​

(n2 + n) / 2

And the answer is?

10,000 Colour objects
49,968,936 calls to Equals

10,0002 + 10,000 / 2 =
50,005,000

Hash codes

…​don’t have to be complicated.

public override int GetHashCode()
{
    return (Red << 16) | (Green << 8) | Blue;
}

HashCode.Combine(Red, Green, Blue) also works, and is efficient.

public override bool Equals(object? obj)
{
    return obj is Colour other &&
           Red == other.Red && Green == other.Green && Blue == other.Blue;
}

Upping the ante

unittest3

Boxing now noticeable

tracing3

An equatable solution

public readonly struct Colour : IEquatable<Colour>
{
    public int Red { get; init; }
    public int Green { get; init; }
    public int Blue { get; init; }

    public bool Equals(Colour other)
    {
        return Red == other.Red && Green == other.Green && Blue == other.Blue;
    }

    public override bool Equals(object? obj)
    {
        return obj is Colour other && Equals(other);
    }

    public override int GetHashCode()
    {
        return (Red << 16) | (Green << 8) | Blue;
    }
}

Or more modernly…​

Since C# v12.0 (.NET 8)

public readonly struct Colour(int r, int g, int b) : IEquatable<Colour>
{
    public int Red { get; init; } = r;
    public int Green { get; init; } = g;
    public int Blue { get; init; } = b;

    public bool Equals(Colour other)
    {
        return Red == other.Red && Green == other.Green && Blue == other.Blue;
    }

    public override bool Equals(object? obj)
    {
        return obj is Colour other && Equals(other);
    }

    public override int GetHashCode()
    {
        return (Red << 16) | (Green << 8) | Blue;
    }
}

And even more compactly…​

Since C# v9.0 (.NET 5)

public sealed record Colour(int Red, int Green, int Blue);

or, since C# v10.0 (.NET 6)

public readonly record struct Colour(int Red, int Green, int Blue);

Benchmarks

  • More accurate timings than profiling

  • Less fine-grained results

The competition

  • Simple struct with overridden virtuals (Equals & GetHashCode)

  • Struct implementing IEquatable<T>

  • Struct implementing IEquatable<T> using Primary Constructor

  • Class implementing IEquatable<T>

  • Record

  • Record struct

The results

Create a HashSet from 10,000 objects

| Method     | Case                                  | Mean     | Error    | StdDev  |
|----------- |-------------------------------------- |---------:|---------:|--------:|
| CreateHash | [Shared.Types.EquatableColourClass]   | 232.7 us | 10.67 us | 0.59 us |
| CreateHash | [Shared.Types.EquatableColour]        | 135.0 us | 26.52 us | 1.45 us |
| CreateHash | [Shared.Types.EquatablePrimaryColour] | 134.6 us | 55.93 us | 3.07 us |
| CreateHash | [Shared.Types.OverrideColour]         | 123.4 us | 28.58 us | 1.57 us |
| CreateHash | [Shared.Types.RecordColour]           | 227.0 us | 49.61 us | 2.72 us |
| CreateHash | [Shared.Types.RecordStructColour]     | 128.6 us | 83.39 us | 4.57 us |

Common criticisms of C#

Predictable vs. Fast

  • Managed runtime

  • JIT vs. AOT

  • GC (again)

Here’s fun

using int_type = std::int32_t;

struct colour
{
    explicit colour(int_type red, int_type green, int_type blue)
        : r { red }, g { green }, b { blue } {}

    int_type red() const { return r; }
    int_type green() const { return g; }
    int_type blue() const { return b; }

    bool operator==(const colour & other) const {
        return r == other.r && g == other.g && b == other.b;
    }

private:
    int_type r, g, b;
};

C++ sequence of Colour

std::vector<colour> create_seeds(int N)
{
    std::default_random_engine gen(210);
    std::uniform_int_distribution<int_type> dist(0);

    auto seeds = std::views::iota(0, N)
        | std::views::transform([&](int _) { return dist(gen); })
        | std::views::transform([](int_type r)
            { return colour(r >> 16 & 0xFF, r >> 8 & 0xFF, r & 0xFF); });

    return { std::begin(seeds), std::end(seeds) };
}

Not testing ranges - already been done:

Set the scene

const int N = 10000;
std::vector<colour> colours;

void BM_create_global_state(const benchmark::State& state)
{
    colours = create_seeds(N);
}

struct colour_hash
{
    std::size_t operator()(const colour & col) const
    {
        return col.red() << 16 | col.green() << 8 | col.blue();
    }
};

Google Benchmark

void BM_create_hashmap(benchmark::State& state)
{
    for (const auto& _ : state)
    {
        auto result = std::unordered_set<colour, colour_hash>{ colours.cbegin(), colours.cend() };
        auto count = result.size();

        benchmark::DoNotOptimize(count);
    }
}

BENCHMARK(BM_create_hashmap)
    ->Setup(BM_create_global_state)
    ->Unit(benchmark::kMicrosecond);

C++ results

cppbench

unordered_map<T> much faster on lookups

Don’t assume

  • Measure

  • Measure again!

  • Worst performance culprits - always the programmers