New blog

August 28, 2016 – 11:35 pm

I’m (very) slowly transitioning all my posts to Hugo. The first batch is here for the time being. I’ll keep adding older content, but it might take a while (the conversion itself is automatic, but I try to proofread everything to make sure it all went fine, and it’s 250+ posts). Eventually blog2 will simply replace this URL (probably some time next week), and all new content will be done using Hugo.

Elo vs Glicko

August 21, 2016 – 1:26 am

If you’ve ever done any matchmaking coding (…or played chess), you’re most likely familiar with the Elo rating system (PSA: it’s Elo, not ELO, it’s not an acronym, it’s named after a person – Arpad Elo). It’s a rating system used by most chess federations (most notably, FIDE) and has been adopted by many video games. It’s trivial to implement and get up and running, but it suffers from a few issues:

  • it’s just one number, there’s no reliability (or deviation) information of any kind. Let’s assume your system gives a 1500 rating to a new player (no games played). Given 2 players with a 1500 rating, one with 100 games played, the other one with 0, it’s obvious the rating of the first one is much more trustworthy. Elo doesn’t care; rating changes are controlled by a single system parameter (K) and it’s a zero-sum game, so the winner gains exactly as many points as the loser drops (K/2 each when the ratings are equal – see the sketch after this list).
  • it was created to measure individual ratings, so it doesn’t work for teams out of the box. There’s been some work done to alleviate that, but it’s tricky
  • Elo doesn’t care about activity. In a way, it rewards ‘retiring’ after achieving a very high rating, as at this point winning doesn’t give you many points, but any loss causes a big drop
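
For reference, here’s a minimal Python sketch of the basic Elo update (illustrative only – the function and variable names are mine, not taken from my actual script):

# Minimal Elo sketch (illustrative; not the exact code from my script).
def elo_expected(rating_a, rating_b):
    # Expected score for player A (win probability, with a tie counting as half a win).
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, score_a, k=20.0):
    # score_a: 1 = A wins, 0.5 = tie, 0 = A loses. Zero-sum: B changes by the opposite amount.
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Two evenly rated players: the winner gains K/2, the loser drops by K/2.
print(elo_update(1500.0, 1500.0, 1.0))  # (1510.0, 1490.0)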

In 1995, Mark Glickman, a professor from Boston University, created an alternative rating system – Glicko (and later an improved version – Glicko2) – that’s supposed to deal with some of these issues. The most notable differences are:

  • each player is described by 3 parameters: rating, ratings deviation (RD) and volatility. A greater rating deviation results in much greater rating changes (so, typically, it should quickly converge to your ‘real’ rating)
  • ratings are recalculated per ‘period’, not after each game. The rating period duration is at the discretion of the administrator, but the recommendation is for it to include 5-10 games. If a player didn’t play any games during a certain period, his rating stays the same, but his RD will typically increase – so the longer you rest, the less reliable your rating becomes and the more rapid the change will be once you come back (see the sketch below)
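
To give an idea of how that RD growth works: in the original Glicko, the deviation is inflated at the start of every rating period a player sat out. A tiny sketch (based on Glickman’s paper; the constant c below is his example value, picked so that a rating becomes as unreliable as a brand new player’s after ~100 idle periods):

import math

# Glicko-style RD inflation for idle rating periods (sketch; c is the example value
# from Glickman's paper, 350 is the RD of a brand new player and acts as a cap).
def inflate_rd(rd, idle_periods, c=34.64):
    return min(math.sqrt(rd * rd + c * c * idle_periods), 350.0)

print(inflate_rd(50.0, 1))    # ~61 after sitting out one period
print(inflate_rd(50.0, 100))  # ~350 - treated (almost) like a new player again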

Calculations are a little bit more complex, but shouldn’t be a problem for modern computers. Glicko has been adopted by a number of video games like CS:GO, Heroes of the Storm or Guild Wars 2. I’ve decided to test how these 2 algorithms perform on real-world data. I knew Nate Silver and his crew had been calculating Elo ratings for NBA teams, so I decided to do something similar for the NHL. I’ve only included the 2015-2016 regular season here, so 82 games for each team. I used a vanilla win-tie-loss system (1/0.5/0), nothing fancy like home-ice advantage or the victory margin (home ice could make sense, margin probably not, hockey’s too volatile). Scores have been provided by The Hockey Reference. Random observations:

  • Elo took much less time to implement/tweak, it basically ‘just works’. There’s just a single parameter (K) and the usual rule of thumb is: set it to some high value for the first X games, use a lower value after that. 538 used 20 for their NBA ratings; my system uses 50 for the first 10 games and 20 afterwards (see the sketch after this list). K is also trivial to ‘get’ intuitively, it’s simply the biggest possible rating change. Glicko2 uses a system parameter ‘tau’, recommended range 0.3-1.2, but I couldn’t really observe a big impact in that range (opted for 0.5). Lower values are supposed to prevent big volatility changes. In my case volatility pretty much stayed at 0.06 (the starting value) anyway (which is a little bit surprising, as NHL results are fairly ‘chaotic’, much more than the NBA for sure – you don’t see teams going 73-9).
  • It took me a while to ‘convince’ Glicko that Pittsburgh should be the highest rated team, not Washington (cheating, I know, but I’m objective here, I’m actually a Capitals fan). The recommended 5-10 games per period might work fine for chess or a ‘lifetime’ rating, but in a season of 82 games that seemed way too long. It’s definitely worth playing with the period length; for video games it might even make sense to include more than 10 games per period (matches are played at a much higher frequency). For my experiment I ended up with a period of 4 days, which usually resulted in 2-3 games for every team.
  • Both systems ‘agreed’ for the most part. They had exactly the same success rate in the playoffs (and actually got the same matchups wrong) – 10/15 (ratings were not updated during the playoffs). The San Jose Sharks were the most unexpected for both of them (I think the team was a little bit underappreciated, mostly because of the slow start – the rating never had time to bounce back properly). Going just by gut feeling, I’d probably agree with Elo a little bit more (it rated SJS higher, for example), but that’s a personal thing, you could easily defend either classification.
  • Glicko changes much more rapidly, even with fairly small RDs. As mentioned, Elo rating changes are capped at K (=20 for the most part); Glicko’s swings can be much greater, even after RD gets smaller (it was ~50 at the end of the regular season for most teams).
  • Neither system deals with the team rating, but they’ve not been designed for that. FWIW, Microsoft’s TrueSkill is based on Glicko.
  • Finally, there’s no nice way to say it, but both systems agree – Maple Leafs were the worst team of 2015/16. Congratulations, Toronto.
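
For the curious, the Elo half of the experiment boils down to a loop like this (a rough sketch with made-up names and toy data; it reuses the elo_update helper from the earlier sketch, and the real script obviously reads the full schedule):

# Rough sketch of the Elo side of the experiment (made-up names, toy data).
# Reuses elo_update() from the earlier sketch; season_games would be built from
# the Hockey Reference scores as (home, away, result-for-home-team) tuples.
K_EARLY, K_LATE, EARLY_GAMES = 50.0, 20.0, 10

season_games = [("PIT", "WSH", 1.0), ("WSH", "TOR", 1.0), ("TOR", "PIT", 0.0)]
ratings = {team: 1500.0 for team in ("PIT", "WSH", "TOR")}
played = {team: 0 for team in ratings}

for home, away, result in season_games:
    # Higher K while a team has few games on record, lower K afterwards.
    k = K_EARLY if min(played[home], played[away]) < EARLY_GAMES else K_LATE
    ratings[home], ratings[away] = elo_update(ratings[home], ratings[away], result, k)
    played[home] += 1
    played[away] += 1

print(sorted(ratings.items(), key=lambda kv: -kv[1]))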

Python script + complete standings here.

DD2016 – video

June 12, 2016 – 4:31 pm

A video from my DD2016 talk has been uploaded and can be viewed here:

Digital Dragons 2016

May 23, 2016 – 12:14 am

A few days ago I came back from the Digital Dragons 2016 conference. I’d been in talks with these guys in past years, but the timing was never quite right; this time I could finally make it. Luckily for me, it was the same year they invited John Romero, David Brevik and Chris Avellone. In December 1993 my classmate brought an old floppy disc to school. He claimed it contained the best game ever. Obviously, at that time, pretty much any new game was ‘best ever’, so we took it with a grain of salt. I still copied it and tried it at home. It was unlike anything I’d seen before, I just couldn’t put it down. This was Wolfenstein 3D. It’s one of my most vivid gaming memories: New Year’s Eve of 1993, sneaking upstairs to keep playing, much to the chagrin of my mom. I think I was “programming” a little bit at that point, but it was mostly limited to printing “Maciej rules” on the screen. If you had told me that one day I’d actually speak at the same conference as John Romero, I’d probably have fainted. Almost 25 years later and here we are :). DD2016 was fun; people compare it to ‘early GDC’ and while my first GDC was 2005 (already huge), I can see where they’re coming from. It’s still relatively “fresh”, not too many sponsored sessions, you don’t have to pay for every little thing, very strong indie vibe. I definitely hope it keeps growing (within reason).

It was nice to meet my heroes and my old friends, but I was primarily there to present. I think the organizers plan to upload videos soon, but for the time being, you can find my slides here. The presentation title is “Debugging Multiplayer Games. Lessons from Warframe”. It’s probably a little bit too packed for 45 minutes (I had to talk really fast) and I should have removed more slides than I did (some extra slides left), but oh well. If the slides feel too cryptic, feel free to ping me.

Elixir diaries

February 22, 2016 – 3:45 am

You have probably heard about an AI algorithm defeating a human professional in a game of Go. The algorithm itself was developed by DeepMind, an English company that’s now owned by Google. One of the founders is Demis Hassabis. If you were playing Bullfrog games in the late 90s, the name might ring a bell – he’s one of the designers of Theme Park. Back in the day I remember reading a column in Edge magazine titled “The Elixir Diaries”. It was the diary of Elixir Studios, a game company Hassabis started after leaving Bullfrog. The DeepMind news made me think of that, and I wanted to refresh my memory and go through the diaries again. It turned out to be a little bit more difficult than expected (I didn’t really remember the exact name, so my Google search phrases weren’t very precise), but I eventually found them here. Pretty cool read, especially if you’re interested in the realities of making a game back in the 90s/00s (the McD-for-breakfast story was painfully familiar…)

Archeology

January 20, 2016 – 5:53 am

Recently, we’ve been collectively complaining on Twitter about going crazy in C++ (as we do every few weeks). This reminded me of my dark period around 2002 when I was *really* excited about templates and metaprogramming. I tried finding my old code, but it turned out my previous website was completely gone. Fortunately (?), the good folks from The Wayback Machine had some copies. I moved some of my articles to a new server. I don’t think they’re of much use nowadays, but I thought they could be interesting, if only as a historical curiosity. Without further ado:

Fantasy football and statistics

November 22, 2015 – 7:04 pm

I’m playing in a fantasy football league with some coworkers this year. I have always liked all kinds of sports, but it’s a little bit challenging to follow the NFL from Europe. There are not too many stations that actually show it (except for the Super Bowl) and even if they do, the time difference makes it difficult to watch. I still followed the major news and managed to catch a game every now and then, but I was very far from an expert. Basically, I only knew the big names and the most popular teams. It did get a little bit better after I moved here, but I’m still very ignorant. My only chance not to embarrass myself completely was to follow in the steps of Mark Watney and ‘science the shit out of this’.

If you’re not familiar with the concept of fantasy football, Wikipedia has a fairly detailed description, but the TL;DR version is this:

  • each participant drafts his team (of real NFL players),
  • every week you’re awarded points based on the performance of your players. Exact scoring rules differ between leagues, but some key numbers from our league: a touchdown is worth 6 (receiving/rushing) or 4 (for a quarterback) pts, every rushing/receiving yard is 0.1 pts and so on. 15pts/week is a decent score, anything over 20 is really good and 30+ is amazing (this shifts a little bit in PPR formats). See the toy example after this list.
  • every week you only compete against 1 other team. The team that gets more points (the sum of points from all its players) wins. In our league, 100pts is a fairly decent score, 120+ will get you a guaranteed win most weeks
  • every week you’re allowed to replace your players with players that are free (not playing for any other team). You also have to decide who plays and who sits (a full team includes substitute players).
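
To make the scoring concrete, here’s a toy calculation using the numbers above (the real scoring has more categories; this is just the gist, and the function name is mine):

# Toy scoring calculation using the numbers above (real leagues have more categories).
def skill_player_points(rush_yds=0, rec_yds=0, rush_tds=0, rec_tds=0):
    return 0.1 * (rush_yds + rec_yds) + 6 * (rush_tds + rec_tds)

# 85 rushing yards, 20 receiving yards and one rushing TD:
print(skill_player_points(rush_yds=85, rec_yds=20, rush_tds=1))  # 16.5 - a pretty good week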

There are plenty of draft strategies, but I feel the draft is still the most “random” point of the season. You can only go by past data and projections, and it’s hard to say how those will translate to the current season. I’m not going to talk about the draft here. My system comes into play a few weeks in, after we’ve collected some samples from the new season. We want to see who to grab from the waiver wire and who to sit/play. Again, it’s a matter of personal preference, but I’m fairly risk averse, so I was mostly interested in finding the most “consistently decent” players. For example, consider these 2 stat lines:

  • player A: 7, 21, 5, 16 (mean: 12.25, std dev: 7.54)
  • player B: 11.5, 10.5, 12, 11.5 (mean: 11.375, std dev: 0.63)

Player A gets more points on average (12.25 vs 11.375), but I’d take player B. He’s way more reliable. I can plan my team better, I know (roughly) what to expect. Player A is the type that you sit and he gets 18, then you let him play next week and he gets 5. My quest was not necessarily to find the flashiest players who bring the most points, but to find who’s the most consistent and should bring at least X pts every week. Before we start, a small disclaimer: my methodology here has almost nothing to do with actual science. The sample sizes are laughably small and the fantasy points distribution definitely isn’t guaranteed to be normal. I basically tweaked various factors until I got results that made sense.

My idea was simple: grab data from some site, calculate the mean/variance/std deviation for every player, reject outliers, recalculate, compute the “floor” (mean – std dev * some_weight). The “floor” is basically telling us the minimum number of points we can reasonably expect. As mentioned, I’m fairly conservative, so I actually added an option to only reject “positive outliers” (that is, outliers that are greater than mean + std dev; we still keep samples where a player underperformed). In our example, it’d keep all the samples for player B, but reject the 21 points for player A. After rejection, player A’s average drops to 9.33 (the new std dev is 5.86).
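
In code, the whole thing is just a few lines. A simplified Python 3 sketch (the actual script is Python 2.7 and has more options, like only looking at the last N games or tweaking the deviation weight):

import statistics

# Simplified "floor" calculation: reject outliers, then take mean - weight * std dev.
def floor_score(samples, stddev_weight=1.0, reject_only_pos_outliers=True):
    mean, dev = statistics.mean(samples), statistics.stdev(samples)
    kept = [s for s in samples
            if s <= mean + dev and (reject_only_pos_outliers or s >= mean - dev)]
    mean = statistics.mean(kept)
    dev = statistics.stdev(kept) if len(kept) > 1 else 0.0
    return mean - stddev_weight * dev

print(floor_score([7, 21, 5, 16]))          # player A: ~3.5 (mean 9.33, dev 5.86 after rejecting the 21)
print(floor_score([11.5, 10.5, 12, 11.5]))  # player B: ~10.7 - higher floor despite the lower raw mean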

I spent some time trying to find the best source of stats and opted for FF Today. They update their stats quite often and the format is fairly easy to parse. I couldn’t find an aggregate version, so I simply visit the category stats and then traverse all the top players’ pages. My go-to combo for simple web scraping is Python+Beautiful Soup. ~100 lines of code later I had the first version of my script ready. My first hit was Stefon Diggs, but you could argue he wasn’t a real sleeper after Week 7 (the script actually pointed him out after W6, but that was also due to a super small sample size…). I got him from the waiver wire, but didn’t trust him enough to let him play (not enough data) and that turned out to be a good decision (his last two games were not that great). Week 10 brought a more serious try — I had to find a new kicker (my main kicker Matt Bryant didn’t play that week). Based on expert predictions I should have gone with Greg Zuerlein or Caleb Sturgis (these were the highest ranked kickers that were still available in my league). However, the script had the following to say:

1. Connor Barth 11.1004809472
2. Nick Folk 5.47483805032
3. Chandler Catanzaro 5.47186593476
4. Stephen Gostkowski 5.46112639465
5. Brandon McManus 5.24049168899
6. Dan Bailey 4.91128425904
7. Dustin Hopkins 4.61254139118
8. Steve Hauschka 4.60733909541
9. Blair Walsh 4.15430405875
10. Josh Lambo 4.0
12. Greg Zuerlein 3.82037546488
14. Caleb Sturgis 2.69765682193

Most of the highest ranked kickers were not available/injured, but Dustin Hopkins was still up for grabs. As you can see, he’s expected to bring more points than both Zuerlein and Sturgis. Experts ranked him around 20th place this week, so not much confidence. I’m not entirely sure why, tbh; I think it’s mostly that no one really cares much about kickers. He’s only missed one FG this season (~92%). He plays for a mediocre team, but that factor is actually ‘encoded’ in the score above – he was playing for the same team while earning his fantasy points so far. I decided to go with Hopkins and it ended way better than I could have expected — he got me 17 pts this week. Now, if I want to be fair, I have to admit this was completely unexpected; based on his history his expected max score was around 12, but I’ll take it. It is a “positive outlier” like I mentioned before, though, so my algorithm will reject it when evaluating him in the future.

Hopefully it’s obvious, but I’d like to stress that this is just pure data analysis. The algorithm cares only about fantasy points. It has no idea about matchups, injuries and team strategy. In theory, fantasy points encode all of these, and ideally we’re looking for players “immune” to these factors, but some domain knowledge is recommended. For example, if you run the script with my default settings (stddevweight=1, rejectonlyposoutliers), the top RBs look like:
python.exe nfl_crawler2.py --pos rb --rejectonlyposoutliers

1. Jamaal Charles — 10.66
2. Karlos Williams — 9.93
3. Mark Ingram — 9.01
4. LeSean McCoy — 8.83
5. Todd Gurley — 8.26
6. Devonta Freeman — 8.19

It can be surprising to see Gurley and Freeman so low (and Miller is nowhere to be found), but it makes more sense if you remember that Gurley had a very short outing in his first game (1.4pts) and that Freeman is actually fairly volatile (still brilliant), which affects his deviation (he also had a 4.7pts game). Miller is not even in the top 10 because he wasn’t getting many touches with the old coach. If we run the same script with different options (reject all outliers, not only positive ones, and only consider the last 6 games), the results are a little bit different:

python.exe nfl_crawler2.py --pos rb --lastn 6

1. Todd Gurley — 14.91
2. Devonta Freeman — 13.46
3. Chris Ivory — 11.79
4. Lamar Miller — 11.15
5. LeSean McCoy — 10.96
6. Adrian Peterson — 10.54
7. Karlos Williams — 9.93

Williams is probably the biggest surprise here, but he’s been posting great numbers so far (when not injured). His worst game was 9.7pts, and while his ceiling might be lower than the other guys’, he’s actually very consistent (dev of 4.5). His main problem is named McCoy (#5), who’s Buffalo’s RB1.

Without further ado, here’s a list for wide receivers:

python.exe nfl_crawler2.py --pos wr --rejectonlyposoutliers

1. Eric Decker — 9.71
2. Brandon Marshall — 8.08
3. DeAndre Hopkins — 7.53
4. Allen Hurns — 7.06
5. Larry Fitzgerald — 6.96
6. Jarvis Landry — 6.58
7. Julio Jones — 6.43
8. Allen Robinson — 6.30
9. Julian Edelman — 6.04
10. Demaryius Thomas — 5.43
11. Odell Beckham Jr. — 5.41
12. Keenan Allen — 5.15
13. Stefon Diggs — 5.10
14. Calvin Johnson — 5.07
15. Amari Cooper — 4.80
16. Rishard Matthews — 4.72
17. Alshon Jeffery — 4.66
18. T.Y. Hilton — 4.35
19. A.J. Green — 4.20
20. Travis Benjamin — 4.10
21. Mike Evans — 3.76
22. Antonio Brown — 3.74

Again, it can be a little bit surprising (especially Brown at 22, but he really suffered while Big Ben was away), but that’s an ultra-conservative setting; it’s easy to adjust the script to match different preferences (e.g. stddevweight=0 gives just the average and doesn’t subtract the deviation). It’s probably best to run the script with different settings, decide what’s important for us and cross-reference the results.

The GitHub project can be found here (requires Python 2.7 + Beautiful Soup).
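
If you just want the gist of the scraping part, it’s roughly this (heavily simplified; the table layout assumed below is a placeholder, the real FF Today pages differ and the actual script handles paging, positions and per-week columns):

import requests
from bs4 import BeautifulSoup

# Heavily simplified scraping sketch - the real pages and columns differ.
def fetch_weekly_points(player_page_url):
    soup = BeautifulSoup(requests.get(player_page_url).text, "html.parser")
    points = []
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Assume the last column of each stat row holds the fantasy points.
        if cells and cells[-1].replace(".", "", 1).isdigit():
            points.append(float(cells[-1]))
    return points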

Whac-A-Mole

August 4, 2015 – 4:45 am

I’ve been debugging a rare memory corruption bug recently and – as usual – it turned out to be an exercise in frustration. This time only part of that was because of the bug itself; the other part was because the methods I chose were not very helpful (in my defense, it’s been a while, so I was a little bit rusty).

The bug itself was fairly boring – sometimes, when leaving a certain UI screen, the game would crash. It was usually in one of a few places – either our UI drawing/updating code or some memory manager function. All traces led to memory corruption. Fortunately for me, it was quite consistent and the UI crash was almost always happening because the exact same element was getting corrupted. To be more precise, the UI code keeps an array of pointers to elements to tick/draw. One of the pointers (usually index 2 or 3) was getting modified. The object itself was still there, it’s just the pointer that was now pointing to some other address (not far from the object, too). I’ll describe the process briefly; hopefully it’ll save someone from going down the same dead-end route and wasting time.

I knew the array was OK shortly after initialization, so it was getting corrupted at some point before shutting down the movie. My first thought was to use VirtualAlloc+VirtualProtect to protect the region (we were not writing to the array again), but I dismissed it quickly. This would give us a completely different address range than when using our mem manager, so whoever was corrupting the memory would probably now just stomp over something else. I still tried it, but as expected – it came up short (it was not an easy repro, but I could usually get it within 5-10 minutes of a constant open/close loop).

My next idea was to use debug registers. They are my go-to tool when trying to track small, localized corruptions like this. I quickly hacked something together and to my surprise — the game eventually crashed, but my single-step exception was never thrown! I started digging and to my dismay quickly confirmed that debug registers are pretty much useless in a multi-threaded environment. You could kinda expect that, seeing as we use the thread context to control them, but making them work only from a single thread seemed so bizarre I went with it anyway. It seems like I’m not the only one that got that impression. Maybe they used to work differently in the past, but as of VS2012, if you enable them in thread A and thread B comes along writing to your protected address — it’ll go through without a single peep. I tried a half-hearted approach of sprinkling the same code over our other threads, but I had a feeling it was some thread that was being spawned later/not by us that was triggering the crash. Further tests seemed to confirm that theory.

This basically means we’re down to the good, old brute-force method – debugger data breakpoints. They do use debug registers, but the debugger will actually take a snapshot of all running threads and set debug registers for all of them (plus any threads that are spawned afterwards). This is something you can maybe do yourself as well, but I was running it on a platform that didn’t have toolhelp32 functionality exposed (…and that still doesn’t solve the problem for threads that are forked later). Once again, this turned out to be a little bit more cumbersome than it should have been. Back in the good old days of VS2008 you could run macros from a breakpoint (i.e. ‘run when hit’). I actually had a bunch of macros for cases like this (enable/disable next bp etc). At some point Microsoft decided to remove that and only left an option to print a message (shame). I guess you can still do it if you write your own add-in, but that seemed like overkill. I added some code detecting my case, put an ordinary breakpoint there and then kept enabling my data breakpoint by hand. 20 minutes of keyboard mashing later I finally found my culprit (as expected, it wasn’t coming from a thread that had been spawned by us (the callback was ours, though)).

The whole thing took me way longer than it should have, and in the end it turned out to be something fairly straightforward (a recent modification to an async handler that wasn’t flushed before destroying a parent object, so it could have been released/reallocated in the meantime; we were only writing to one byte, but that was enough, obviously), but at least I learned a few things on the way (mostly not to rely on debug registers for stuff like that).

Know your assembly (part N)

June 3, 2015 – 5:48 am

An entertaining head-scratcher from today. The application suddenly started crashing at launch, seemingly inside steam_api.dll. It’d be easy to blame external code, but as mentioned – it’s a new thing, so it’s most likely related to our changes. To make things more interesting, it’s only the 32-bit build that crashes, 64-bit seems fine. The high-level code looks like this (simplified):

struct tester
{
    virtual bool foo();
    virtual void bar()
    {
    }
    virtual void lol()
    {
        if(!foo())
        {
            printf("Failed\n");
            return;
        }
        bar();
    }
};

The crash occurs when trying to call bar() [the original code was actually crashing ‘inside’ bar, which the debugger claimed to be inside the aforementioned DLL]:

00199FB3 8B 06            mov         eax,dword ptr [esi]
00199FB5 8B 10            mov         edx,dword ptr [eax]
00199FB7 FF D2            call        edx
00199FB9 84 C0            test        al,al
00199FBB 75 10            jne         tester::lol+1Dh (199FCDh)
[...]
00199FCD 8B 06            mov         eax,dword ptr [esi]
00199FCF 8B 50 04         mov         edx,dword ptr [eax+4] ***
00199FD2 8B CE            mov         ecx,esi
00199FD4 5E               pop         esi
00199FD5 FF E2            jmp         edx

The crash line is marked with stars – access violation reading location 0x00000018, EAX=0x14 at this point. That’d suggest something’s wrong with our vtable. How can this be, though – we have just called another virtual method (foo) and it was fine! As you might have guessed, it’s foo itself that’s wreaking havoc. It’s been modified recently and now contains the following code:

static BOOL MyCallback(LPVOID, PENUM_PAGE_FILE_INFORMATION, LPCTSTR);
virtual bool foo()
{
    EnumPageFiles(MyCallback, NULL);
    return true;
}

EnumPageFiles is a system function, so we’ll ignore it for now. Perhaps there’s something in MyCallback that causes problems. Let’s remove everything and try running with an empty callback function. Still crashes. Remove the call to EnumPageFiles altogether – works fine…

I did some more poking and discovered that the vtable itself is actually OK and never modified by our code (data breakpoint at EAX+4 before calling foo). It’s the value of ESI that changes! See the 2 places where we move [ESI] to EAX? ESI differs between these 2 points. It’s like foo doesn’t restore it properly! Let’s keep digging… Deep inside EnumPageFiles there is a push/pop esi pair, so it should be fine, right? The problem is, ESP doesn’t match. When trying to pop, it’s 12 bytes less than it should be, so we’re popping a completely unrelated value (in the original code it happened to point to the middle of some function in the Steam DLL). You can probably guess where this is going. Stack mismatches like that are usually a sign of calling convention issues. A quick inspection of the code that executes our callback confirms it’s the problem:

75C08D83 50               push        eax
75C08D84 8D 45 E0         lea         eax,[ebp-20h]
75C08D87 50               push        eax
75C08D88 FF 75 0C         push        dword ptr [ebp+0Ch]
75C08D8B FF 55 08         call        dword ptr [ebp+8]
...
// Callback body (return true):
003C9F90 B8 01 00 00 00   mov         eax,1
003C9F95 C3               ret

As you can see, the caller pushes 3 arguments onto the stack (12 bytes!), but there’s no code that pops them (the callee is expected to clean up). Consulting the documentation confirms our findings: the callback function is supposed to follow the stdcall convention. We’re not done yet, though – adding

static BOOL CALLBACK MyCallback(LPVOID, PENUM_PAGE_FILE_INFORMATION, LPCTSTR)

doesn’t seem to cut it, the compiler complains:
‘EnumPageFilesW’ : cannot convert parameter 1 from ‘BOOL (__stdcall *)(LPVOID,PENUM_PAGE_FILE_INFORMATION,LPCTSTR)’ to ‘PENUM_PAGE_FILE_CALLBACKW’

As it turns out, the PENUM_PAGE_FILE_CALLBACKW type doesn’t actually include __stdcall… Let’s try to force it nonetheless:

PENUM_PAGE_FILE_CALLBACKW cb = (PENUM_PAGE_FILE_CALLBACKW)MyCallback;
EnumPageFiles(cb, NULL);
...
// Callback code:
00D99F90 B8 01 00 00 00   mov         eax,1
00D99F95 C2 0C 00         ret         0Ch

As you can see, MyCallback now cleans everything up properly and – as expected – the code runs without crashing. I’m not sure where the discrepancy between the system headers and the documentation comes from.

Now, you might wonder – why did the x64 build work fine? Well, we got lucky. The callback function has only 3 arguments and the x64 calling convention passes them all in registers (rcx, rdx, r8), so the stack stays untouched. If it had just 1 more — we’d run into trouble. (Correction: as pointed out by Ofek, this would still be fine, there’s just 1 calling convention on x64 anyway.) Interestingly enough, the debug version worked “fine” as well (or at least it was hiding the problem more effectively, as it was using more registers, so one of the top functions was saving/restoring ESI too).

C++ 11 final

April 30, 2015 – 5:20 am

I’ve been doing some micro-optimizations recently. One of the things I’ve been trying is eliminating virtual method calls in ‘leaf’ types. Consider the following snippet (simplified):

struct Bar
{
  virtual bool Func1() { return false; }
};
struct Foo : public Bar
{
  virtual bool Func1()
  {
    return true;
  }
  virtual void Func2()
  {
    if (Func1())
      printf("Hello");
  }
};

void DoSomething(Foo& f)
{
  f.Func2();
}

Calling DoSomething will result in 2 virtual method calls – and rightly so, there’s no way to tell whether Func1/Func2 were overridden in some class that’s derived from Foo.

It can be a little bit wasteful, especially if we – the programmers – know for a fact that nothing derives from Foo. Func2 calling Func1 will always, in every single case, call just that – Func1. I used a not-so-sophisticated method to work around that:

if(this->Foo::Func1())
    printf("Hello");

This works, but can be dangerous. Imagine one day some other programmer decides to derive from Foo and provide a new implementation of Func1. He’s in for a nasty surprise (and I’m in for some public shaming). Traditionally, C++ offers a few ways of preventing inheritance, but they’re all fairly ugly (private constructors etc). Fortunately, C++11 introduced a new keyword – final – which does exactly what we want.

It also got me thinking – does this mean the compiler has additional knowledge it was lacking before? Couldn’t it use it to employ the same optimizations we’ve just tried to force? Are my changes even necessary? Sadly, as it often happens, the answer is — it depends.

AFAICT, Visual Studio doesn’t care much. Yes, it’ll prevent inheritance, but it doesn’t seem to affect code generation at all. Here’s the assembly for our code fragment (with the final keyword added):

// struct Foo final : public Bar

// DoSomething
000000013FE15460 48 8B 01             mov         rax,qword ptr [rcx]
000000013FE15463 48 FF 60 08          jmp         qword ptr [rax+8] ***

virtual void Func2()
00000001407426D0 48 83 EC 28          sub         rsp,28h
    if(Func1())
00000001407426D4 48 8B 01             mov         rax,qword ptr [rcx]
00000001407426D7 FF 10                call        qword ptr [rax] ***
00000001407426D9 84 C0                test        al,al
00000001407426DB 74 10                je          Foo::Func2+1Dh (01407426EDh)
      printf("Hello");
00000001407426DD 48 8D 0D 04 3B EF 00 lea         rcx,[string "Hello" (01416361E8h)]
00000001407426E4 48 83 C4 28          add         rsp,28h
00000001407426E8 E9 B3 FB 85 00       jmp         printf (0140FA22A0h)
00000001407426ED 48 83 C4 28          add         rsp,28h
00000001407426F1 C3                   ret

As you can see – still 2 vtable accesses (lines marked with stars). Let’s see if GCC/Clang does any better (Compiler Explorer to the rescue). Just look at this beauty (that’s the body of DoSomething):

    movl    $.L.str, %edi
    xorl    %eax, %eax
    jmp    printf                  # TAILCALL
.L.str:
    .asciz    "Hello"

Not only did the compiler de-virtualize both calls, it also inlined them and topped it off with a tail call. Impressive.

Sadly, as mentioned – we can’t rely on these optimizations being employed consistently, but at the very least – final will prevent another programmer from making a mistake he’d later regret.