Chasing a hitch
30/Nov 2025
(I had a more catchy post title, but it’d spoil a surprise).
Last week we had an interesting debugging session, trying to find the source of a mysterious hitch that suddenly started plaguing one of our tools. Admittedly, the setup is maybe a bit exotic, but it has been working fine for years. We could not really pinpoint when exactly it regressed, but we knew it was probably not us - no one changes that tool anymore. It’s a 3rd party Windows application that supports Lua plugins for custom commands.
Some of our commands use os.execute (on Windows this basically calls system) and all of them were now causing a 5-6 second hitch before doing any work. It didn’t really seem to matter what the actual shell command was; the freeze happened before any of that. We tried to narrow it down, but came up empty. We considered all the usual suspects - other 3rd party processes, anti-virus software etc. - but nothing really stuck.
The only thing we discovered was that running the “parent” Windows app as an admin helped: we were back to no hitch and an almost instant response. It was still a bit cumbersome as a workaround, but at least it gave us a good shot at easy A/B testing - we could now compare captures from admin (fast) and non-admin (slow) runs and check if anything stood out. To streamline it further, I created a minimal repro case that was a bit quicker to run; I basically mimicked the setup using the LÖVE framework. The whole code is:
local lastElapsed = 0

function love.draw()
    love.graphics.print("Last elapsed: "..tostring(lastElapsed), 400, 300)
end

function love.keypressed(key)
    if key == 'x' then
        local startTime = os.time()
        os.execute("dir")
        lastElapsed = os.difftime(os.time(), startTime)
    end
end

All it does is run the dir command when the X key is pressed and measure the timing. I confirmed it still exhibited the same problem: it’d take 5+ seconds non-admin and sub-1s as admin. I was now ready to pretend I was a bit like Bruce Dawson and put my WPA pants on.
I will spare you the painful details of my fumbling and try to get straight to the point, but the actual process took way more time than described here. Investigating hitches like this, caused by external processes, is a bit different from your everyday profiling. We need to take a more ‘holistic’ view and look at the system as a whole. If you only follow the callstack from your own process, the trail disappears pretty quickly; you will typically ‘discover’ what you already know - we spend all this time waiting on another process.
I took a whole bunch of captures and they were a bit inconsistent. Sometimes I was lucky enough to get the UI Delays group to show up, but even when it did, it didn’t have much detail: we were stuck in the “MsgCheck Delay”, which we kinda knew already. Since I knew the whole thing had something to do with child processes spawned by our process, I decided to take a look at the Transient Process Tree, and it turned out to be much more interesting. It can still be a bit noisy - it captures all the processes running at the moment (and it turns out Windows runs hundreds of them) - but you can select the time interval you’re interested in and zoom in. After doing this and comparing both runs, I finally started seeing the matrix.
The fast run is actually a bit noisier and has some completely unrelated processes, but our ‘tree’ looks pretty simple: love.exe spawns cmd.exe, which spawns conhost.exe, runs the dir command, and it all finishes in 0.3s. The slow run starts the same… The obvious difference is the duration (6 seconds), but also… there are 2 other console processes below. They do not seem to be part of the same tree (created by svchost.exe, not us), but their timings are a bit suspicious: both the start time and the duration are very close to the process we spawned directly.
Let’s do some digging in the CPU Usage (Precise) graph. I typically remove the actual graph and work with the table only. The exact preset we want is Context Switch by Process, Thread, but we also need some extra columns. We can right-click the column header and add them if they’re not there yet. We want the following:
- New Thread Stack
- Ready Thread Stack
- Waits (us) Max
It is also important to load system symbols if we want our callstacks to be useful for anything. I like to get rid of Waits (us) Sum to reduce the clutter a bit. We can now sort by Max Waits, find our root process (love.exe) and see what we are waiting for. Unsurprisingly, it is cmd.exe in both cases, but one wait is vastly longer than the other (again, 5+s vs 0.3s). Let’s keep drilling: find our child process (cmd) and expand it. Conveniently, it is typically just below, since the max wait times are similar (at least in the slow case). Here is where things get interesting - we might have found our connection to OpenConsole.exe.
We can try digging further. We have our callstacks, so we can try to see what the link is exactly. With our data still sorted by Max Waits, we expand the callstacks, looking for clues. It is helpful to keep the parallel process graph in view, as WPA will plot the duration of selected entries, which makes it easier to spot connections. It turns out there are actually links from both cmd.exe and conhost.exe. The cmd.exe one is a bit generic, just KernelBase.dll!ConsoleAllocate, but conhost is a bit more interesting:
The attemptHandoff function is not terribly well documented, but from what bits and pieces I could find, I understand it’s a way to pass control from conhost to a different terminal application, typically Windows Terminal. While conhost is fairly opaque, Windows Terminal is open source, so we can try attacking the problem from the other side: find the corresponding code and debug from there.
(Update: I have been informed that conhost is actually in the same repository. attemptHandoff is here)
After some digging we find OpenConsole.exe!CConsoleHandoff::EstablishHandoff in the OpenConsole process and, following breadcrumbs from there to Terminal’s Github repo, we arrive at CTerminalHandoff::EstablishPtyHandoff. It’s not the only reference, but they all more or less confirm that this is a way for another process to communicate with and pass control to the Terminal.
If we refer back to our original “transient process” screenshot, you will notice Terminal has been spawned with the “-Embedding” argument. Again, it is not terribly well documented, but trawling the repo we can find that when executed this way it basically ends up in ShouldRunAsComServer (no window of its own, just waiting for commands).
It was a good moment to take a breather and regroup. What we know:
- if running in normal (non-admin) mode, spawning cmd/conhost results in running WindowsTerminal/OpenConsole in an “embedded” (COM server) mode. It does not create its own window, but rather listens for and handles commands from another process.
What we don’t know:
- why did it start happening recently?
- why does it not happen when running as administrator?
- why is it so slow?
- how do we fix it?
The first question was fairly easy; it was enough to search the web - as of Windows 11 22H2, Windows Terminal is the ‘default command line experience’. I will skip points 2 & 3 for now, but this ties quite nicely into point 4. Now that we know what has changed, we can still ‘fix’ it, or at least bypass it, by going back to the old setting: run Terminal, go to Settings and change the “Default terminal application” to “Windows Console Host”. Boom, back to almost instant.
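If you would rather script that change than click through Settings (say, to roll it out alongside the tool), the delegation choice appears to live per-user in the registry. Treat the key path and value names below as an assumption from my own poking around, not official documentation:

```shell
:: Assumed location of the "Default terminal application" setting.
:: DelegationConsole / DelegationTerminal hold CLSIDs identifying which
:: console host / terminal Windows should delegate new consoles to.
:: Inspect it before and after flipping the setting to confirm what changed:
reg query "HKCU\Console\%%Startup"
```

Comparing the output before and after switching to “Windows Console Host” in the Settings UI should tell you which values (and CLSIDs) to write if you want to automate the workaround.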
Points 2 & 3 are a bit harder to explain. LLMs offer some theories about (2), but I could not find any official confirmation, so I will not post them here.
I am also not sure about (3) tbh. What I do know is that it is tied somehow to os.execute specifically (or, more likely, to the system function), but some apps are able to ‘sidestep’ it.
I was able to trigger the same slowdown with a simple Hello World C++ app, no Lua involved (it needs to be an actual windowed Windows app; console programs work fine, I assume because they reuse the existing console).
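The C++ repro can be sketched roughly like this (a hypothetical reconstruction, not the exact program I used; timed_system_ms and the “echo hello” stand-in for dir are mine). The timing logic mirrors the Lua snippet; on Windows, the interesting case is compiling it as a GUI app (WinMain / /SUBSYSTEM:WINDOWS), since that is what forces a fresh console to be set up for the child:

```cpp
#include <chrono>
#include <cstdlib>

// Times a single system() call in milliseconds, mirroring the Lua
// os.execute repro. Built as a windowed (WinMain) app on Windows, the
// spawned cmd.exe has no console to inherit, so a fresh conhost - and,
// post-22H2, a Windows Terminal handoff - happens before the command
// even runs; that setup is where the multi-second stall showed up.
long long timed_system_ms(const char* cmd) {
    const auto start = std::chrono::steady_clock::now();
    std::system(cmd);  // blocks until the spawned shell finishes
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        end - start).count();
}
```

A console build of the same code (or a windowed build with the conhost default restored) returns in well under a second, which matches the behaviour described above.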
What is interesting, though, is that C# apps seem ‘immune’. They display the same behaviour, i.e. the whole conhost<->Terminal dance, and they are slower, but the difference is almost negligible.
One thing I did notice is that WindowsTerminal.exe!IslandWindow::_globalActivateWindow is handled a bit differently: in my original test case, it’ll eventually end up in dwm.exe. The C# version seems to be going via the CLR/mscoree.dll and is vastly faster, but tbh this whole net of random symbols is tangled enough that I’m not sure if that’s the important difference.
So there you have it - we’re still left with some unanswered questions (I’m no Bruce), but the workaround is good enough for us for now. I am still considering reporting this to Microsoft, but I don’t have high hopes; all my semi-recent reports were enthusiastically commented on by AI bots and never touched again.
Finally, I will leave you with this series of posts I found while looking for information. It was not 100% what I needed, but it is an interesting dive into the console system internals:
Windows Command Line