How I found Microsoft Hypervisor bugs as a by-product of learning

Finding and Exploitation
Thoughts
Footnotes

This is a non-technical post on how I found two Microsoft Hypervisor-related vulnerabilities that I reported this summer. Specifically, this post discusses CVE-2023-36427, a memory corruption at arbitrary physical addresses from the root partition, which was fixed on November 14th. If you are interested in the technical details of the bug, read the report on GitHub. This post focuses on the process of finding and exploiting the bug and on my thoughts about the exercise.

The other vulnerability has not been disclosed yet.

Finding and Exploitation

How it started

A few months ago, I wrote hvext.js to better understand how Microsoft Hypervisor protects itself and the kernel as part of virtualization-based security (VBS). For example, this tool let me see which MSRs were readable or writable from the root partition without being intercepted by the hypervisor.

A dump of MSR accessibility showed that only 81 MSRs were writable. Because that was a small number, I decided to go through them one by one and work out why each was writable. Most of them related to performance, processor frequency, or thermal management, which made sense to me.
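This kind of writability check can be sketched in code. hvext.js itself is a WinDbg JavaScript extension, and its implementation is not reproduced here. What follows is a minimal Python sketch that assumes you already have a copy of the 4 KiB VMX MSR bitmap page the hypervisor uses for the root partition. How you obtain that page is out of scope, so the example synthesizes one with a hypothetical helper, `intercept_all_except`.

Per the Intel SDM, the page holds four 1 KiB bitmaps: read-low, read-high, write-low and write-high. Together they cover MSRs 0x00000000 to 0x00001FFF and 0xC0000000 to 0xC0001FFF. A clear bit means the access is not intercepted.

```python
"""Sketch: list MSRs that a VMX MSR bitmap lets a guest write without a VM exit.

Layout per the Intel SDM (MSR-bitmap address VM-execution control): one 4 KiB
page split into four 1 KiB bitmaps. A set bit makes the access cause a VM exit;
a clear bit lets the access reach hardware. MSRs outside the two covered
ranges always cause VM exits, so they never show up as pass-through here.
"""

PAGE_SIZE = 4096
BITMAP_SIZE = 1024  # each of the four bitmaps is 1 KiB = 8192 bits

# (byte offset within the page, first MSR covered by that bitmap)
READ_LOW = (0, 0x0000_0000)
READ_HIGH = (1024, 0xC000_0000)
WRITE_LOW = (2048, 0x0000_0000)
WRITE_HIGH = (3072, 0xC000_0000)


def passthrough_msrs(bitmap: bytes, regions) -> list[int]:
    """Return MSR indices whose bit is clear in the given bitmap regions."""
    if len(bitmap) != PAGE_SIZE:
        raise ValueError("MSR bitmap must be exactly one 4 KiB page")
    result = []
    for offset, base in regions:
        for bit in range(BITMAP_SIZE * 8):
            byte = bitmap[offset + bit // 8]
            if not (byte >> (bit % 8)) & 1:  # clear bit: not intercepted
                result.append(base + bit)
    return result


def writable_msrs(bitmap: bytes) -> list[int]:
    """MSRs the guest can write without the hypervisor intercepting them."""
    return passthrough_msrs(bitmap, (WRITE_LOW, WRITE_HIGH))


def intercept_all_except(writable: list[int]) -> bytes:
    """Hypothetical helper: build a bitmap that intercepts every MSR read and
    write, except writes to the MSRs listed in `writable`."""
    page = bytearray(b"\xff" * PAGE_SIZE)
    for msr in writable:
        if msr < 0x2000:
            offset, base = WRITE_LOW
        elif 0xC000_0000 <= msr < 0xC000_2000:
            offset, base = WRITE_HIGH
        else:
            raise ValueError(f"MSR {msr:#x} is outside the bitmap ranges")
        bit = msr - base
        page[offset + bit // 8] &= ~(1 << (bit % 8)) & 0xFF
    return bytes(page)


if __name__ == "__main__":
    bitmap = intercept_all_except([0x17D0, 0x17D1, 0xC000_0103])
    for msr in writable_msrs(bitmap):
        print(hex(msr))
```

In the usage example, 0x17D0 and 0x17D1 are IA32_HW_FEEDBACK_PTR and IA32_HW_FEEDBACK_CONFIG. A clear write bit only means the hypervisor does not intercept the write. The processor may still reject a value with #GP.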

The bug

However, I noticed that the MSRs related to the Hardware Feedback Interface (HFI) were writable. I already knew they could be used to corrupt any physical memory page, regardless of how a hypervisor configured EPT. These MSRs let software specify the physical address at which the processor populates the HFI structure. As a result, guest software could point them at hypervisor code, and the processor would overwrite that code with the HFI structure, irrespective of EPT permissions.
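To make the primitive concrete, here is a minimal Python sketch of how the pointer MSR encodes the target address. It follows the Intel SDM definitions, and the MSR numbers and bit positions match those used by Linux's intel_hfi driver:

- IA32_HW_FEEDBACK_PTR (0x17D0) holds a 4 KiB-aligned physical address plus a Valid flag in bit 0.
- IA32_HW_FEEDBACK_CONFIG (0x17D1) enables HFI through bit 0.

The `hfi_write_overlaps` helper, the default MAXPHYADDR of 46 and the protected range in the example are hypothetical. The helper illustrates an audit a hypervisor could perform; it does not describe how Microsoft fixed the bug. EPT appears nowhere in the computation, because the processor writes to the physical address directly.

```python
"""Sketch: encode/decode IA32_HW_FEEDBACK_PTR and check where an HFI table lands.

Per the Intel SDM:
  IA32_HW_FEEDBACK_PTR    (0x17D0): bit 0 = Valid, bits MAXPHYADDR-1:12 = table address
  IA32_HW_FEEDBACK_CONFIG (0x17D1): bit 0 = enable HFI
"""

IA32_HW_FEEDBACK_PTR = 0x17D0
IA32_HW_FEEDBACK_CONFIG = 0x17D1
PTR_VALID = 1 << 0
CONFIG_HFI_ENABLE = 1 << 0
PAGE_SIZE = 4096


def encode_ptr(table_pa: int, maxphyaddr: int = 46) -> int:
    """Value to write to IA32_HW_FEEDBACK_PTR for a table at `table_pa`."""
    if table_pa % PAGE_SIZE:
        raise ValueError("HFI table address must be 4 KiB aligned")
    if table_pa >> maxphyaddr:
        raise ValueError("address exceeds MAXPHYADDR")
    return table_pa | PTR_VALID


def decode_ptr(value: int, maxphyaddr: int = 46) -> tuple[bool, int]:
    """Split an IA32_HW_FEEDBACK_PTR value into (valid, physical address)."""
    addr_mask = ((1 << maxphyaddr) - 1) & ~(PAGE_SIZE - 1)
    return bool(value & PTR_VALID), value & addr_mask


def hfi_write_overlaps(ptr_value: int, table_pages: int,
                       protected: list[tuple[int, int]],
                       maxphyaddr: int = 46) -> bool:
    """Hypothetical audit: True if the processor-populated HFI table would
    overwrite any protected physical range [start, end). The table size in
    4 KiB pages is passed in; the processor enumerates it via CPUID."""
    valid, start = decode_ptr(ptr_value, maxphyaddr)
    if not valid:
        return False
    end = start + table_pages * PAGE_SIZE
    return any(start < p_end and p_start < end for p_start, p_end in protected)


if __name__ == "__main__":
    hypervisor_image = [(0x1_0000_0000, 0x1_0020_0000)]  # made-up range
    value = encode_ptr(0x1_0010_0000)
    print(hex(value), hfi_write_overlaps(value, 1, hypervisor_image))
```

A pointer value like this, written from the root partition, names a physical page inside the (hypothetical) hypervisor image. Nothing in the encoding lets EPT intervene.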

Validation

I was initially skeptical that this was exploitable on Windows, as it would have been an obvious oversight. Even so, experiments showed that the root partition could modify the HFI MSRs. However, the expected memory corruption did not happen. This was because the processor appeared to populate the HFI structure only once after reset (*1). I could write the MSRs as many times as I wanted, but it was effectively a no-op because the kernel always wrote them at startup.

Exploitation with S3/S4

I started to think: “Can you reset a processor so that it performs the memory write again?”

Yes, you can. That is what S3 and S4 (sleep, and hibernation or “shutdown” on modern Windows) do. I started to look into this. I realized that when resuming from S3/S4, the kernel wrote the MSRs only if a certain condition held, and that I could control that condition. By controlling it, I could prevent the kernel from re-initializing the MSRs on resume and perform the initial write from my own code instead. From there, writing the PoC was straightforward.

After I submitted the PoC, a demo, and the report, MSRC confirmed that the issue was valid. The fix was released on November 14th as promised, and the report was eligible for a 2000 USD bounty award.


Yay!

Thoughts

A few thoughts on this bug discovery and exploitation:

Verify your assumptions

First things first: as mentioned before, the fact that the HFI MSRs could bypass EPT was not entirely new. I expected that this attack vector had been considered and blocked even before my post, but it turned out that it had not.

Not so many eyeballs

It appears that fewer people than I imagined look into attacks from the root partition.

If any security researcher had inspected the MSR bitmaps for the root partition, she would have found this issue, even without knowing about this particular attack vector beforehand. Because only a handful of MSRs were writable, it would not have been difficult to go through them and check whether any could be exploited.

Same bug might exist elsewhere

Since the issue has more to do with Intel processors than with software specifics, other hypervisors may also fail to restrict access to these MSRs, allowing their security models to be violated.

Also, can the root partition access other hardware mechanisms that write physical memory directly (e.g., the IOMMU and PCI devices)? What about AMD? I have not looked into AMD at all and have no plan to do so. It may be a good research topic.

Security feature bypass matters

When I submitted the report to MSFT, I was unsure whether they would view this as a vulnerability, given that an Administrator could disable VBS with bcdedit anyway. Why bother?

That argument is flawed.

Later, I realized that attacks like this matter because such security policy violations are not observable in any standard way. Your system would still be reported as “VBS enabled” in event logs, NT APIs, PCRs, TCG logs, etc., yet be exposed to attacks. That is substantially different from disabling VBS and rebooting, which is clearly observable.

Security research can yield vulnerabilities

Simply learning about security features yielded two vulnerabilities in Windows core components (and 3000 USD) as a by-product. If you are the kind of person who likes to research security features but still wishes to find vulnerabilities from time to time, it is OK to focus on what interests you.

Footnotes

*1: I may very well be missing something here. The Intel SDM does not describe this behaviour. Let me know if you know more details.

Found this post interesting? We offer a training course on Intel virtualization technology. Check out the course syllabus.