文件的真实路径

on under windows
12 minute read

前言

最近浏览 Windows Terminal 提交,发现其文档中链接了一个 PowerShell 的 Issue:Windows Store applications incorrectly assumed to be console applications ,这个问题描述起来很简单,就是在 PowerShell 中打开 wt.exe, PowerShell 会一直等待 Windows Terminal 的退出,但实际上 Windows Terminal 是一个 GUI 程序,按照 Windows 的默认行为,PowerShell 在创建 Windows Terminal 进程后,就应该返回。这个问题是怎么产生的,其实很简单,wt.exe 是 Windows Terminal 的 AppExecutionAliasAppExecutionAlias 是一类特殊的重解析点,Windows 在创建进程时会根据 AppExecutionAlias 设施的信息启动对应的 Store App。那么 PowerShell 应当获得相应 Store App 的主程序的 Subsystem 才能正确的决定是否应该等待进程退出。因此这里获得其真实路径是必不可少的。

文件的真实路径

在 POSIX 中,我们可以使用 realpath 获得文件的真实路径,如果一个文件是常规文件则返回它的绝对路径,如果一个文件是符号链接,则返回它指向的目标文件的绝对路径。就这么简单。在 Windows 上我们该怎么做?

在 Windows 中,符号链接是使用 Reparse Point 机制实现的,其 ReparseTag IO_REPARSE_TAG_SYMLINK 值为 0xA000000C描述信息如下:

symbolic link: A symbolic link is a reparse point that points to another file system object. The object being pointed to is called the target. Symbolic links are transparent to users; the links appear as normal files or directories, and can be acted upon by the user or application in exactly the same manner. Symbolic links can be created using the FSCTL_SET_REPARSE_POINT request as specified in [MS-FSCC] section 2.3.61. They can be deleted using the FSCTL_DELETE_REPARSE_POINT request as specified in [MS-FSCC] section 2.3.5. Implementing symbolic links is optional for a file system.

那么我们可以通过解析重解析点获得符号链接的真实路径。但我们通常不这么做,因此 Windows 有个 API 可以帮助我们做到这件事。GetFinalPathNameByHandleW 能够帮助我们获得已打开的文件的最终路径,也就是真实路径,而我们在打开符号链接时,IoCreateFile 最终将打开符号链接的目标文件,因此,通过 GetFinalPathNameByHandleW 我们就可以获得其真实路径。话不多说代码如下:

template <class _Ty>
[[nodiscard]] _Ty _Unaligned_load(const void *_Ptr) { // load a _Ty from _Ptr
  static_assert(std::is_trivial_v<_Ty>,
                "Unaligned loads require trivial types");
  _Ty _Tmp;
  std::memcpy(&_Tmp, _Ptr, sizeof(_Tmp));
  return _Tmp;
}

[[nodiscard]] inline bool _Is_drive_prefix(const wchar_t *const _First) {
  // test if _First points to a prefix of the form X:
  // pre: _First points to at least 2 wchar_t instances
  // pre: Little endian
  auto _Value = _Unaligned_load<unsigned int>(_First);
  _Value &=
      0xFFFF'FFDFu; // transform lowercase drive letters into uppercase ones
  _Value -=
      (static_cast<unsigned int>(L':') << (sizeof(wchar_t) * CHAR_BIT)) | L'A';
  return _Value < 26;
}

inline bool
_Is_drive_prefix_with_slash_slash_question(const std::wstring_view text) {
  return text.size() > 6 && bela::StartsWith(text, LR"(\\?\)") &&
         _Is_drive_prefix(text.data() + 4);
}

std::optional<std::wstring> RealPathByHandle(HANDLE FileHandle,
                                             bela::error_code &ec) {
  std::wstring buffer;
  buffer.resize(260); // opt
  DWORD kind = VOLUME_NAME_DOS;
  for (;;) {
    const auto blen = buffer.size();
    auto len = GetFinalPathNameByHandleW(FileHandle, buffer.data(),
                                         static_cast<DWORD>(blen), kind);
    if (len == 0) {
      auto e = GetLastError();
      if (e == ERROR_PATH_NOT_FOUND && kind == VOLUME_NAME_DOS) {
        kind = VOLUME_NAME_NT;
        continue;
      }
      ec.code = e;
      ec.message = bela::resolve_system_error_message(ec.code);
      return std::nullopt;
    }
    buffer.resize(len);
    if (len < static_cast<DWORD>(blen)) {
      break;
    }
  }
  if (kind != VOLUME_NAME_DOS) {
    // result is in the NT namespace, so apply the DOS to NT namespace prefix
    constexpr std::wstring_view ntprefix = LR"(\\?\GLOBALROOT)";
    return bela::StringCat(ntprefix, buffer);
  }
  // '\\?\C:\Path'
  if (_Is_drive_prefix_with_slash_slash_question(buffer)) {
    return std::make_optional(buffer.substr(4));
  }
  if (bela::StartsWithIgnoreCase(buffer, LR"(\\?\UNC\)")) {
    std::wstring_view sv(buffer);
    return std::make_optional(bela::StringCat(LR"(\\)", sv.substr(6)));
  }
  return std::make_optional(std::move(buffer));
}

std::optional<std::wstring> RealPath(std::wstring_view src,
                                     bela::error_code &ec) {
  auto FileHandle = CreateFileW(
      src.data(), 0, FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
      nullptr, OPEN_EXISTING, FILE_FLAG_BACKUP_SEMANTICS, nullptr);
  // FILE_FLAG_BACKUP_SEMANTICS open directory require
  if (FileHandle == INVALID_HANDLE_VALUE) {
    ec = bela::make_system_error_code();
    return std::nullopt;
  }
  auto closer = bela::finally([&] { CloseHandle(FileHandle); });
  return bela::RealPathByHandle(FileHandle, ec);
}

这里需要注意 FILE_FLAG_BACKUP_SEMANTICS 用于打开目录,这个时候我们不需要读写,因此第二个参数 dwDesiredAccess 值为 0。如果使用了 FILE_FLAG_OPEN_REPARSE_POINT 标志打开符号链接,则不会打开目标文件。

但我们使用这个函数去查询 AppExecutionAlias 的真实路径时,由于没有设置 FILE_FLAG_OPEN_REPARSE_POINT 因此无法打开文件。

通过重解析点获得真实路径

我们如果去解析重解析点,然后经过特殊处理,这样也是可以获得文件的真实路径。解析重解析点主要的难题在于获得一些不透明的结构,包含 AppExecutionAliasREPARSE_DATA_BUFFER 结构如下:

typedef struct _REPARSE_DATA_BUFFER {
  ULONG ReparseTag;         // Reparse tag type
  USHORT ReparseDataLength; // Length of the reparse data
  USHORT Reserved;          // Used internally by NTFS to store remaining length

  union {
    // Structure for IO_REPARSE_TAG_SYMLINK
    // Handled by nt!IoCompleteRequest
    struct {
      USHORT SubstituteNameOffset;
      USHORT SubstituteNameLength;
      USHORT PrintNameOffset;
      USHORT PrintNameLength;
      ULONG Flags;
      WCHAR PathBuffer[1];
      // Example of distinction between substitute and print names:
      // mklink /d ldrive c:\
      // SubstituteName: c:\\??\
      // PrintName: c:\
    } SymbolicLinkReparseBuffer;

    // Structure for IO_REPARSE_TAG_MOUNT_POINT
    // Handled by nt!IoCompleteRequest
    struct {
      USHORT SubstituteNameOffset;
      USHORT SubstituteNameLength;
      USHORT PrintNameOffset;
      USHORT PrintNameLength;
      WCHAR PathBuffer[1];
    } MountPointReparseBuffer;

    // Structure for IO_REPARSE_TAG_WIM
    // Handled by wimmount!FPOpenReparseTarget->wimserv.dll
    // (wimsrv!ImageExtract)
    struct {
      GUID ImageGuid;           // GUID of the mounted VIM image
      BYTE ImagePathHash[0x14]; // Hash of the path to the file within the image
    } WimImageReparseBuffer;

    // Structure for IO_REPARSE_TAG_WOF
    // Handled by FSCTL_GET_EXTERNAL_BACKING, FSCTL_SET_EXTERNAL_BACKING in NTFS
    // (Windows 10+)
    struct {
      //-- WOF_EXTERNAL_INFO --------------------
      ULONG Wof_Version;  // Should be 1 (WOF_CURRENT_VERSION)
      ULONG Wof_Provider; // Should be 2 (WOF_PROVIDER_FILE)

      //-- FILE_PROVIDER_EXTERNAL_INFO_V1 --------------------
      ULONG FileInfo_Version; // Should be 1 (FILE_PROVIDER_CURRENT_VERSION)
      ULONG
      FileInfo_Algorithm; // Usually 0 (FILE_PROVIDER_COMPRESSION_XPRESS4K)
    } WofReparseBuffer;

    // Structure for IO_REPARSE_TAG_APPEXECLINK
    struct {
      ULONG StringCount;   // Number of the strings in the StringList, separated
                           // by '\0'
      WCHAR StringList[1]; // Multistring (strings separated by '\0', terminated
                           // by '\0\0')
    } AppExecLinkReparseBuffer;

    // Structure for IO_REPARSE_TAG_WCI (0x80000018)
    struct {
      ULONG Version; // Expected to be 1 by wcifs.sys
      ULONG Reserved;
      GUID LookupGuid;      // GUID used for lookup in wcifs!WcLookupLayer
      USHORT WciNameLength; // Length of the WCI subname, in bytes
      WCHAR WciName[1];     // The WCI subname (not zero terminated)
    } WcifsReparseBuffer;

    // Handled by cldflt.sys!HsmpRpReadBuffer
    struct {
      USHORT Flags;    // Flags (0x8000 = not compressed)
      USHORT Length;   // Length of the data (uncompressed)
      BYTE RawData[1]; // To be RtlDecompressBuffer-ed
    } HsmReparseBufferRaw;

    // Dummy structure
    struct {
      UCHAR DataBuffer[1];
    } GenericReparseBuffer;
  } DUMMYUNIONNAME;
} REPARSE_DATA_BUFFER, *PREPARSE_DATA_BUFFER;

知道这个结构,我们就能对其进行解析了:

// https://github.com/fcharlie/bela/blob/728a4b726f303e7c861823232991de7fdea4d992/src/belawin/reparsepoint.cc#L32
bool FileReparser::FileDeviceLookup(std::wstring_view file,
                                    bela::error_code &ec) {
  if (buffer = reinterpret_cast<REPARSE_DATA_BUFFER *>(
          HeapAlloc(GetProcessHeap(), 0, MAXIMUM_REPARSE_DATA_BUFFER_SIZE));
      buffer == nullptr) {
    ec = bela::make_system_error_code();
    return false;
  }
  if (FileHandle = CreateFileW(
          file.data(), 0,
          FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE, nullptr,
          OPEN_EXISTING,
          FILE_FLAG_BACKUP_SEMANTICS | FILE_FLAG_OPEN_REPARSE_POINT, nullptr);
      FileHandle == INVALID_HANDLE_VALUE) {
    ec = bela::make_system_error_code();
    return false;
  }
  if (DeviceIoControl(FileHandle, FSCTL_GET_REPARSE_POINT, nullptr, 0, buffer,
                      MAXIMUM_REPARSE_DATA_BUFFER_SIZE, &len,
                      nullptr) != TRUE) {
    if (ec.code = GetLastError(); ec.code != ERROR_NOT_A_REPARSE_POINT) {
      ec.message = bela::resolve_system_error_message(ec.code);
    }
    return false;
  }
  return true;
}

inline bool DecodeAppLink(const REPARSE_DATA_BUFFER *buffer,
                          AppExecTarget &target) {
  LPWSTR szString = (LPWSTR)buffer->AppExecLinkReparseBuffer.StringList;
  std::vector<std::wstring_view> strv;
  for (ULONG i = 0; i < buffer->AppExecLinkReparseBuffer.StringCount; i++) {
    auto len = wcslen(szString);
    strv.emplace_back(szString, len);
    szString += len + 1;
  }
  if (strv.empty()) {
    return false;
  }
  target.pkid = strv[0];
  if (strv.size() > 1) {
    target.appuserid = strv[1];
  }
  if (strv.size() > 2) {
    target.target = strv[2];
  }
  return true;
}

// https://github.com/fcharlie/bela/blob/728a4b726f303e7c861823232991de7fdea4d992/src/belawin/reparsepoint.cc#L278
std::optional<std::wstring> RealPathEx(std::wstring_view src,
                                       bela::error_code &ec) {
  FileReparser reparser;
  if (!reparser.FileDeviceLookup(src, ec)) {
    if (ec.code == ERROR_NOT_A_REPARSE_POINT) {
      ec.code = 0;
      return std::make_optional(bela::PathAbsolute(src));
    }
    return std::nullopt;
  }
  switch (reparser.buffer->ReparseTag) {
  case IO_REPARSE_TAG_APPEXECLINK:
    if (AppExecTarget target; DecodeAppLink(reparser.buffer, target)) {
      return std::make_optional(std::move(target.target));
    }
    ec = bela::make_error_code(1, L"BAD: unable decode AppLinkExec");
    return std::nullopt;
  case IO_REPARSE_TAG_SYMLINK:
    CloseHandle(reparser.FileHandle);
    reparser.FileHandle = INVALID_HANDLE_VALUE;
    if (auto target = bela::RealPath(src, ec); target) {
      return std::make_optional(std::move(*target));
    }
    return std::nullopt;
  case IO_REPARSE_TAG_GLOBAL_REPARSE:
    if (std::wstring target; DecodeSymbolicLink(reparser.buffer, target)) {
      return std::make_optional(std::move(target));
    }
    ec = bela::make_error_code(1, L"BAD: unable decode Global SymbolicLink");
    return std::nullopt;
  default:
    break;
  }
  return std::make_optional<std::wstring>(src);
}

对于 AppExecutionAlias,我们可以先使用 Target 作为其真实路径,然后再去解析 PE 文件的子系统,这样就能避免 PowerShell 的那个问题发生。解析符号链接时,为了避免需要繁琐的解析目标文件,我们重新使用 bela::RealPath 获得真实路径。

最后

在 Windows 中,比如 Git for VFS,Azure,OneDrive,Container 等服务或者功能也都使用了特定的重解析点。有兴趣可以去了解一下。