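The site uses a robots.txt file to control crawler behavior. Because the file contains Chinese comments, the Chinese text shows up garbled when the file is opened directly in a browser.
On an Apache server, the garbled display can be fixed by declaring the file's character set in .htaccess.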
Set universal UTF-8 encoding:
    AddDefaultCharset utf-8
Set UTF-8 encoding on select file types only:
    AddCharset utf-8 .html .css .js .php
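Note that the extension list above does not include .txt, so on its own it would not affect robots.txt. A minimal sketch of two ways to cover that file follows, assuming robots.txt sits in the same directory as this .htaccess, that the file itself is saved as UTF-8, and that the server permits these directives in .htaccess (AllowOverride FileInfo). Only one of the two options should be needed.

```apache
# .htaccess in the site root (assumed location of robots.txt)

# Option A: map the .txt extension to UTF-8 (handled by mod_mime)
AddCharset utf-8 .txt

# Option B: apply the charset to robots.txt only
<Files "robots.txt">
    AddDefaultCharset utf-8
</Files>
```

After reloading the page, the response's Content-Type header should read something like text/plain; charset=utf-8, which is what tells the browser how to decode the Chinese comments.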
Note: An .htaccess file in a subdirectory overrides any duplicate rules from .htaccess files in its parent directories. For example, if the .htaccess file in the root defines both a 404 and a 403 error page, and another .htaccess file in the "test" folder defines only a 404 error page, then files and folders under "test" use the 404 page from the "test" .htaccess file and the 403 page from the root .htaccess file.
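The override behavior in the note can be sketched with two files. The error page paths below are hypothetical and chosen only for illustration.

```apache
# --- /.htaccess (site root) ---
# Defines both error pages for the whole site
ErrorDocument 404 /errors/404.html
ErrorDocument 403 /errors/403.html

# --- /test/.htaccess ---
# Redefines only the 404 page; the 403 rule is inherited from the root
ErrorDocument 404 /test/errors/404.html
```

With this layout, a missing page under /test/ is answered with /test/errors/404.html, while a forbidden request under /test/ still gets /errors/403.html from the root file. The same merging applies to charset directives such as AddDefaultCharset.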