posts

Jun 14, 2020

🧯记一次环境变量导致的中文乱码

Estimated Reading Time: 2 minutes (425 words)

一、现象

  1. 使用本地ssh访问服务器
  2. 重启应用
  3. 查询主数据商品信息的内容中文乱码
  4. jinfo命令输出file.encoding = ANSI_X3.4-1968
  5. 使用Xshell访问服务器
  6. 重启应用
  7. jinfo命令输出file.encoding = UTF-8
  8. 查询主数据商品信息的内容中文恢复正常

二、为什么乱码

猜想

初步猜想是本地sshXshellssh会话locale会话不同,导致重启后的JVM默认字符集的不同导致了乱码,使用下面的代码片段来验证:

import java.nio.charset.Charset;

public class Main {
    public static void main(String[] args) {
        System.out.println(System.getProperty("file.encoding"));
        System.out.println(Charset.defaultCharset());
    }
}

思路:分别使用本地sshXshell访问服务器,查看locale并执行代码片段

使用本地ssh访问:

[服务器 ~]$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory # 异常信息
locale: Cannot set LC_ALL to default locale: No such file or directory   # 异常信息
LANG=en_US.UTF-8
LC_CTYPE=UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
[服务器 ~]$ java Main
ANSI_X3.4-1968
US-ASCII

可以看到,此时JVM属性file.encoding的值为ANSI_X3.4-1968,默认字符集为US-ASCII

使用Xshell访问:

[服务器 ~]$ locale
# 没有异常信息
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
[服务器 ~]$ java Main
UTF-8
UTF-8

可以看到,此时JVM属性file.encoding的值为UTF-8,默认字符集为UTF-8

到代码中验证猜想

乱码的接口核心代码如下:

try {
    result = httpClient.execute(request);
} catch (Exception e) {
    throw new RuntimeException("查询异常",e);
}

httpclient执行HTTP请求,execute方法代码:

if("POST".equals(reqType)){
    responseStr = new String(this.doHttpPost(reqParams, serverUrl)); // 注意点1:查询主数据商品信息,代码会执行此行
}else if ("GET".equals(reqType)){
    responseStr = new String(this.doHttpGet(reqParams,serverUrl));
}

注意点1这行代码,通过调用doHttpPost方法:

URIBuilder uriBuilder = new URIBuilder(serverUrl);
HttpPost httpPost = new HttpPost(uriBuilder.build());
httpPost.setHeader("Accept", "application/json;charset=utf-8"); // 注意点2:头部Accept指定了接受的响应字符集为utf-8,但HTTP协议并未强制服务端要根据Accept的字符集来响应,所以需要其他方法来判断响应的字符集
httpPost.setHeader("Content-Type", "application/json;charset=utf-8");
this.addHeader(httpPost);
StringEntity postingString = new StringEntity(JSONObject.toJSONString(params), "utf-8");
httpPost.setEntity(postingString);
return getEntityAndRelease(httpPost);

接下来在开发环境,通过改变头部Acceptcharset,添加JVM参数-Dorg.apache.commons.logging.Log=org.apache.commons.logging.impl.SimpleLog -Dorg.apache.commons.logging.simplelog.log.org.apache.http=DEBUG,观察HTTP响应的头部Content-Type的方式来判断响应的编码

[DEBUG] wire - http-outgoing-0 >> "POST /winshare-center-gateway/api/item/v1/item/ec/4435922 HTTP/1.1[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "Accept: application/json;charset=utf-8[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "Content-Type: application/json;charset=utf-8[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "Content-Length: 14[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "Host: 10.100.9.202:8607[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "Connection: Keep-Alive[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "Accept-Encoding: gzip,deflate[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "{"id":4435922}"
[DEBUG] wire - http-outgoing-0 << "HTTP/1.1 503 Service Unavailable[\r][\n]"
[DEBUG] wire - http-outgoing-0 << "Content-Type: application/json;charset=UTF-8[\r][\n]"
[DEBUG] wire - http-outgoing-0 << "Content-Length: 186[\r][\n]"
[DEBUG] wire - http-outgoing-0 << "[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "POST /winshare-center-gateway/api/item/v1/item/ec/4435922 HTTP/1.1[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "Accept: application/json;charset=gb18030[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "Content-Type: application/json;charset=utf-8[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "Content-Length: 14[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "Connection: Keep-Alive[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "Accept-Encoding: gzip,deflate[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "[\r][\n]"
[DEBUG] wire - http-outgoing-0 >> "{"id":4435922}"
[DEBUG] wire - http-outgoing-0 << "HTTP/1.1 503 Service Unavailable[\r][\n]"
[DEBUG] wire - http-outgoing-0 << "Content-Type: application/json;charset=UTF-8[\r][\n]"
[DEBUG] wire - http-outgoing-0 << "Content-Length: 186[\r][\n]"
[DEBUG] wire - http-outgoing-0 << "[\r][\n]"

通过查看日志,我们发现响应的字符集都是UTF-8,也就是说该方法返回的字节数组应该被UTF-8解码;

回到注意点1,此处使用了构造方法java.lang.String#String(byte[])创建String实例,此构造方法使用默认字符集解码字节数组,使用本地ssh重启后,默认字符集为US-ASCII,而不是UTF-8,这是此次发生乱码的编码原因。

三、为什么本地ssh与Xshell表现不同

查资料得知,本地/etc/ssh/ssh_config含有SendEnv LANG LC_*,此选项指明了创建ssh会话时,将本地的LANGLC_*环境变量发送到服务端,使得创建的ssh会话使用本地的LANGLC_*环境变量;

但是本地的这些变量的值是:

[本地 ~]$ locale
LANG=""
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

还记得本地ssh访问服务器执行locale时的这两行报错吗?

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory

说明当前的服务器配置是不支持LC_CTYPE="UTF-8"LC_ALL=的,但未引起重视,一直选择的忽略;解决办法是:注释本地的/etc/ssh/ssh_config中的SendEnv LANG LC_*

四、得到的教训

  1. ssh访问服务器时,不要使用本地的locale设置,避免本地环境影响服务器
  2. 启动应用时通过JVM指定文件编码(-Dfile.encoding=UTF-8),排除启动的ssh会话对应用的文件编码的影响
  3. 编码时不要假定任何的环境变量:解码字节数组时,指定字符集,不使用默认字符集

五、参考

1 Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content 2 String ( Java SE 11 & JDK 11 ) 3 man 5 ssh_config