你的位置:首页 > 数据库

[数据库]【mysql】添加对emoji的支持


1、简介

涉及无线相关的 MySQL 数据库建议都提前采用 utf8mb4 字符集,避免 emoji 表情符号带来的问题

MySQL Server >  5.5.3

2、配置+升级

当前配置

mysql> \s--------------mysql Ver 14.14 Distrib 5.1.73, for redhat-linux-gnu (i386) using readline 5.1Connection id:     3Current database:Current user:      root@localhostSSL:          Not in useCurrent pager:     stdoutUsing outfile:     ''Using delimiter:    ;Server version:     5.6.25-debug-log Source distributionProtocol version:    10Connection:       Localhost via UNIX socketServer characterset:  utf8mb4Db   characterset:  utf8mb4Client characterset:  utf8mb4Conn. characterset:  utf8mb4UNIX socket:      /data/mysql/mysql.sockUptime:         46 min 53 sec

更改数据库/表

ALTER DATABASE dbname CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;ALTER TABLE tbname CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;ALTER TABLE tbname CHANGE old_column_name new_column_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

my.cnf配置

[client] default-character-set = utf8mb4  [mysql] default-character-set = utf8mb4  [mysqld] character-set-client-handshake = FALSE character-set-server = utf8mb4 collation-server = utf8mb4_general_ci init_connect='SET NAMES utf8mb4'

 /usr/share/mysql/charsets/Index.
<charset name="utf8mb4"> <family>Unicode</family> <description>UTF-8 Unicode</description> <alias>utf-8</alias> <collation name="utf8_general_ci"   id="33">  <flag>primary</flag>  <flag>compiled</flag> </collation> <collation name="utf8_bin"      id="83">  <flag>binary</flag>  <flag>compiled</flag> </collation></charset>

查看字符集

mysql> show variables where variable_name like 'character_set%' or variable_name like 'coll%';+--------------------------+----------------------------------+| Variable_name      | Value              |+--------------------------+----------------------------------+| character_set_client   | utf8               || character_set_connection | utf8               || character_set_database  | utf8               || character_set_filesystem | binary              || character_set_results  | utf8               || character_set_server   | utf8               || character_set_system   | utf8               || character_sets_dir    | /u01/mysql/share/mysql/charsets/ || collation_connection   | utf8_general_ci         || collation_database    | utf8_general_ci         || collation_server     | utf8_general_ci         |+--------------------------+----------------------------------+11 rows in set (0.00 sec)	mysql> show char set;+----------+-----------------------------+---------------------+--------+| Charset | Description         | Default collation  | Maxlen |+----------+-----------------------------+---------------------+--------+| big5   | Big5 Traditional Chinese  | big5_chinese_ci   |   2 || dec8   | DEC West European      | dec8_swedish_ci   |   1 || cp850  | DOS West European      | cp850_general_ci  |   1 || hp8   | HP West European      | hp8_english_ci   |   1 || koi8r  | KOI8-R Relcom Russian    | koi8r_general_ci  |   1 || latin1  | cp1252 West European    | latin1_swedish_ci  |   1 || latin2  | ISO 8859-2 Central European | latin2_general_ci  |   1 || swe7   | 7bit Swedish        | swe7_swedish_ci   |   1 || ascii  | US ASCII          | ascii_general_ci  |   1 || ujis   | EUC-JP Japanese       | ujis_japanese_ci  |   3 || sjis   | Shift-JIS Japanese     | sjis_japanese_ci  |   2 || hebrew  | ISO 8859-8 Hebrew      | hebrew_general_ci  |   1 || tis620  | TIS620 Thai         | tis620_thai_ci   |   1 || euckr  | EUC-KR Korean        | euckr_korean_ci   |   2 || koi8u  | KOI8-U Ukrainian      | koi8u_general_ci  |   1 || gb2312  | GB2312 Simplified Chinese  | gb2312_chinese_ci  |   2 || greek  | ISO 8859-7 Greek      | greek_general_ci  |   1 || eucjpms | UJIS for Windows Japanese  | eucjpms_japanese_ci |   3 |+----------+-----------------------------+---------------------+--------+40 rows in set (0.00 sec)mysql> show COLLATION;+--------------------------+----------+-----+---------+----------+---------+| Collation        | Charset | Id | Default | Compiled | Sortlen |+--------------------------+----------+-----+---------+----------+---------+| big5_chinese_ci     | big5   |  1 | Yes   | Yes   |    1 || big5_bin         | big5   | 84 |     | Yes   |    1 || dec8_swedish_ci     | dec8   |  3 | Yes   | Yes   |    1 || dec8_bin         | dec8   | 69 |     | Yes   |    1 || cp850_general_ci     | cp850  |  4 | Yes   | Yes   |    1 || eucjpms_bin       | eucjpms | 98 |     | Yes   |    1 |+--------------------------+----------+-----+---------+----------+---------+219 rows in set (0.01 sec)

– character_set_server:默认的内部操作字符集

– character_set_client:客户端来源数据使用的字符集

– character_set_connection:连接层字符集

– character_set_results:查询结果字符集

– character_set_database:当前选中数据库的默认字符集

– character_set_system:系统元数据(字段名等)字符集

MySQL中的字符序名称遵从命名惯例

以字符序对应的字符集名称开头;以_ci(表示大小写不敏感 case ignore )、_cs(表示大小写敏感 case sensitive )或_bin(表示按编码值比较)结尾。

例如:在字符序“utf8_general_ci”下,字符“a”和“A”是等价的

MySQL中的字符集转换过程

1. MySQL Server收到请求时将请求数据从character_set_client转换为character_set_connection;

2. 进行内部操作前将请求数据从character_set_connection转换为内部操作字符集,其确定方法如下:

• 使用每个数据字段的CHARACTER SET设定值;

• 若上述值不存在,则使用对应数据表的DEFAULT CHARACTER SET设定值(MySQL扩展,非SQL标准);

• 若上述值不存在,则使用对应数据库的DEFAULT CHARACTER SET设定值;

• 若上述值不存在,则使用character_set_server设定值。

3. 将操作结果从内部操作字符集转换为character_set_results。

三、emoji表情与utf8mb4

emoji表情mysql的utf8是不支持,需要修改设置为utf8mb4

在MYSQL 5.5 之前, UTF8 编码只支持1-3个字节,只支持BMP这部分的unicode编码区, 关于BMP,http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters 这里看,基本就是0000~FFFF这一区。

从MYSQL5.5开始,可支持4个字节UTF编码utf8mb4,一个字符最多能有4字节,所以能支持更多的字符集,utf8mb4兼容utf8,且比utf8能表示更多的字符

CREATE TABLE `ios_emoji` (  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '自增ID',  `unicode` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'Unicode编码',  `utf8` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'UTF8编码',  `utf16` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'UTF16编码',  `sbunicode` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT 'SBUnicode编码',  `filename` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL COMMENT '文件名',  `filebyte` longblob COMMENT '文件内容字节',  PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT COMMENT='ios表情编码表'; 

  

参考文章

http://drupal.stackexchange.com/questions/166405/why-are-we-using-utf8mb4-general-ci-and-not-utf8mb4-unicode-ci

http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci

http://blog.csdn.net/leshami/article/details/42024217

http://www.laruence.com/2008/01/05/12.html