MySQL:MySQL8.0JSON类型使用整理,基于用户画像的案例
- 手机
- 2025-09-05 10:27:02

摘要:MySQL,JSON类型,多值索引, 用户画像
MySQL是结构化数据存储,JSON是非结构化格式,在MySQL中使用JSON类型可以打通关系型和非关系型数据的存储之间的界限,为业务提供更好的架构选择,以下内容包括
(1)JSON 数据类型(2)JSON类型创建插入数据(3)提取JSON内字段(4)JSON类型和字符串的区别(5)JSON类型数据修改(6)JSON类型使用索引(7)JSON类型其他常用函数(8)JSON ARRAY的多值索引(9)基于JSON类型的用户画像设计(1)JSON 数据类型
JSON(JavaScript Object Notation)主要用于互联网应用服务之间的数据交换。MySQL 支持JSON 对象和JSON 数组两种类型,JSON 类型是从 MySQL 5.7 版本开始支持的功能,MySQL中使用JSON有以下好处
无须预定义字段:字段可以无限拓展,避免了ALTER ADD COLUMN的操作,使用更加灵活 处理稀疏字段:避免了稀疏字段的NULL值,避免冗余存储 支持索引:相比于字符串格式的JSON,JSON类型支持索引做特定的查询优化总体而言,JSON 类型比较适合存储一些修改较少、相对静态的数据,或者说适合存储修改较少,且容忍稀疏的聚合数据,比如存储用户的登录信息
(2)JSON类型创建插入数据 create database test default charset utf8mb4; CREATE TABLE student ( id BIGINT AUTO_INCREMENT PRIMARY KEY, info JSON DEFAULT NULL );
插入数据
mysql> INSERT student (info) VALUES ('{"sex": "F", "age": 13, "city": "beijing"}'); Query OK, 1 row affected (0.13 sec) mysql> INSERT student (info) VALUES ('{"sex": "M", "age": 14, "city": "suzhou"}'); Query OK, 1 row affected (0.10 sec) mysql> INSERT student (info) VALUES ('{"sex": "F", "age": 23, "city": "shenzhen"}'); Query OK, 1 row affected (0.20 sec)查看数据
mysql> select * from student; +----+---------------------------------------------+ | id | info | +----+---------------------------------------------+ | 1 | {"age": 13, "sex": "F", "city": "beijing"} | | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | +----+---------------------------------------------+试一下插入JSON数组
mysql> INSERT student (info) VALUES ('[1,2,3,4]'); Query OK, 1 row affected (0.12 sec) mysql> INSERT student (info) VALUES ('[{"sex": "M"},{"sex":"F", "city":"nanjing"}]'); Query OK, 1 row affected (0.11 sec) mysql> select * from student; +----+-------------------------------------------------+ | id | info | +----+-------------------------------------------------+ | 1 | {"age": 13, "sex": "F", "city": "beijing"} | | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | | 4 | [1, 2, 3, 4] | | 5 | [{"sex": "M"}, {"sex": "F", "city": "nanjing"}] | +----+-------------------------------------------------+ 5 rows in set (0.00 sec) 可以混合插入JSON数组和JSON对象,但是必须要符合JSON格式其中整个字符串使用单引号,键值对使用双引号,JSON中可以允许有多余空格,MySQL会自动解析,输出的使用格式为符号后带有一个空格(3)提取JSON内字段
因为支持了新的JSON类型,MySQL 配套提供了丰富的 JSON 字段处理函数,用于方便地操作 JSON 数据,最常见的就是函数 JSON_EXTRACT,它用来从 JSON 数据中提取所需要的字段内容
(1)提取JSON对象主要是JSON_UNQUOTE和JSON_EXTRACT,JSON_EXTRACT作用是去除最外侧的双引号,JSON_EXTRACT根据键提取值
mysql> SELECT -> id, -> JSON_UNQUOTE(JSON_EXTRACT(info,"$.sex")) sex, -> JSON_UNQUOTE(JSON_EXTRACT(info,"$.age")) age, -> JSON_UNQUOTE(JSON_EXTRACT(info,"$.city")) city -> FROM student; +----+------+------+----------+ | id | sex | age | city | +----+------+------+----------+ | 1 | F | 13 | beijing | | 2 | M | 14 | suzhou | | 3 | F | 23 | shenzhen | +----+------+------+----------+ 3 rows in set (0.00 sec)MySQL 还提供了 ->> 表达式,和上述 SQL 效果完全一样,->>也是去除最外面的引号,还有一种符号->,它也能得到提取结果但是不去除外面的符号,相当于->和JSON_EXTRACT对应
mysql> SELECT -> id, -> info->>"$.sex" sex, -> info->>"$.age" age, -> info->>"$.city" city -> FROM student; +----+------+------+----------+ | id | sex | age | city | +----+------+------+----------+ | 1 | F | 13 | beijing | | 2 | M | 14 | suzhou | | 3 | F | 23 | shenzhen | +----+------+------+----------+ 3 rows in set (0.00 sec)如果JSON对象中查询的键不存在,则返回为NULL
mysql> SELECT -> id, -> JSON_UNQUOTE(JSON_EXTRACT(info,"$.sex")) sex, -> JSON_UNQUOTE(JSON_EXTRACT(info,"$.height")) height -> FROM student; +----+------+--------+ | id | sex | height | +----+------+--------+ | 1 | F | NULL | | 2 | M | NULL | | 3 | F | NULL | +----+------+--------+ 3 rows in set (0.00 sec) (2)提取JSON数组先创建JSON数组类型,插入数据
mysql> CREATE TABLE student ( -> id BIGINT AUTO_INCREMENT PRIMARY KEY, -> info JSON DEFAULT NULL -> ); Query OK, 0 rows affected (0.42 sec) mysql> INSERT student (info) VALUES ('[1, 2, 3, 4]'); Query OK, 1 row affected (0.09 sec) mysql> INSERT student (info) VALUES ('[2, 3, 4]'); Query OK, 1 row affected (0.08 sec) mysql> INSERT student (info) VALUES ('[3, 1, -1]'); Query OK, 1 row affected (0.13 sec) mysql> select * from student; +----+--------------+ | id | info | +----+--------------+ | 1 | [1, 2, 3, 4] | | 2 | [2, 3, 4] | | 3 | [3, 1, -1] | +----+--------------+ 3 rows in set (0.00 sec)JOSN数组通过索引取对应的值,同样是使用JSON_EXTRACT,索引从0开始
mysql> SELECT -> JSON_EXTRACT(info, '$[0]') first -> FROM student; +----------------------------+ | first | +----------------------------+ | 1 | | 2 | | 3 | +----------------------------+同样可以采用->>符号,这种方式会去除双引号,如果JSON数组内的元素是双引号字符串,显示出来的时候也会被去除引号
mysql> SELECT -> id, -> info->>"$[0]" first, -> info->>"$[1]" second -> FROM student; +----+-------+--------+ | id | first | second | +----+-------+--------+ | 1 | 1 | 2 | | 2 | 2 | 3 | | 3 | 3 | 1 | +----+-------+--------+可以提取数组中的多个值,此时JSON_EXTRACT后面传入多个参数,提取的顺序和输出顺序一致
mysql> SELECT id, JSON_EXTRACT(info, "$[1]", "$[0]") a FROM student; +----+--------+ | id | a | +----+--------+ | 1 | [2, 1] | | 2 | [5, 2] | | 3 | [3, 1] | +----+--------+ 3 rows in set (0.01 sec)如果提取的索引位置不存在则返回空
mysql> SELECT -> id, -> info->>"$[2]" a -> FROM student; +----+------+ | id | a | +----+------+ | 1 | 3 | | 2 | 6 | | 3 | NULL | +----+------+ 3 rows in set (0.00 sec) (3)提取嵌套数组嵌套数组主要是JSON数组内包含多个JSON对象或者JSON数组包含多个JSON数组,先以包含多个JSON对象为例
mysql> truncate table student; Query OK, 0 rows affected (0.70 sec) mysql> INSERT student (info) VALUES ('[{"no":1,"ent_name":"张家港市杨舍百桥士方园艺场","score":98.2}, {"no":2,"ent_name":"昆山市朱北苗圃有限公司","score":98.2}, {"no":3,"ent_name":"苏州市吴中区临湖现代渔业发展有限公司","score":98.2}]'); Query OK, 1 row affected (0.08 sec) mysql> INSERT student (info) VALUES ('[{"no":1,"ent_name":"张家港市合力土石方挖掘服务部","score":98.1}, {"no":2,"ent_name":"中国石化销售有限公司江苏苏州养武加油站","score":97.6}, {"no":3,"ent_name":"中国石化销售有限公司江苏苏州太仓璜泾二站服务点","score":97.5}]'); Query OK, 1 row affected (0.09 sec) mysql> select * from student; +----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | id | info | +----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | [{"no": 1, "score": 98.2, "ent_name": "张家港市杨舍百桥士方园艺场"}, {"no": 2, "score": 98.2, "ent_name": "昆山市朱北苗圃有限公司"}, {"no": 3, "score": 98.2, "ent_name": "苏州市吴中区临湖现代渔业发展有限公司"}] | | 2 | [{"no": 1, "score": 98.1, "ent_name": "张家港市合力土石方挖掘服务部"}, {"no": 2, "score": 97.6, "ent_name": "中国石化销售有限公司江苏苏州养武加油站"}, {"no": 3, "score": 97.5, "ent_name": "中国石化销售有限公司江苏苏州太仓璜泾二站服务点"}] | +----+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 2 rows in set (0.00 sec)提取嵌套数组的值,首先要找到数组中对应的对象,然后根据对象的键找值
mysql> SELECT -> id, -> JSON_UNQUOTE(JSON_EXTRACT(info -> "$[0]", "$.ent_name")) first_ent_name -> FROM student; +----+--------------------------------------------+ | id | first_ent_name | +----+--------------------------------------------+ | 1 | 张家港市杨舍百桥士方园艺场 | | 2 | 张家港市合力土石方挖掘服务部 | +----+--------------------------------------------+ 2 rows in set (0.00 sec)对比一下之前的写法则只能提取第一个数组对象,可见JSON_UNQUOTE对于内侧的引号不删除,只删除外侧的引号
mysql> SELECT -> id, -> JSON_UNQUOTE(JSON_EXTRACT(info, "$[0]")) first -> FROM student; +----+------------------------------------------------------------------------------------+ | id | first | +----+------------------------------------------------------------------------------------+ | 1 | {"no": 1, "score": 98.2, "ent_name": "张家港市杨舍百桥士方园艺场"} | | 2 | {"no": 1, "score": 98.1, "ent_name": "张家港市合力土石方挖掘服务部"} | +----+------------------------------------------------------------------------------------+ 2 rows in set (0.00 sec)在看数组套数组的情况
mysql> INSERT student (info) VALUES ('[1, 2, [3, 4]]'); Query OK, 1 row affected (0.07 sec) mysql> select * from student; +----+----------------+ | id | info | +----+----------------+ | 1 | [1, 2, [3, 4]] | +----+----------------+ 1 row in set (0.00 sec)提取其中嵌套的数组,第一个表示嵌套数据的位置索引,第二个*表示嵌套数据内取所有元素
mysql> SELECT -> id, -> JSON_EXTRACT(info, "$[2][*]") a -> FROM student; +----+--------+ | id | a | +----+--------+ | 1 | [3, 4] | +----+--------+ 1 row in set (0.00 sec)如果第二个参数不是*,也可以选取嵌套数据内的指定位置的元素
SELECT id, JSON_EXTRACT(info, "$[2][1]") a FROM student; mysql> SELECT -> id, -> JSON_EXTRACT(info, "$[2][1]") a -> FROM student; +----+------+ | id | a | +----+------+ | 1 | 4 | +----+------+ 1 row in set (0.00 sec) (4)提取JSON后增加过滤 / 排序条件提取JSON后不能用新命名的字段做筛选过滤,需要调用把JSON函数或者符号再写一遍
mysql> select * from student; +----+---------------------------------------------+ | id | info | +----+---------------------------------------------+ | 1 | {"age": 13, "sex": "F", "city": "beijing"} | | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | +----+---------------------------------------------+ 3 rows in set (0.00 sec)筛选sex是F,age大于14的
mysql> SELECT -> id, -> info -> FROM student WHERE info->>"$.age" > 14 and info->>"$.sex" = 'F'; +----+---------------------------------------------+ | id | info | +----+---------------------------------------------+ | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | +----+---------------------------------------------+ 1 row in set (0.01 sec)根据age倒序排序取第一,只要city列
mysql> SELECT -> id, -> info->>"$.city" -> FROM student WHERE info->>"$.age" > 14 and info->>"$.sex" = 'F'; +----+-----------------+ | id | info->>"$.city" | +----+-----------------+ | 3 | shenzhen | +----+-----------------+ 1 row in set (0.00 sec)(4)JSON类型和字符串的区别
除了JSON类型支持索引之外,看一下脚本语言对于JSON类型取值和字符串是否有区别,使用pymysql连接测试
>>> import pymysql >>> config = {"user": "root", "password": "123456", "database": "test", "host": "127.0.0.1", "port": 3306} >>> conn = pymysql.connect(**config) >>> cursor = conn.cursor() >>> cursor.execute("select info from student where id = 1") 1 >>> res = cursor.fetchall() # (('{"age": 13, "sex": "F", "city": "beijing"}',),) >>> cursor.close() >>> conn.close() >>> json.loads(res[0][0]) {'age': 13, 'sex': 'F', 'city': 'beijing'}可见结果是一个JSON格式的字符串,可以直接解析成JSON,所以脚本语言取出的JSON类型结果和字符串没有差别,就是JSON格式的字符串,另外指定JSON格式后,MySQL会对插入的字符串做检验,如果不符合JSON格式插入报错,这也是和传统Varchar或者TEXT的区别
JSON格式相比于Varchar,TEXT支持索引JSON格式会对插入的字符串做JSON格式校验,不符合则报错JSON格式的输入输出都是字符串,如果使用Varchar或者TEXT格式人工保证字符串为JSON格式,效果是一致的(5)JSON类型数据修改
修改数据主要是JSON_SET,JSON_INSERT,JSON_REPLACE三个方法
JSON_SET:替换现有key的值,插入不存在的key的值JSON_INSERT:插入不存在的key的值,已经存在的不修改JSON_REPLACE:只替换已存在的key的值,不存在的不做插入使用的时候结合update语句
mysql> select * from student; +----+---------------------------------------------+ | id | info | +----+---------------------------------------------+ | 1 | {"age": 13, "sex": "F", "city": "beijing"} | | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | +----+---------------------------------------------+ 3 rows in set (0.01 sec)JSON_SET,不存在则插入,有则替换
mysql> UPDATE student SET info = JSON_SET(info, "$.city", 'wuxi', "$.height", 123) WHERE id = 1; Query OK, 1 row affected (0.87 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> select * from student; +----+--------------------------------------------------------+ | id | info | +----+--------------------------------------------------------+ | 1 | {"age": 13, "sex": "F", "city": "wuxi", "height": 123} | | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | +----+--------------------------------------------------------+JSON_INSERT,只会插入不存在的值
mysql> UPDATE student SET info = JSON_INSERT(info, "$.city", 'wuxi', "$.height", 123) WHERE id = 1; Query OK, 1 row affected (0.11 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> select * from student; +----+-----------------------------------------------------------+ | id | info | +----+-----------------------------------------------------------+ | 1 | {"age": 13, "sex": "F", "city": "beijing", "height": 123} | | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | +----+-----------------------------------------------------------+ 3 rows in set (0.00 sec)JSON_REPLACE,只会替换已有值
mysql> UPDATE student SET info = JSON_REPLACE(info, "$.city", 'wuxi', "$.height", 123) WHERE id = 1; Query OK, 1 row affected (0.06 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> select * from student; +----+---------------------------------------------+ | id | info | +----+---------------------------------------------+ | 1 | {"age": 13, "sex": "F", "city": "wuxi"} | | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | +----+---------------------------------------------+ 3 rows in set (0.00 sec)删除使用JSON_REMOVE,在JSON对象中指定key删除
mysql> select * from student; +----+---------------------------------------------+ | id | info | +----+---------------------------------------------+ | 1 | {"age": 13, "sex": "F", "city": "wuxi"} | | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | +----+---------------------------------------------+ 3 rows in set (0.00 sec) mysql> UPDATE student SET info = JSON_REMOVE(info, "$.age") WHERE id = 1; Query OK, 1 row affected (0.45 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> select * from student; +----+---------------------------------------------+ | id | info | +----+---------------------------------------------+ | 1 | {"sex": "F", "city": "wuxi"} | | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | +----+---------------------------------------------+ 3 rows in set (0.00 sec)在JSON数组中指定下标删除
mysql> select * from student; +----+----------------+ | id | info | +----+----------------+ | 1 | [1, 2, [3, 4]] | | 2 | [2, 5, 6] | | 3 | [1, 3] | +----+----------------+ 3 rows in set (0.00 sec) mysql> UPDATE student SET info = JSON_REMOVE(info, "$[1]") WHERE id=2; Query OK, 1 row affected (0.72 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> select * from student; +----+----------------+ | id | info | +----+----------------+ | 1 | [1, 2, [3, 4]] | | 2 | [2, 6] | | 3 | [1, 3] | +----+----------------+ 3 rows in set (0.00 sec)(6)JSON类型使用索引
当 JSON 数据量非常大,用户希望对 JSON 数据进行有效检索时,可以利用 MySQL 的函数索引功能对 JSON 中的某个字段进行索引,具体方式是先创建一个虚拟列,再对虚拟列创建索引 先看一下没有索引下,对JSON对象中某个key做条件检索的EXPLAIN计划,可见访问方式type是ALL全表扫面,使用的索引Key是NULL
mysql> select * from student; +----+---------------------------------------------+ | id | info | +----+---------------------------------------------+ | 1 | {"age": 13, "sex": "F", "city": "beijing"} | | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | +----+---------------------------------------------+ 3 rows in set (0.00 sec) mysql> explain select * from student where info->>"$.age" = 13; +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | student | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where | +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.01 sec)创建虚拟列age,并且添加普通索引,可见表新增了一列(虽然是虚拟列,但是在SELECT和脚本语言取数都能取到新增的虚拟列),重新EXPLAIN之后Type改为ref代表普通索引,使用的Key是idx_age
mysql> ALTER TABLE student ADD COLUMN age INT as (info->>"$.age"); Query OK, 0 rows affected (0.54 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> select * from student; +----+---------------------------------------------+------+ | id | info | age | +----+---------------------------------------------+------+ | 1 | {"age": 13, "sex": "F", "city": "beijing"} | 13 | | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | 14 | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | 23 | +----+---------------------------------------------+------+ 3 rows in set (0.00 sec) mysql> ALTER TABLE student ADD INDEX idx_age(age); Query OK, 0 rows affected (0.51 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> explain select * from student where info->>"$.age" = 13; +----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------+ | 1 | SIMPLE | student | NULL | ref | idx_age | idx_age | 5 | const | 1 | 100.00 | NULL | +----+-------------+---------+------------+------+---------------+---------+---------+-------+------+----------+-------+ 1 row in set, 1 warning (0.00 sec)除了普通索引,同理还可以为JSON对象的其他Key值创建唯一索引等 除了再建表之后创建虚拟列增加索引,也可以在建表的时候就为JSON类型创建索引
mysql> CREATE TABLE student ( -> id BIGINT AUTO_INCREMENT PRIMARY KEY, -> info JSON DEFAULT NULL, -> age Int as (info->>"$.age"), -> index idx_age(age) -> );(7)JSON类型其他常用函数
除了上面的JSON_EXTRACT和JSON_UNQUOTE,常用函数如下
MEMBER OF:只能对JSON数组使用,返回1元素存在数组中,0元素不存在数组中 JSON_CONTAINS:可以对JSON数组和JSON对象使用,针对JSON数组检查一个元素或者多个元素是否存在,对于JSON对象检查指定路径下是否有某个值或者是否有某个路径(Key) JSON_OVERLAP:比较两个JSON数组是否至少有一个元素一致,如果是返回1,否则返回0,如果是JSON对象,判断是否是找有一对key value一致 JSON_KEYS:返回JSON对象的Key,也可以是嵌套JSON对象以上函数可以在前面加上NOT关键字就可以取反
MEMBER OF是MySQL 8.0.17新增的函数,查看元素或者子数组是否存在在JSON数组中,如果把MEMBER OF放在where条件后面则直接筛选结果是1的,可以省略写=1
mysql> select * from student; +----+----------------+ | id | info | +----+----------------+ | 1 | [1, 2, [3, 4]] | | 2 | [2, 5, 6] | | 3 | [1, 3] | +----+----------------+ 3 rows in set (0.01 sec) mysql> SELECT * FROM student WHERE 3 MEMBER OF(info); +----+--------+ | id | info | +----+--------+ | 3 | [1, 3] | +----+--------+ 1 row in set (0.00 sec) mysql> SELECT * FROM student WHERE JSON_ARRAY(3, 4) MEMBER OF(info); +----+----------------+ | id | info | +----+----------------+ | 1 | [1, 2, [3, 4]] | +----+----------------+ 1 row in set (0.00 sec)JSON_CONTAINS和MEMBER OF类似,但是JSON_CONTAINS可以作用与JSON对象,对于JSON数组两者也有区别,JSON_CONTAINS可以指定多个数组内的元素,相当于是且的关系,JSON_CONTAINS放在where后面默认筛选结果值是1的
mysql> select * from student; +----+----------------+ | id | info | +----+----------------+ | 1 | [1, 2, [3, 4]] | | 2 | [2, 5, 6] | | 3 | [1, 3] | +----+----------------+ 3 rows in set (0.00 sec) mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '[2, 6]'); +----+-----------+ | id | info | +----+-----------+ | 2 | [2, 5, 6] | +----+-----------+ 1 row in set (0.00 sec) mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '[2]'); +----+----------------+ | id | info | +----+----------------+ | 1 | [1, 2, [3, 4]] | | 2 | [2, 5, 6] | +----+----------------+ mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '[3]'); +----+----------------+ | id | info | +----+----------------+ | 1 | [1, 2, [3, 4]] | | 3 | [1, 3] | +----+----------------+当JSON_CONTAINS作用与JSON对象时,需要判断某个key-value对是否存,格式是(target,value,key),在指定value的时候,数值用单引号,字符串还要在内加一层双引号
mysql> select * from student; +----+---------------------------------------------+ | id | info | +----+---------------------------------------------+ | 1 | {"age": 13, "sex": "F", "city": "beijing"} | | 2 | {"age": 14, "sex": "M", "city": "suzhou"} | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | +----+---------------------------------------------+ mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '"F"', '$.sex'); +----+---------------------------------------------+ | id | info | +----+---------------------------------------------+ | 1 | {"age": 13, "sex": "F", "city": "beijing"} | | 3 | {"age": 23, "sex": "F", "city": "shenzhen"} | +----+---------------------------------------------+ mysql> SELECT * FROM student WHERE JSON_CONTAINS(info, '13', '$.age'); +----+--------------------------------------------+ | id | info | +----+--------------------------------------------+ | 1 | {"age": 13, "sex": "F", "city": "beijing"} | +----+--------------------------------------------+JSON_OVERLAP返回两个JSON数组或者JSON对象至少有一个/对元素一致则是1,房子啊where后面筛选出存在至少一个交集的数据
mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[1, 5]'); +----+--------------+ | id | info | +----+--------------+ | 1 | [1, 2, 5] | | 2 | [2, 5, 6] | | 3 | [1, 3, 8] | | 4 | [1, 2, 7, 8] | +----+--------------+ 4 rows in set (0.00 sec) mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[1, 5, 6]'); +----+--------------+ | id | info | +----+--------------+ | 1 | [1, 2, 5] | | 2 | [2, 5, 6] | | 3 | [1, 3, 8] | | 4 | [1, 2, 7, 8] | +----+--------------+ 4 rows in set (0.00 sec) mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[5]'); +----+-----------+ | id | info | +----+-----------+ | 1 | [1, 2, 5] | | 2 | [2, 5, 6] | +----+-----------+ 2 rows in set (0.00 sec) mysql> SELECT * FROM student WHERE JSON_OVERLAPS(info, '[9]'); Empty set (0.00 sec)(8)JSON ARRAY的多值索引
从MySQL8.0.17开始,InnoDB支持多值索引。多值索引是在存储JSON数组的列上定义的辅助索引,对于JSON ARRAY的MEMBER OF,JSON_CONTAINS,JSON_OVERLAPS可以利用多值索引进行性能优化
mysql> select * from student; +----+--------------+ | id | info | +----+--------------+ | 1 | [1, 2, 5, 9] | | 2 | [2, 5, 6, 8] | | 3 | [5, 3, 8, 9] | | 4 | [1, 2, 7, 8] | +----+--------------+ 4 rows in set (0.00 sec)先不创建多值索引,使用JSON_CONTAINS语句EXPLAIN查看执行计划
mysql> EXPLAIN SELECT * FROM student WHERE JSON_CONTAINS(info, '[5, 3]'); +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+ | 1 | SIMPLE | student | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | Using where | +----+-------------+---------+------------+------+---------------+------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec)Type为全表扫描,key为NULL,下一步给表增加多值索引,注意如果这个地方改为idx_info((cast((info->"$") as unsigned array))),则后续所有的函数都要是info->"$",否则走不了索引
mysql> ALTER TABLE student ADD INDEX idx_info((cast(info as unsigned array))); Query OK, 0 rows affected (0.57 sec) Records: 0 Duplicates: 0 Warnings: 0查看建立多值索引之后的EXPLAIN计划,可见现在JSON操作函数都走了索引
mysql> EXPLAIN SELECT * FROM student WHERE 5 MEMBER OF(info); +----+-------------+---------+------------+------+---------------+----------+---------+-------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------+------------+------+---------------+----------+---------+-------+------+----------+-------------+ | 1 | SIMPLE | student | NULL | ref | idx_info | idx_info | 9 | const | 1 | 100.00 | Using where | +----+-------------+---------+------------+------+---------------+----------+---------+-------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) mysql> EXPLAIN SELECT * FROM student WHERE JSON_CONTAINS(info, '[5, 3]'); +----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+ | 1 | SIMPLE | student | NULL | range | idx_info | idx_info | 9 | NULL | 4 | 100.00 | Using where | +----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec) mysql> EXPLAIN SELECT * FROM student WHERE JSON_OVERLAPS(info, '[5, 3]'); +----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+ | 1 | SIMPLE | student | NULL | range | idx_info | idx_info | 9 | NULL | 4 | 100.00 | Using where | +----+-------------+---------+------------+-------+---------------+----------+---------+------+------+----------+-------------+ 1 row in set, 1 warning (0.00 sec)(9)基于JSON类型的用户画像设计
下面基于30万级别的数据,以MySQL的JSON做用户画像的存储和查询测试,画像值已经全部处理成枚举值,不加入数值型的字段,第一步梳理用户标签,定义一张标签表,记录标签值和标签号
mysql> CREATE TABLE tags_info ( -> tag_id bigint auto_increment primary key, -> tag_name varchar(255) not null, -> tag_value varchar(255) not null -> ); Query OK, 0 rows affected (0.80 sec)插入所有的标签,预览表如下
mysql> select * from tags_info order by rand() limit 10; +--------+--------------+--------------------+ | tag_id | tag_name | tag_value | +--------+--------------+--------------------+ | 24 | 渠道名称 | 自助收银 | | 38 | 类目范围 | 类目多样 | | 3 | 生日月份 | 1 | | 35 | RFM | 重要保持会员 | | 11 | 生日月份 | 9 | | 44 | 时间偏好 | 常客 | | 16 | 会员等级 | 员工卡 | | 27 | 会员状态 | 沉默 | | 25 | 渠道名称 | 闪电购 | | 43 | 时间偏好 | 周末客 | +--------+--------------+--------------------+下一步构建用户画像表,标签值替换为标签ID,将一个用户的所有标签值存储为JSON ARRAY,先构建一张用户画像结果表
mysql> CREATE TABLE user_tag ( -> user_id bigint not null primary key, -> user_tags JSON -> ); Query OK, 0 rows affected (0.41 sec)插入数据查看预览结果
mysql> select * from user_tag order by rand() limit 10; +---------+--------------------------------------+ | user_id | user_tags | +---------+--------------------------------------+ | 36978 | [14, 19, 23, 25, 32, 42] | | 28515 | [14, 19, 23, 27, 36, 42] | | 28683 | [14, 19, 28] | | 39368 | [14, 19, 24, 26, 36, 43] | | 22269 | [14, 19, 28] | | 42160 | [14, 19, 23, 26, 32, 43] | | 22321 | [14, 17, 27, 33, 40, 43, 44, 47, 49] | | 20407 | [14, 19, 20, 25, 34, 38, 43] | | 26167 | [14, 17, 23, 27, 35, 38, 43] | | 38082 | [14, 19, 25, 32, 42] | +---------+--------------------------------------+ 10 rows in set (0.00 sec)下一步对用户画像进行标签筛选指定的人群做营销,实际上转化为SQL就是多个条件的与或非组合,看几个案例 (1)取用户等级是白银卡(17)或者银卡(20),且用户状态是活跃的(28),逻辑是(17∪20)∩28,33万里面筛选出265个人
mysql> SELECT * FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[18, 17, 20]') AND JSON_CONTAINS(user_tags, '[28]') limit 5; +---------+------------------+ | user_id | user_tags | +---------+------------------+ | 3201 | [14, 19, 20, 28] | | 4183 | [14, 19, 20, 28] | | 4554 | [14, 19, 20, 28] | | 4890 | [14, 19, 20, 28] | | 6334 | [14, 17, 28] | +---------+------------------+ mysql> SELECT count(*) FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[18, 17, 20]') AND JSON_CONTAINS(user_tags, '[28]'); +----------+ | count(*) | +----------+ | 265 | +----------+ 1 row in set (0.66 sec)(2)筛选母婴客群(47)且是周末客(43),但是过滤掉流失会员(29),逻辑是47∩43∩(not 29),最终结果筛选出11678人
mysql> SELECT * FROM user_tag WHERE JSON_CONTAINS(user_tags, '[43, 47]') AND NOT JSON_CONTAINS(user_tags, '[29]') limit 10; +---------+----------------------------------------------+ | user_id | user_tags | +---------+----------------------------------------------+ | 3036 | [14, 17, 24, 27, 33, 37, 40, 43, 44, 47, 48] | | 3049 | [14, 17, 27, 33, 38, 40, 43, 44, 47, 48] | | 3072 | [14, 19, 20, 27, 33, 37, 40, 43, 44, 47, 48] | | 3099 | [14, 19, 21, 27, 33, 37, 40, 43, 44, 47] | | 3110 | [14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48] | | 3127 | [14, 19, 20, 27, 33, 37, 40, 43, 44, 46, 47] | | 3132 | [14, 19, 27, 33, 40, 41, 43, 44, 47] | | 3147 | [14, 17, 27, 34, 40, 43, 46, 47, 48] | | 3157 | [14, 19, 23, 26, 34, 43, 47, 48] | | 3160 | [14, 19, 23, 27, 34, 40, 43, 44, 47, 48] | +---------+----------------------------------------------+ 10 rows in set (0.00 sec) mysql> SELECT count(*) FROM user_tag WHERE JSON_CONTAINS(user_tags, '[43, 47]') AND NOT JSON_CONTAINS(user_tags, '[29]'); +----------+ | count(*) | +----------+ | 11678 | +----------+ 1 row in set (0.55 sec)(3)筛选RFM为重要开头的高价值客户(34,35,36,37),且是微信会员(15),逻辑是(34∪35∪36∪37)∩15,最终筛选出148人
mysql> SELECT * FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[34, 35, 36, 37]') AND JSON_CONTAINS(user_tags, '[15]') limit 10; +---------+--------------------------------------------------------+ | user_id | user_tags | +---------+--------------------------------------------------------+ | 3110 | [14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48] | | 6990 | [14, 15, 23, 27, 34, 37, 40, 43, 44, 47, 48, 49] | | 11214 | [14, 15, 24, 27, 33, 37, 40, 43, 44, 47] | | 13447 | [0, 2, 14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48] | | 18356 | [14, 15, 25, 34, 38, 43] | | 22016 | [1, 2, 14, 15, 27, 33, 37, 40, 43, 44, 46, 47, 48, 49] | | 22392 | [14, 15, 23, 27, 33, 37, 40, 41, 43, 44, 48, 49] | | 22721 | [1, 7, 14, 15, 27, 33, 37, 40, 42, 44, 48] | | 22800 | [14, 15, 23, 27, 33, 37, 40, 43, 44, 48] | | 25122 | [14, 15, 26, 34, 37, 40, 43, 47, 48, 49] | +---------+--------------------------------------------------------+ 10 rows in set (0.06 sec) mysql> SELECT count(*) FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[34, 35, 36, 37]') AND JSON_CONTAINS(user_tags, '[15]'); +----------+ | count(*) | +----------+ | 148 | +----------+ 1 row in set (0.75 sec)给这张用户画像表增加一下多值索引
ALTER TABLE user_tag ADD INDEX idx_info((cast(user_tags as unsigned array)));结果是SQL不能得到正确结果,之前能检索到人群现在检索结果为符合条件的人为0,可能是在多值索引的情况下,不能组合多个JSON函数的原因
mysql> SELECT count(*) FROM user_tag WHERE JSON_OVERLAPS(user_tags, '[34, 35, 36, 37]') AND JSON_CONTAINS(user_tags, '[15]'); +----------+ | count(*) | +----------+ | 0 | +----------+ 1 row in set (0.00 sec)另外在多次测试下,就算不使用JSON函数组合,单个使用JSON_CONTAINS,JSON_OVERLAPS也可能会导致索引失效,具体原因不明,如果使用NOT条件多值索引直接失效
最后编辑于:2025-02-05 20:55:48 © 著作权归作者所有,转载或内容合作请联系作者喜欢的朋友记得点赞、收藏、关注哦!!!
MySQL:MySQL8.0JSON类型使用整理,基于用户画像的案例由讯客互联手机栏目发布,感谢您对讯客互联的认可,以及对我们原创作品以及文章的青睐,非常欢迎各位朋友分享到个人网站或者朋友圈,但转载请说明文章出处“MySQL:MySQL8.0JSON类型使用整理,基于用户画像的案例”