ElasticSearch Mapping and Tokenization
- Software Development
- 2025-09-09 12:24:02

Contents
Deprecation of Type
Why
Mapping
Query the mapping of an index
Create an index with a mapping
Add a field with a mapping
Data migration
1. Create a new index with the correct mapping
2. Reindex data into the new index
Tokenization
POST _analyze
Custom dictionary
ik analyzer
circuit_breaking_exception
Deprecation of Type

ES 6.x: deprecation of Type begins
ES 7.x: Type is weakened but still supported
ES 8.x: Type is removed entirely

After the deprecation, each index contains only one document type.
If you need to distinguish different kinds of documents, there are two approaches: create separate indices, or add a custom field to the documents.

Why

Elasticsearch's underlying storage (Lucene) is organized by index, not by Type.
Within a single index, documents of different Types could have fields with the same name but different types; such field-type conflicts lead to inconsistent data and query errors.
```
GET /bank/_search
{
  "query": {
    "match": { "address": "mill lane" }
  },
  "_source": ["account_number", "address"]
}
```

As this query shows, a search is scoped to an index and does not specify a type. If different types each had their own address field, the query would conflict.
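The custom-field approach mentioned above can be sketched as follows; the index name `content` and the field name `doc_type` are hypothetical, not from the original text:

```
PUT /content/_doc/1
{
  "doc_type": "article",
  "title": "hello world"
}

GET /content/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "doc_type": "article" } }
      ]
    }
  }
}
```

A `term` filter on the discriminator field replaces what a per-Type search used to do.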
Mapping

Mapping defines how documents and fields are stored and retrieved.
Mapping is the mechanism Elasticsearch uses to define document structure and field types. It is similar to a table schema in a relational database: it describes which fields a document contains, each field's data type (text, numeric, date, etc.), and other field attributes (whether the field is analyzed, whether it is indexed, and so on).
Mapping is one of Elasticsearch's core concepts; it determines how data is stored, indexed, and queried.
Query the mapping of an index
```
GET /bank/_mapping
```

```
{
  "bank" : {
    "mappings" : {
      "properties" : {
        "account_number" : { "type" : "long" },
        "address" : {
          "type" : "text",
          "fields" : {
            "keyword" : { "type" : "keyword", "ignore_above" : 256 }
          }
        },
        "age" : { "type" : "long" },
        "balance" : { "type" : "long" },
        "city" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
        "email" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
        "employer" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
        "firstname" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
        "gender" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
        "lastname" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } },
        "state" : { "type" : "text", "fields" : { "keyword" : { "type" : "keyword", "ignore_above" : 256 } } }
      }
    }
  }
}
```

A text field can carry a sub-field named keyword, of type keyword; keyword stores the exact, unanalyzed value.

Create an index with a mapping

PUT /{indexName}
```
PUT /my_index
{
  "mappings": {
    "properties": {
      "account_number": { "type": "long" },
      "address": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "city": { "type": "keyword" }
    }
  }
}
```

Add a field with a mapping

PUT /{indexName}/_mapping with a mappings.properties request body:

```
PUT /my_index/_mapping
{
  "properties": {
    "state": {
      "type": "keyword",
      "index": false
    }
  }
}
```

"index": false means the field is not indexed and cannot be searched; the default is true.

Data migration

ES does not support modifying an existing mapping. To change the mapping of an existing index, you must migrate the data.
1. Create a new index with the correct mapping

```
PUT /my_bank
{
  "mappings": {
    "properties": {
      "account_number": { "type": "long" },
      "address": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "age": { "type": "integer" },
      "balance": { "type": "long" },
      "city": { "type": "keyword" },
      "email": { "type": "keyword" },
      "employer": { "type": "keyword" },
      "firstname": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "gender": { "type": "keyword" },
      "lastname": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      },
      "state": { "type": "keyword" }
    }
  }
}
```

2. Reindex data into the new index

```
POST _reindex
{
  "source": {
    "index": "bank",
    "type": "account"
  },
  "dest": {
    "index": "my_bank"
  }
}
```

The type parameter here is deprecated; ES 8.0 removes it.

Tokenization
Tokenization splits text into individual terms (tokens).
POST _analyze

Standard analyzer:

```
POST _analyze
{
  "analyzer": "standard",
  "text": ["it's test data", "hello world"]
}
```

Response
```
{
  "tokens" : [
    { "token" : "it's", "start_offset" : 0, "end_offset" : 4, "type" : "<ALPHANUM>", "position" : 0 },
    { "token" : "test", "start_offset" : 5, "end_offset" : 9, "type" : "<ALPHANUM>", "position" : 1 },
    { "token" : "data", "start_offset" : 10, "end_offset" : 14, "type" : "<ALPHANUM>", "position" : 2 },
    { "token" : "hello", "start_offset" : 15, "end_offset" : 20, "type" : "<ALPHANUM>", "position" : 3 },
    { "token" : "world", "start_offset" : 21, "end_offset" : 26, "type" : "<ALPHANUM>", "position" : 4 }
  ]
}
```

Custom dictionary

Under the nginx/html directory, create es/term.text and add terms to it.
Configure the ik remote dictionary in /elasticsearch/config/analysis-ik/IKAnalyzer.cfg.xml.
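A sketch of what that configuration looks like; the entry keys are the plugin's standard ones, but the host address pointing at the nginx server above is an assumption for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionaries (semicolon-separated), left empty here -->
    <entry key="ext_dict"></entry>
    <!-- local extension stopword dictionaries, left empty here -->
    <entry key="ext_stopwords"></entry>
    <!-- remote extension dictionary: the term.text file served by nginx;
         the host 192.168.56.10 is a placeholder for your nginx address -->
    <entry key="remote_ext_dict">http://192.168.56.10/es/term.text</entry>
    <!--<entry key="remote_ext_stopwords">words_location</entry>-->
</properties>
```

After editing the file, restart Elasticsearch so the plugin reloads the dictionary configuration.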
Test

```
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "尚硅谷项目谷粒商城"
}
```

尚硅谷 and 谷粒商城 are terms from the term.text dictionary.
Response
```
{
  "tokens" : [
    { "token" : "尚硅谷", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0 },
    { "token" : "项目", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 1 },
    { "token" : "谷粒商城", "start_offset" : 5, "end_offset" : 9, "type" : "CN_WORD", "position" : 2 }
  ]
}
```

ik analyzer
A Chinese word-segmentation analyzer.

GitHub address: github.com/infinilabs/analysis-ik

Inside the ES docker container, download and install the ik plugin:

```
bin/elasticsearch-plugin install https://get.infini.cloud/elasticsearch/analysis-ik/7.4.2
```
Uninstall the plugin:

```
elasticsearch-plugin remove analysis-ik
```

Test
```
POST _analyze
{
  "analyzer": "ik_smart",
  "text": "我要成为java高手"
}
```

Response
```
{
  "tokens" : [
    { "token" : "我", "start_offset" : 0, "end_offset" : 1, "type" : "CN_CHAR", "position" : 0 },
    { "token" : "要", "start_offset" : 1, "end_offset" : 2, "type" : "CN_CHAR", "position" : 1 },
    { "token" : "成为", "start_offset" : 2, "end_offset" : 4, "type" : "CN_WORD", "position" : 2 },
    { "token" : "java", "start_offset" : 4, "end_offset" : 8, "type" : "ENGLISH", "position" : 3 },
    { "token" : "高手", "start_offset" : 8, "end_offset" : 10, "type" : "CN_WORD", "position" : 4 }
  ]
}
```

circuit_breaking_exception

The circuit breaker mechanism was triggered.
```
{
  "error": {
    "root_cause": [
      {
        "type": "circuit_breaking_exception",
        "reason": "[parent] Data too large, data for [<http_request>] would be [124604192/118.8mb], which is larger than the limit of [123273216/117.5mb], real usage: [124604192/118.8mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=1788/1.7kb, in_flight_requests=0/0b, accounting=225547/220.2kb]",
        "bytes_wanted": 124604192,
        "bytes_limit": 123273216,
        "durability": "PERMANENT"
      }
    ],
    "type": "circuit_breaking_exception",
    "reason": "[parent] Data too large, data for [<http_request>] would be [124604192/118.8mb], which is larger than the limit of [123273216/117.5mb], real usage: [124604192/118.8mb], new bytes reserved: [0/0b], usages [request=0/0b, fielddata=1788/1.7kb, in_flight_requests=0/0b, accounting=225547/220.2kb]",
    "bytes_wanted": 124604192,
    "bytes_limit": 123273216,
    "durability": "PERMANENT"
  },
  "status": 429
}
```

Check the ES logs:
```
docker logs elasticsearch
```

Check Elasticsearch memory usage:
```
GET /_cat/nodes?v&h=name,heap.percent,ram.percent
```

If heap.percent or ram.percent is close to 100%, the node is short on memory.
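As a stopgap before resizing the heap, the parent circuit breaker limit can also be raised through cluster settings. `indices.breaker.total.limit` is the relevant setting; the 80% value here is only an illustrative assumption, and raising it trades breaker protection for a higher risk of OutOfMemoryError:

```
PUT /_cluster/settings
{
  "transient": {
    "indices.breaker.total.limit": "80%"
  }
}
```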
Increase the Elasticsearch heap

Delete and recreate the container, adjusting the -Xms and -Xmx parameters (here, 256m):
```
docker run --name elasticsearch -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e ES_JAVA_OPTS="-Xms64m -Xmx256m" \
  -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
  -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
  -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
  -d elasticsearch:7.4.2
```
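After the container comes back up, the new heap ceiling can be confirmed with the cat nodes API; heap.max should reflect the -Xmx value passed above:

```
GET /_cat/nodes?v&h=name,heap.percent,heap.max
```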