Improve span merging, internal refactoring

* Merging several tokens into a single multi-word token, via the doc.merge() and span.merge() methods, no longer invalidates existing Span objects. This makes it much easier to merge multiple spans in sequence, e.g. to merge all named entities, or all base noun phrases. Thanks to @andreasgrv for help on this patch.
* Lots of internal refactoring, especially around the machine learning module, thinc. The thinc API has now been improved, and the spacy._ml wrapper module is no longer necessary.
* The lemmatizer now lower-cases non-noun, non-verb and non-adjective words.
* A new attribute, .rank, is added to Token and Lexeme objects, giving the frequency rank of the word.
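
To illustrate why the span merging change matters: merging a span removes tokens from the document, so the token indices of every later span shift to the left. The sketch below is a toy model in plain Python, not spaCy's implementation. ToySpan and merge_span are hypothetical names invented for this example, and it assumes the spans do not overlap. It only shows the index bookkeeping that lets several spans be merged one after another without any of them going stale.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToySpan:
    """Half-open token range [start, end) with a label.

    Hypothetical stand-in for illustration; not spaCy's Span class.
    """
    start: int
    end: int
    label: str


def merge_span(tokens: List[str], spans: List[ToySpan], target: ToySpan) -> None:
    """Merge the tokens covered by `target` into one token, in place.

    Every span in `spans` (assumed non-overlapping) is shifted so that it
    still covers the same words after the merge.
    """
    # Copy the boundaries first: `target` itself is updated in the loop below.
    start, end = target.start, target.end
    removed = end - start - 1  # number of tokens that disappear
    if removed <= 0:
        return  # a single-token span needs no merging

    # Replace the covered tokens with one multi-word token.
    tokens[start:end] = [" ".join(tokens[start:end])]

    # Any boundary at or after the end of the merged range moves left.
    for span in spans:
        if span.start >= end:
            span.start -= removed
        if span.end >= end:
            span.end -= removed


tokens = "Barack Obama visited New York City yesterday".split()
spans = [ToySpan(0, 2, "PERSON"), ToySpan(3, 6, "GPE")]

# Merge every span in turn; later spans remain valid after earlier merges.
for span in spans:
    merge_span(tokens, spans, span)

print(tokens)
print([(s.start, s.end, s.label) for s in spans])
```

Without the boundary adjustment in the loop, the second span would still point at indices 3 to 6 after the first merge and would cover the wrong words, which is the kind of invalidation this release removes.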