# myplot.py

| line number | Chinese | English |
| --- | --- | --- |
| 10 | 示例数据 | Sample data |
| 22 | 创建画布和轴 | Create the figure and axes |
| 26 | 绘制第一个柱状图 | Draw the first bar chart |
| 27 | 柱子的宽度 | Width of the bars |
| 28 | 确定每个柱子的位置 | Determine the position of each bar |
| 33 | 第一个图 | The first figure |
| 35 | 绘制第二个柱状图,通过平移位置避免重叠 | Draw the second bar chart, shifting its position to avoid overlap |
| 37 | 添加轴标题和图表标题 | Add the axis titles and the chart title |
| 41 | 设置Y轴的刻度间隔为5 | Set the Y-axis tick interval to 5 |
| 43 | 创建FuncFormatter对象 | Create a FuncFormatter object |
| 47 | 显示x轴的标签 | Display the x-axis labels |
| 51 | 第一个图结束 | End of the first figure |
| 53 | 添加图例 | Add a legend |
| 56 | 显示图表 | Display the chart |

# plot.py

| line number | Chinese | English |
| --- | --- | --- |
| 4 | 示例数据 | Sample data |
| 10 | 创建一个画布,并设置一行三列的子图布局 | Create a figure and set up a one-row, three-column subplot layout |
| 13 | 在第一个子图位置上绘制第一个条形图 | Draw the first bar chart in the first subplot |
| 20 | 在第二个子图位置上绘制第二个条形图 | Draw the second bar chart in the second subplot |
| 27 | 在第三个子图位置上绘制第三个条形图 | Draw the third bar chart in the third subplot |
| 34 | 调整子图间的间距 | Adjust the spacing between subplots |
| 37 | 显示图表 | Display the chart |

# train.py

| line number | Chinese | English |
| --- | --- | --- |
| 24 | 不使用GPU进行训练 | Do not use the GPU for training |
| 25 | 模型与预处理数据的存放位置 | Where the model and preprocessed data are stored |
| 26 | 预训练词向量路径 | Path to the pre-trained word vectors |
| 28 | 损失函数类型 | Type of loss function |
| 29 | 在bert的第几层后面融入词汇信息 | After which BERT layer the lexicon information is fused |
| 30 | Bert的学习率 | Learning rate of BERT |
| 31 | crf的学习率 | Learning rate of the CRF |
| 32 | crf的学习率 | Learning rate of the CRF |
| 34 | AdamW优化器的衰减率 | Weight decay rate of the AdamW optimizer |
| 38 | 训练多少步,查看验证集的指标 | Number of training steps between evaluations on the validation set |
| 39 | 输入的最大长度 | Maximum input length |
| 40 | 每个汉字最多融合多少个词汇信息 | Maximum number of words fused with each Chinese character |
| 41 | 取预训练词向量的前max_scan_num个构造字典树 | Use the first max_scan_num pre-trained word vectors to build the trie |
| 42 | 数据集存放路径 | Path to the dataset |
| 46 | 数据集名称 | Name of the dataset |
| 48 | 模型类别 | Model type |
| 50 | 覆盖数据处理的结果 | Overwrite the results of data processing |
| 53 | 是否加载预训练的词向量 | Whether to load the pre-trained word vectors |
| 55 | 数据集的标注方式 | Labeling scheme of the dataset |
| 56 | 梯度积累的步数 | Number of gradient accumulation steps |
| 57 | 梯度裁剪阈值 | Gradient clipping threshold |
| 58 | 设置随机种子 | Set the random seed |
| 59 | dataloader加载数据时使用的线程数量 | Number of worker threads used by the dataloader |
| 68 | 设置整个开发环境的seed | Set the seed for the whole environment |
| 84 | todo 检查 | todo: check |
| 151 | 对多卡的loss取平均 | Average the loss across multiple GPUs |
| 153 | 梯度累积 | Gradient accumulation |
| 156 | 梯度裁剪 | Gradient clipping |
| 158 | 进行一定step的梯度累计之后,更新参数 | Update the parameters after accumulating gradients for the configured number of steps |
| 160 | 更新参数 | Update the parameters |
| 162 | 更新学习率 | Update the learning rate |
| 164 | 清空梯度信息 | Clear the gradients |
| 167 | 评测验证集和测试集上的指标 | Evaluate metrics on the validation and test sets |
| 196 | 计算数据集上的指标 | Compute metrics on the dataset |
| 216 | 不同模型输入不同 | Different models take different inputs |
| 230 | 对多卡的loss取平均 | Average the loss across multiple GPUs |
| 233 | 减去padding的[CLS]与[SEP] | Exclude the padded [CLS] and [SEP] |
| 236 | 减去padding的[CLS] | Exclude the padded [CLS] |
| 238 | 减去padding的[CLS] | Exclude the padded [CLS] |
| 239 | 减去padding的[CLS] | Exclude the padded [CLS] |
| 284 | 分词器 | Tokenizer |
| 286 | 数据处理器 | Data processor |
| 290 | 初始化模型配置 | Initialize the model configuration |
| 298 | 初始化模型 | Initialize the model |
| 300 | 初始化模型的词向量 | Initialize the model's word embeddings |
| 305 | 训练 | Training |
| 307 | 加载数据集 | Load the dataset |
| 329 | 测试集上的指标 | Metrics on the test set |
| 331 | 加载验证集 | Load the validation set |
| 336 | 加载测试集 | Load the test set |
| 346 | 测试集上的指标 | Metrics on the test set |
| 354 | 设置参数 | Set the parameters |
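Rows 151–164 of the train.py table describe the order of operations in the training loop: average the loss across GPUs, accumulate gradients over several steps, clip them, then step the optimizer and scheduler and zero the gradients. Below is a minimal PyTorch sketch of that pattern; the names `args.grad_acc_step` and `args.max_grad_norm`, and the assumption that the model returns the loss as its first output, are illustrative rather than the repository's exact API.

```python
import torch

def train_one_epoch(model, optimizer, scheduler, train_loader, args, device):
    model.train()
    for step, batch in enumerate(train_loader):
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch)[0]                     # assumes the model returns the loss first
        if torch.cuda.device_count() > 1:
            loss = loss.mean()                       # average the loss across multiple GPUs
        loss = loss / args.grad_acc_step             # gradient accumulation: scale the loss
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), args.max_grad_norm)  # gradient clipping
        if (step + 1) % args.grad_acc_step == 0:     # update after the configured number of steps
            optimizer.step()                         # update the parameters
            scheduler.step()                         # update the learning rate
            optimizer.zero_grad()                    # clear the gradients
```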
# lebert.py

| line number | Chinese | English |
| --- | --- | --- |
| 44 | bert layer的输出 | Output of the BERT layer |
| 45 | 每个汉字对应的词向量集合 | Set of word vectors corresponding to each Chinese character |
| 46 | 每个汉字对应的词向量集合的attention mask | Attention mask of each Chinese character's word vector set |
| 50 | 将词向量,与字符向量进行维度对齐 | Align the dimensions of the word vectors with the character vectors |
| 56 | 计算每个字符向量,与其对应的所有词向量的注意力权重,然后加权求和。采用双线性映射计算注意力权重 | Compute the attention weights between each character vector and all of its corresponding word vectors, then take the weighted sum; the weights are computed with a bilinear mapping |
| 61 | 将pad的注意力设为很小的数 | Set the attention scores of padded positions to a very small number |
| 65 | 加权求和,得到每个汉字对应的词向量集合的表示 | Weighted sum, giving the representation of each Chinese character's word vector set |
| 197 | 位置编码模块 | Positional encoding module |
| 285 | 在第i层之后,进行融合 | Perform the fusion after layer i |

# The files within the 'processors' folder

## convert_format.py

| line number | Chinese | English |
| --- | --- | --- |
| 12 | 将bmes格式的文件,转换为json文件,json文件包含text和label,并且转换为BIOS的标注格式 | Convert the bmes-format file to a json file containing text and label, converting the annotations to the BIOS tagging scheme |
| 62 | 从数据集中获取所有label | Get all labels from the dataset |
| 81 | bmes生成json | Generate json from bmes |
| 89 | 生成label文件 | Generate the label file |
| 96 | bmes生成json | Generate json from bmes |
| 104 | 生成label文件 | Generate the label file |
| 111 | 生成json文件 | Generate the json file |
| 127 | 生成label文件 | Generate the label file |

## cut.py

| line number | Chinese | English |
| --- | --- | --- |
| 26 | 只扫描前max_scan_num个词向量 | Only scan the first max_scan_num word vectors |
| 59 | 找到以text[idx]开头的所有单词 | Find all words that start with text[idx] |
| 66 | todo 截断 | todo: truncate |

## my.py

| line number | Chinese | English |
| --- | --- | --- |
| 169 | 使用虚线 | Use a dashed line |
| 170 | 使用点划线 | Use a dash-dot line |
| 174 | 添加标题和标签 | Add the title and labels |
| 183 | y轴范围从-5到40 | The y-axis ranges from -5 to 40 |
| 187 | 添加图例 | Add a legend |
| 190 | 显示网格(可选) | Display the grid (optional) |
| 192 | 显示图形 | Display the figure |
| 249 | 数据 | Data |

## processor.py

| line number | Chinese | English |
| --- | --- | --- |
| 126 | 加载词向量 | Load the word vectors |
| 128 | 构建字典树 | Build the trie |
| 130 | 找到数据集中的所有单词 | Find all the words in the dataset |
| 132 | 初始化模型的词向量 | Initialize the model's word embeddings |
| 139 | 加载label | Load the labels |
| 147 | 存在许多单字的,考虑是否去掉 | Many single-character entries exist; consider whether to remove them |
| 148 | 加载前max_scan_num个词向量, 并且返回词表 | Load the first max_scan_num word vectors and return the vocabulary |
| 157 | 只扫描前max_scan_num个词向量 | Only scan the first max_scan_num word vectors |
| 181 | 是否不将单字加入字典树中 | Whether to exclude single characters from the trie |
| 182 | 构建字典树 | Build the trie |
| 195 | 找出文件中所有匹配的单词 | Find all matching words in the file |
| 219 | 找出text中所有的单词 | Find all the words in text |
| 225 | 存储匹配到的单词 | Store the matched words |
| 237 | 构建单词和id的映射 | Build the word-to-id mapping |
| 264 | 获取每个汉字,对应的单词列表 | Get the list of words corresponding to each Chinese character |
| 273 | 找到以text[idx]开头的所有单词 | Find all words that start with text[idx] |
| 352 | 在开头与结尾分别添加[CLS]与[SEP] | Add [CLS] at the beginning and [SEP] at the end |
| 364 | 开头和结尾进行padding | Pad at the beginning and end |
| 449 | 加载label | Load the labels |
| 467 | 在开头与结尾分别添加[CLS]与[SEP] | Add [CLS] at the beginning and [SEP] at the end |
| 580 | 在开头与结尾分别添加[CLS]与[SEP] | Add [CLS] at the beginning and [SEP] at the end |
| 590 | 对输入进行padding | Pad the inputs |
| 607 | 读取文件,将每条记录读取为words | Read the file, reading each record as words |
| 617 | 读取完一条记录 | Finished reading one record |

## trie_tree.py

| line number | Chinese | English |
| --- | --- | --- |
| 47 | 需要匹配的词 | Words to match |
| 49 | 返回匹配的词, 如果存在多字词,则会筛去单字词 | Return the matched words; if multi-character words exist, single-character words are filtered out |
| 54 | 短的词总是在最前面 | Shorter words always come first |

## vocab.py

| line number | Chinese | English |
| --- | --- | --- |
| 7 | 构建词表 | Build the vocabulary |
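The trie_tree.py and processor.py tables above describe how words are matched against the text: a trie is built from the first max_scan_num pre-trained word vectors, and for each position idx every word starting at text[idx] is enumerated, with shorter matches listed first and single-character matches dropped when multi-character matches exist. Below is a minimal sketch of that behaviour; the class and method names (`Trie`, `enumerate_match`) are illustrative, not the repository's exact API.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        # add one vocabulary word to the trie, character by character
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def enumerate_match(self, chars: str) -> list:
        # return all words that are a prefix of `chars`, shortest first
        matched, node = [], self.root
        for i, ch in enumerate(chars):
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_word:
                matched.append(chars[:i + 1])
        # if any multi-character word matched, filter out single-character matches
        if any(len(w) > 1 for w in matched):
            matched = [w for w in matched if len(w) > 1]
        return matched

# usage: for every character position, list the words starting there
trie = Trie()
for w in ["南京", "南京市", "长江", "长江大桥", "大桥"]:
    trie.insert(w)
text = "南京市长江大桥"
for idx in range(len(text)):
    print(idx, trie.enumerate_match(text[idx:]))
```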