# myplot.py

| line number | Chinese | English |
| --- | --- | --- |
| 10 | 示例数据 | Sample data |
| 22 | 创建画布和轴 | Create the figure and axes |
| 26 | 绘制第一个柱状图 | Draw the first bar chart |
| 27 | 柱子的宽度 | Width of the bars |
| 28 | 确定每个柱子的位置 | Determine the position of each bar |
| 33 | 第一个图 | The first figure |
| 35 | 绘制第二个柱状图,通过平移位置避免重叠 | Draw the second bar chart, shifting its position to avoid overlap |
| 37 | 添加轴标题和图表标题 | Add the axis titles and the chart title |
| 41 | 设置Y轴的刻度间隔为5 | Set the Y-axis tick interval to 5 |
| 43 | 创建FuncFormatter对象 | Create a FuncFormatter object |
| 47 | 显示x轴的标签 | Display the x-axis labels |
| 51 | 第一个图结束 | End of the first figure |
| 53 | 添加图例 | Add a legend |
| 56 | 显示图表 | Display the chart |

# plot.py

| line number | Chinese | English |
| --- | --- | --- |
| 4 | 示例数据 | Sample data |
| 10 | 创建一个画布,并设置一行三列的子图布局 | Create a figure and set up a one-row, three-column subplot layout |
| 13 | 在第一个子图位置上绘制第一个条形图 | Draw the first bar chart in the first subplot |
| 20 | 在第二个子图位置上绘制第二个条形图 | Draw the second bar chart in the second subplot |
| 27 | 在第三个子图位置上绘制第三个条形图 | Draw the third bar chart in the third subplot |
| 34 | 调整子图间的间距 | Adjust the spacing between subplots |
| 37 | 显示图表 | Display the chart |

# train.py

| line number | Chinese | English |
| --- | --- | --- |
| 24 | 不使用GPU进行训练 | Do not use the GPU for training |
| 25 | 模型与预处理数据的存放位置 | Where the model and preprocessed data are stored |
| 26 | 预训练词向量路径 | Path to the pre-trained word vectors |
| 28 | 损失函数类型 | Type of loss function |
| 29 | 在bert的第几层后面融入词汇信息 | After which BERT layer the lexicon information is fused |
| 30 | Bert的学习率 | Learning rate of BERT |
| 31 | crf的学习率 | Learning rate of the CRF |
| 32 | crf的学习率 | Learning rate of the CRF |
| 34 | AdamW优化器的衰减率 | Weight decay rate of the AdamW optimizer |
| 38 | 训练多少步,查看验证集的指标 | Number of training steps between evaluations on the validation set |
| 39 | 输入的最大长度 | Maximum input length |
| 40 | 每个汉字最多融合多少个词汇信息 | Maximum number of words fused with each Chinese character |
| 41 | 取预训练词向量的前max_scan_num个构造字典树 | Use the first max_scan_num pre-trained word vectors to build the trie |
| 42 | 数据集存放路径 | Path to the dataset |
| 46 | 数据集名称 | Name of the dataset |
| 48 | 模型类别 | Model type |
| 50 | 覆盖数据处理的结果 | Overwrite the results of data processing |
| 53 | 是否加载预训练的词向量 | Whether to load the pre-trained word vectors |
| 55 | 数据集的标注方式 | Labeling scheme of the dataset |
| 56 | 梯度积累的步数 | Number of gradient accumulation steps |
| 57 | 梯度裁剪阈值 | Gradient clipping threshold |
| 58 | 设置随机种子 | Set the random seed |
| 59 | dataloader加载数据时使用的线程数量 | Number of worker threads used by the dataloader |
| 68 | 设置整个开发环境的seed | Set the seed for the whole environment |
| 84 | todo 检查 | todo: check |
| 151 | 对多卡的loss取平均 | Average the loss across multiple GPUs |
| 153 | 梯度累积 | Gradient accumulation |
| 156 | 梯度裁剪 | Gradient clipping |
| 158 | 进行一定step的梯度累计之后,更新参数 | Update the parameters after accumulating gradients for the configured number of steps |
| 160 | 更新参数 | Update the parameters |
| 162 | 更新学习率 | Update the learning rate |
| 164 | 清空梯度信息 | Clear the gradients |
| 167 | 评测验证集和测试集上的指标 | Evaluate metrics on the validation and test sets |
| 196 | 计算数据集上的指标 | Compute metrics on the dataset |
| 216 | 不同模型输入不同 | Different models take different inputs |
| 230 | 对多卡的loss取平均 | Average the loss across multiple GPUs |
| 233 | 减去padding的[CLS]与[SEP] | Exclude the padded [CLS] and [SEP] |
| 236 | 减去padding的[CLS] | Exclude the padded [CLS] |
| 238 | 减去padding的[CLS] | Exclude the padded [CLS] |
| 239 | 减去padding的[CLS] | Exclude the padded [CLS] |
| 284 | 分词器 | Tokenizer |
| 286 | 数据处理器 | Data processor |
| 290 | 初始化模型配置 | Initialize the model configuration |
| 298 | 初始化模型 | Initialize the model |
| 300 | 初始化模型的词向量 | Initialize the model's word embeddings |
| 305 | 训练 | Training |
| 307 | 加载数据集 | Load the dataset |
| 329 | 测试集上的指标 | Metrics on the test set |
| 331 | 加载验证集 | Load the validation set |
| 336 | 加载测试集 | Load the test set |
| 346 | 测试集上的指标 | Metrics on the test set |
| 354 | 设置参数 | Set the parameters |
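Rows 151–164 of the train.py table describe the order of operations in the training loop: average the loss across GPUs, accumulate gradients over several steps, clip them, then step the optimizer and scheduler and zero the gradients. Below is a minimal PyTorch sketch of that pattern; the names `args.grad_acc_step` and `args.max_grad_norm`, and the assumption that the model returns the loss as its first output, are illustrative rather than the repository's exact API.

```python
import torch

def train_one_epoch(model, optimizer, scheduler, train_loader, args, device):
    model.train()
    for step, batch in enumerate(train_loader):
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = model(**batch)[0]                     # assumes the model returns the loss first
        if torch.cuda.device_count() > 1:
            loss = loss.mean()                       # average the loss across multiple GPUs
        loss = loss / args.grad_acc_step             # gradient accumulation: scale the loss
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), args.max_grad_norm)  # gradient clipping
        if (step + 1) % args.grad_acc_step == 0:     # update after the configured number of steps
            optimizer.step()                         # update the parameters
            scheduler.step()                         # update the learning rate
            optimizer.zero_grad()                    # clear the gradients
```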
# lebert.py

| line number | Chinese | English |
| --- | --- | --- |
| 44 | bert layer的输出 | Output of the BERT layer |
| 45 | 每个汉字对应的词向量集合 | Set of word vectors corresponding to each Chinese character |
| 46 | 每个汉字对应的词向量集合的attention mask | Attention mask of each Chinese character's word vector set |
| 50 | 将词向量,与字符向量进行维度对齐 | Align the dimensions of the word vectors with the character vectors |
| 56 | 计算每个字符向量,与其对应的所有词向量的注意力权重,然后加权求和。采用双线性映射计算注意力权重 | Compute the attention weights between each character vector and all of its corresponding word vectors, then take the weighted sum; the weights are computed with a bilinear mapping |
| 61 | 将pad的注意力设为很小的数 | Set the attention scores of padded positions to a very small number |
| 65 | 加权求和,得到每个汉字对应的词向量集合的表示 | Weighted sum, giving the representation of each Chinese character's word vector set |
| 197 | 位置编码模块 | Positional encoding module |
| 285 | 在第i层之后,进行融合 | Perform the fusion after layer i |

# The files within the 'processors' folder

## convert_format.py

| line number | Chinese | English |
| --- | --- | --- |
| 12 | 将bmes格式的文件,转换为json文件,json文件包含text和label,并且转换为BIOS的标注格式 | Convert the bmes-format file to a json file containing text and label, converting the annotations to the BIOS tagging scheme |
| 62 | 从数据集中获取所有label | Get all labels from the dataset |
| 81 | bmes生成json | Generate json from bmes |
| 89 | 生成label文件 | Generate the label file |
| 96 | bmes生成json | Generate json from bmes |
| 104 | 生成label文件 | Generate the label file |
| 111 | 生成json文件 | Generate the json file |
| 127 | 生成label文件 | Generate the label file |

## cut.py

| line number | Chinese | English |
| --- | --- | --- |
| 26 | 只扫描前max_scan_num个词向量 | Only scan the first max_scan_num word vectors |
| 59 | 找到以text[idx]开头的所有单词 | Find all words that start with text[idx] |
| 66 | todo 截断 | todo: truncate |

## my.py

| line number | Chinese | English |
| --- | --- | --- |
| 169 | 使用虚线 | Use a dashed line |
| 170 | 使用点划线 | Use a dash-dot line |
| 174 | 添加标题和标签 | Add the title and labels |
| 183 | y轴范围从-5到40 | The y-axis ranges from -5 to 40 |
| 187 | 添加图例 | Add a legend |
| 190 | 显示网格(可选) | Display the grid (optional) |
| 192 | 显示图形 | Display the figure |
| 249 | 数据 | Data |

## processor.py

| line number | Chinese | English |
| --- | --- | --- |
| 126 | 加载词向量 | Load the word vectors |
| 128 | 构建字典树 | Build the trie |
| 130 | 找到数据集中的所有单词 | Find all the words in the dataset |
| 132 | 初始化模型的词向量 | Initialize the model's word embeddings |
| 139 | 加载label | Load the labels |
| 147 | 存在许多单字的,考虑是否去掉 | Many single-character entries exist; consider whether to remove them |
| 148 | 加载前max_scan_num个词向量, 并且返回词表 | Load the first max_scan_num word vectors and return the vocabulary |
| 157 | 只扫描前max_scan_num个词向量 | Only scan the first max_scan_num word vectors |
| 181 | 是否不将单字加入字典树中 | Whether to exclude single characters from the trie |
| 182 | 构建字典树 | Build the trie |
| 195 | 找出文件中所有匹配的单词 | Find all matching words in the file |
| 219 | 找出text中所有的单词 | Find all the words in text |
| 225 | 存储匹配到的单词 | Store the matched words |
| 237 | 构建单词和id的映射 | Build the word-to-id mapping |
| 264 | 获取每个汉字,对应的单词列表 | Get the list of words corresponding to each Chinese character |
| 273 | 找到以text[idx]开头的所有单词 | Find all words that start with text[idx] |
| 352 | 在开头与结尾分别添加[CLS]与[SEP] | Add [CLS] at the beginning and [SEP] at the end |
| 364 | 开头和结尾进行padding | Pad at the beginning and end |
| 449 | 加载label | Load the labels |
| 467 | 在开头与结尾分别添加[CLS]与[SEP] | Add [CLS] at the beginning and [SEP] at the end |
| 580 | 在开头与结尾分别添加[CLS]与[SEP] | Add [CLS] at the beginning and [SEP] at the end |
| 590 | 对输入进行padding | Pad the inputs |
| 607 | 读取文件,将每条记录读取为words | Read the file, reading each record as words |
| 617 | 读取完一条记录 | Finished reading one record |

## trie_tree.py

| line number | Chinese | English |
| --- | --- | --- |
| 47 | 需要匹配的词 | Words to match |
| 49 | 返回匹配的词, 如果存在多字词,则会筛去单字词 | Return the matched words; if multi-character words exist, single-character words are filtered out |
| 54 | 短的词总是在最前面 | Shorter words always come first |

## vocab.py

| line number | Chinese | English |
| --- | --- | --- |
| 7 | 构建词表 | Build the vocabulary |
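The trie_tree.py and processor.py tables above describe how words are matched against the text: a trie is built from the first max_scan_num pre-trained word vectors, and for each position idx every word starting at text[idx] is enumerated, with shorter matches listed first and single-character matches dropped when multi-character matches exist. Below is a minimal sketch of that behaviour; the class and method names (`Trie`, `enumerate_match`) are illustrative, not the repository's exact API.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        # add one vocabulary word to the trie, character by character
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def enumerate_match(self, chars: str) -> list:
        # return all words that are a prefix of `chars`, shortest first
        matched, node = [], self.root
        for i, ch in enumerate(chars):
            node = node.children.get(ch)
            if node is None:
                break
            if node.is_word:
                matched.append(chars[:i + 1])
        # if any multi-character word matched, filter out single-character matches
        if any(len(w) > 1 for w in matched):
            matched = [w for w in matched if len(w) > 1]
        return matched

# usage: for every character position, list the words starting there
trie = Trie()
for w in ["南京", "南京市", "长江", "长江大桥", "大桥"]:
    trie.insert(w)
text = "南京市长江大桥"
for idx in range(len(text)):
    print(idx, trie.enumerate_match(text[idx:]))
```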