- Folder 0: Tokenizers
  - SKMT tokenizer with dictionaries of word roots
  - PureBPE tokenization files
  - Both tokenizers are available on GitHub: https://github.com/daviddrzik/Slovak_subword_tokenizers

- Folder 1: Tokenization of a random sample of text by three tokenizers + results
  - Statistics of the analyzed results (file: statistics.xlsx)
  - Full comparison results for the individual texts processed by each tokenizer (file: link to output file.txt)
  - Source code for the text tokenization (file: tokenization of text.py)
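The tokenization script itself is in tokenization of text.py; as an illustration only, the sketch below shows the kind of per-tokenizer statistics (token count, fertility) such a comparison can produce. The whitespace and character-level tokenizers here are hypothetical stand-ins for SKMT and PureBPE, whose actual implementations live in the GitHub repository above.

```python
# Illustrative sketch only: compare token counts and fertility
# (tokens per word) across tokenizers on a sample Slovak sentence.
# These two toy tokenizers stand in for SKMT / PureBPE.

def whitespace_tokenize(text):
    """Baseline tokenizer: split on whitespace."""
    return text.split()

def char_tokenize(text):
    """Character-level tokenizer: every non-space character is a token."""
    return [c for c in text if not c.isspace()]

def fertility(tokens, words):
    """Average number of tokens produced per word."""
    return len(tokens) / max(len(words), 1)

sample = "Slovenské tokenizátory delia slová na podslovné jednotky"
words = sample.split()
for name, fn in [("whitespace", whitespace_tokenize), ("char", char_tokenize)]:
    toks = fn(sample)
    print(f"{name}: {len(toks)} tokens, fertility {fertility(toks, words):.2f}")
```

A real comparison would load the SKMT and PureBPE tokenizers from the repository and write the per-text results to the output file listed above.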

- Folder 2: Pre-training and fine-tuning of models
  - Pre-trained models (Figshare link inside)
  - Sentiment datasets from SlovakBERT
  - STS dataset (link to the official data source inside)
  - Additional source code for pre-training, fine-tuning of models, etc.