| | --- |
| | inference: false |
| | license: apache-2.0 |
| | language: |
| | - de |
| | datasets: |
| | - DEplain/DEplain-APA-doc |
| | metrics: |
| | - sari |
| | - bleu |
| | - bertscore |
| | library_name: transformers |
| | pipeline_tag: text2text-generation |
| | tags: |
| | - text simplification |
| | - plain language |
| | - easy-to-read language |
| | - document simplification |
| | --- |
| | |
| | # DEplain German Text Simplification |
| |
|
| | This model was trained as part of the experiments in Stodden, Momen, and Kallmeyer (2023), ["DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification."](https://arxiv.org/abs/2305.18939) In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, Canada. Association for Computational Linguistics. |
| | Detailed documentation is available in the GitHub repository [https://github.com/rstodden/DEPlain](https://github.com/rstodden/DEPlain). |
| |
|
| | We reused the code from [https://github.com/a-rios/ats-models](https://github.com/a-rios/ats-models) for our experiments. |
| |
|
| | ### Model Description |
| |
|
| | The model is a finetuned checkpoint of the pre-trained LongmBART model, which is based on `mbart-large-cc25` with its vocabulary trimmed to the 30k most frequent words in German. |
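To illustrate the general idea of frequency-based vocabulary trimming, here is a minimal sketch. It is not the procedure used for this checkpoint: it assumes whitespace-separated tokens and a toy corpus, whereas mBART works on sentencepiece subwords. The function name `most_frequent_tokens` is hypothetical.

```python
from collections import Counter


def most_frequent_tokens(corpus, k):
    """Return the k most frequent whitespace-separated tokens in corpus.

    Illustrative only: real trimming would count subword pieces produced
    by the model's tokenizer, not whitespace-separated words.
    """
    counts = Counter()
    for line in corpus:
        counts.update(line.split())
    # most_common sorts by descending frequency
    return [token for token, _ in counts.most_common(k)]


if __name__ == "__main__":
    toy_corpus = ["das ist gut", "das ist", "das"]
    print(most_frequent_tokens(toy_corpus, 2))
```

For the trimmed model, the same idea would be applied with k = 30000 over a large German corpus.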
| |
|
| | The model was finetuned for document-level German text simplification. |
| |
|
| | The finetuning data consisted only of the manually aligned sentences from the `DEplain-APA-doc` dataset. |
| |
|
| | ### Model Usage |
| |
|
| | This model currently cannot be used in the Hugging Face interface or loaded via the `.from_pretrained` method, because it is a finetuned version of a custom model (LongMBart) that has not been registered on HF yet. |
| | The code for this custom model is available at: [https://github.com/a-rios/ats-models](https://github.com/a-rios/ats-models) |
| | |
| | To test this model checkpoint, you need to clone the checkpoint repository as follows: |
| | |
| | ``` |
| | # Make sure you have git-lfs installed (https://git-lfs.com) |
| | git lfs install |
| | git clone https://huggingface.co/DEplain/trimmed_longmbart_docs_apa |
| | |
| | # Alternatively, to clone without the large files (only their pointers), |
| | # prefix the git clone command with the following environment variable: |
| | GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/DEplain/trimmed_longmbart_docs_apa |
| | ``` |
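If the repository was cloned with `GIT_LFS_SKIP_SMUDGE=1`, or without git-lfs installed, the large checkpoint files are only small text pointers and cannot be loaded. The sketch below is one way to check for this. It relies on the fact that Git LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`. The helper names and the local directory name `trimmed_longmbart_docs_apa` are assumptions for illustration.

```python
from pathlib import Path

# Git LFS pointer files start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at path looks like a Git LFS pointer."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(repo_dir):
    """List files under repo_dir (skipping .git) that are still LFS pointers."""
    root = Path(repo_dir)
    return sorted(
        str(p.relative_to(root))
        for p in root.rglob("*")
        if p.is_file()
        and ".git" not in p.relative_to(root).parts
        and is_lfs_pointer(p)
    )


if __name__ == "__main__":
    # Assumed local clone directory; adjust to where you cloned the repository.
    clone_dir = Path("trimmed_longmbart_docs_apa")
    if clone_dir.is_dir():
        for name in find_lfs_pointers(clone_dir):
            print(f"still a pointer, not downloaded: {name}")
```

If any files are reported, running `git lfs pull` inside the cloned repository downloads the actual contents.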
| | |
| | Then set up the conda environment via: |
| | ``` |
| | conda env create -f environment.yaml |
| | ``` |
| | |
| | Then follow the procedure in the notebook `generation.ipynb`. |