| --- |
| language: ar |
| thumbnail: https://raw.githubusercontent.com/mawdoo3/Multi-dialect-Arabic-BERT/master/multidialct_arabic_bert.png |
| datasets: |
| - nadi |
| --- |
| # Multi-dialect-Arabic-BERT |
| This repository hosts the Multi-dialect Arabic BERT model. |
|
|
| By [Mawdoo3-AI](https://ai.mawdoo3.com/). |
|
|
| <p align="center"> |
| <br> |
| <img src="https://raw.githubusercontent.com/mawdoo3/Multi-dialect-Arabic-BERT/master/multidialct_arabic_bert.png" alt="Background reference: http://www.qfi.org/wp-content/uploads/2018/02/Qfi_Infographic_Mother-Language_Final.pdf" width="500"/> |
| <br> |
| </p> |
| |
|
|
|
|
| ### About our Multi-dialect-Arabic-BERT model |
| Instead of training the Multi-dialect Arabic BERT model from scratch, we initialized its weights from [Arabic-BERT](https://github.com/alisafaya/Arabic-BERT) and then trained it on 10M Arabic tweets from the unlabeled data of [the Nuanced Arabic Dialect Identification (NADI) shared task](https://sites.google.com/view/nadi-shared-task). |
|
|
| ### To cite this work |
|
|
| ``` |
| @misc{talafha2020multidialect, |
| title={Multi-Dialect Arabic BERT for Country-Level Dialect Identification}, |
| author={Bashar Talafha and Mohammad Ali and Muhy Eddin Za'ter and Haitham Seelawi and Ibraheem Tuffaha and Mostafa Samir and Wael Farhan and Hussein T. Al-Natsheh}, |
| year={2020}, |
| eprint={2007.05612}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CL} |
| } |
| ``` |
|
|
| ### Usage |
| The model weights can be loaded with the `transformers` library from Hugging Face. |
|
|
| ```python |
| from transformers import AutoTokenizer, AutoModel |
| |
| tokenizer = AutoTokenizer.from_pretrained("bashar-talafha/multi-dialect-bert-base-arabic") |
| model = AutoModel.from_pretrained("bashar-talafha/multi-dialect-bert-base-arabic") |
| ``` |
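`AutoModel` returns one hidden-state vector per token rather than a single sentence vector. The sketch below shows one common way to reduce those to a fixed-size sentence embedding: averaging the vectors of non-padding tokens. Mean pooling is an assumption made here for illustration, not something this model card prescribes, and `mean_pool` and `embed` are hypothetical helper names.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real (non-padding) tokens.

    token_vectors: list of per-token vectors (lists of floats).
    attention_mask: list of 0/1 flags, one per token.
    """
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def embed(text, model_name="bashar-talafha/multi-dialect-bert-base-arabic"):
    """Hypothetical helper: encode one sentence and mean-pool its token states."""
    # Imported lazily so mean_pool can be used without these packages installed.
    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Batch size is 1, so take the first (only) item of each tensor.
    return mean_pool(
        outputs.last_hidden_state[0].tolist(),
        inputs["attention_mask"][0].tolist(),
    )
```

Calling `embed("some Arabic sentence")` downloads the weights on first use and returns a plain Python list with one float per hidden dimension.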
|
|
| Example using `pipeline`: |
|
|
| ```python |
| from transformers import pipeline |
| |
| fill_mask = pipeline( |
| "fill-mask", |
| model="bashar-talafha/multi-dialect-bert-base-arabic", |
| tokenizer="bashar-talafha/multi-dialect-bert-base-arabic" |
| ) |
| |
| fill_mask(" سافر الرحالة من مطار [MASK] ") |
| ``` |
| ``` |
| [{'sequence': '[CLS] سافر الرحالة من مطار الكويت [SEP]', 'score': 0.08296813815832138, 'token': 3226}, |
| {'sequence': '[CLS] سافر الرحالة من مطار دبي [SEP]', 'score': 0.05123933032155037, 'token': 4747}, |
| {'sequence': '[CLS] سافر الرحالة من مطار مسقط [SEP]', 'score': 0.046838656067848206, 'token': 13205}, |
| {'sequence': '[CLS] سافر الرحالة من مطار القاهرة [SEP]', 'score': 0.03234650194644928, 'token': 4003}, |
| {'sequence': '[CLS] سافر الرحالة من مطار الرياض [SEP]', 'score': 0.02606341242790222, 'token': 2200}] |
| ``` |
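The raw predictions above carry `[CLS]` and `[SEP]` markers in each `sequence`. A minimal sketch of tidying them for display follows; `clean_predictions` is a hypothetical helper, and the `sample` list simply copies two entries from the output shown above. Other versions of `transformers` may format `sequence` differently, in which case the marker stripping is a harmless no-op.

```python
def clean_predictions(results):
    """Strip [CLS]/[SEP] markers and return (sentence, score) pairs, best first."""
    cleaned = []
    for item in results:
        sentence = item["sequence"].replace("[CLS]", "").replace("[SEP]", "").strip()
        cleaned.append((sentence, item["score"]))
    # Sort explicitly by score rather than relying on the input order.
    return sorted(cleaned, key=lambda pair: pair[1], reverse=True)


# Two entries copied from the fill-mask output shown above.
sample = [
    {"sequence": "[CLS] سافر الرحالة من مطار دبي [SEP]", "score": 0.05123933032155037, "token": 4747},
    {"sequence": "[CLS] سافر الرحالة من مطار الكويت [SEP]", "score": 0.08296813815832138, "token": 3226},
]

for sentence, score in clean_predictions(sample):
    print(f"{score:.4f}  {sentence}")
```

In practice the list returned by `fill_mask(...)` can be passed to `clean_predictions` directly.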
| ### Repository |
| Please check the [original repository](https://github.com/mawdoo3/Multi-dialect-Arabic-BERT) for more information. |
|
|
|
|
|
|