---
license: llama2
---
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->

<div align="center">
<h1>
  SlimPLM
</h1>
</div>

<p align="center">
📝 <a href="https://arxiv.org/abs/2402.12052" target="_blank">Paper</a> • 🤗 <a href="https://huggingface.co/zstanjj/SlimPLM-Query-Rewriting/" target="_blank">Hugging Face</a> • 🧩 <a href="https://github.com/plageon/SlimPLM" target="_blank">GitHub</a>
</p>

🌹 If you use this model, please star our **[GitHub repository](https://github.com/plageon/SlimPlm)** to support us. Your star means a lot!

## ✨ Latest News

- [1/25/2024]: Retrieval Necessity Judgment Model released on [Hugging Face](https://huggingface.co/zstanjj/SlimPLM-Retrieval-Necessity-Judgment/).
- [2/20/2024]: Query Rewriting Model released on [Hugging Face](https://huggingface.co/zstanjj/SlimPLM-Query-Rewriting/).
- [5/19/2024]: Our new work, **[Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs](https://aclanthology.org/2024.acl-long.242/)**, has been accepted to the **ACL 2024** main conference.

## 🎬 Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# construct the prompt
# NOTE: keep the prompt text verbatim (including "datatime" and "Course answer");
# it presumably mirrors the template the model was fine-tuned on.
question = "Who voices Darth Vader in Star Wars Episodes III-VI, IX Rogue One, and Rebels?"
heuristic_answer = "The voice of Darth Vader in Star Wars is provided by British actor James Earl Jones. He first voiced the character in the 1977 film \"Star Wars: Episode IV - A New Hope\", and his performance has been used in all subsequent Star Wars films, including the prequels and sequels."
prompt = (f"<s>[INST] <<SYS>>\nYou are a helpful assistant. Your task is to parse user input into"
          f" structured formats according to the coarse answer. Current datatime is 2023-12-20 9:47:28"
          f" <</SYS>>\n Course answer: (({heuristic_answer}))\nQuestion: (({question})) [/INST]")

# alternatively, you can input the question only
# prompt = (f"<s>[INST] <<SYS>>\nYou are a helpful assistant. Your task is to parse user input into"
#           f" structured formats. Current datatime is 2023-12-20 9:47:28"
#           f" <</SYS>>\n{question} [/INST]")

# generation settings; with do_sample=False decoding is greedy, so temperature,
# top_k and top_p have no effect (recent transformers versions may warn about them)
params_query_rewrite = {"repetition_penalty": 1.05, "temperature": 0.01, "top_k": 1, "top_p": 0.85,
                        "max_new_tokens": 512, "do_sample": False, "seed": 2023}

# load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained("zstanjj/SlimPLM-Query-Rewriting").eval()
if torch.cuda.is_available():
    model.cuda()
tokenizer = AutoTokenizer.from_pretrained("zstanjj/SlimPLM-Query-Rewriting")

# run inference
# `prompt` is already a fully formatted f-string, so no extra .format() call is needed
input_ids = tokenizer.encode(prompt, return_tensors="pt")
len_input_ids = len(input_ids[0])
if torch.cuda.is_available():
    input_ids = input_ids.cuda()
# "seed" is not a generate() argument: apply it with torch.manual_seed, pass the rest through
gen_kwargs = dict(params_query_rewrite)
torch.manual_seed(gen_kwargs.pop("seed"))
outputs = model.generate(input_ids, **gen_kwargs)
# decode only the newly generated tokens, skipping the prompt
res = tokenizer.decode(outputs[0][len_input_ids:], skip_special_tokens=True)
print(res)
```
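
The two prompt variants in the example differ only in the system instruction and in whether a coarse answer is included. When building prompts for many questions, it may help to keep the template text in one place. The sketch below is a hypothetical `build_prompt` helper (not part of the released code); it assumes the template strings, including the fixed timestamp and the original spelling, should be reproduced exactly as in the example above.

```python
from typing import Optional

# Timestamp copied verbatim from the example prompts (assumption: the model
# expects this exact system text, spelling included).
_TIMESTAMP = "2023-12-20 9:47:28"


def build_prompt(question: str, heuristic_answer: Optional[str] = None) -> str:
    """Hypothetical helper that returns a SlimPLM query-rewriting prompt.

    With a heuristic (coarse) answer it uses the first template from the
    example; without one it uses the question-only template.
    """
    if heuristic_answer is not None:
        return (
            "<s>[INST] <<SYS>>\nYou are a helpful assistant. Your task is to parse user input into"
            f" structured formats according to the coarse answer. Current datatime is {_TIMESTAMP}"
            f" <</SYS>>\n Course answer: (({heuristic_answer}))\nQuestion: (({question})) [/INST]"
        )
    return (
        "<s>[INST] <<SYS>>\nYou are a helpful assistant. Your task is to parse user input into"
        f" structured formats. Current datatime is {_TIMESTAMP}"
        f" <</SYS>>\n{question} [/INST]"
    )


if __name__ == "__main__":
    q = "Who voices Darth Vader in Star Wars Episodes III-VI, IX Rogue One, and Rebels?"
    # question-only variant
    print(build_prompt(q))
```

With this sketch, `build_prompt(question, heuristic_answer)` produces the same string as the first `prompt` in the example, and `build_prompt(question)` produces the question-only variant.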

## ✏️ Citation

```bibtex
@inproceedings{Tan2024SmallMB,
  title={Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs},
  author={Jiejun Tan and Zhicheng Dou and Yutao Zhu and Peidong Guo and Kun Fang and Ji-Rong Wen},
  booktitle={Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year={2024},
  url={https://arxiv.org/abs/2402.12052}
}
```