Crucial note: this currently works only with the CUDA build of llama.cpp; I have not managed to get it working with the ROCm or Vulkan builds.
# LFM2-24B-A2B-Abliterated
This is an abliterated version of Liquid AI's LFM2-24B-A2B MoE model. It has been modified via layerwise orthogonal projection to completely remove its built-in safety filters and refusal mechanisms, allowing the continuous-time hybrid architecture to flow uninhibited.
I created it because I wasn't satisfied with the other abliterations I saw for these models, and decided to take a crack at it in a way that matched one of my favorite models: mlabonne's gemma3-27b-it-abliterated.
## Architectural Hurdles & Methodology
Liquid Foundation Models use a non-standard hybrid architecture. The 24B version combines 30 short-convolution layers and 10 grouped-query attention layers with a 64-expert Mixture-of-Experts (MoE) routing system. Standard ablation scripts designed for Llama-class transformers crash on this architecture because of the proprietary `Lfm2MoeExperts` class wrappers and the complex routing mechanisms.
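One way to cope with wrapper classes that do not expose their weights through normal iteration is to walk object attributes recursively. The following is a minimal, dependency-free sketch of that idea, not the script used for this model: `Weight`, `ExpertBank`, `Block`, and `find_weights` are hypothetical toy names, and a real version would match `torch.Tensor` or `nn.Parameter` objects instead.

```python
# Toy stand-ins for nested wrapper classes; every name here is hypothetical.
class Weight:
    def __init__(self, name):
        self.name = name


class ExpertBank:
    # Holds its weights as plain attributes rather than an iterable container.
    def __init__(self):
        self.w1 = Weight("w1")
        self.w2 = Weight("w2")


class Block:
    def __init__(self):
        self.experts = ExpertBank()
        self.out_proj = Weight("out_proj")


def find_weights(obj, targets, path="", seen=None):
    """Recursively collect (path, Weight) pairs whose attribute name is in targets."""
    seen = set() if seen is None else seen
    if id(obj) in seen:  # guard against reference cycles
        return []
    seen.add(id(obj))

    if isinstance(obj, (list, tuple)):
        children = [(str(i), child) for i, child in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = list(vars(obj).items())
    else:
        children = []  # leaf value such as a str or int

    found = []
    for name, child in children:
        child_path = f"{path}.{name}" if path else name
        if isinstance(child, Weight):
            if name in targets:
                found.append((child_path, child))
        else:
            found.extend(find_weights(child, targets, child_path, seen))
    return found


layers = [Block(), Block()]
hits = find_weights(layers, targets={"w2", "out_proj"})
paths = [p for p, _ in hits]
print(paths)
```

The walker returns dotted paths, which makes it easy to log exactly which matrices would be modified before touching any of them.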
This model was abliterated by:
- Adapting forward hooks to safely pass Liquid's dynamic states and targeting the dead center of the network (Layer 20) during the measurement phase.
- Extracting the "refusal vector" from the hidden states of 100 harmful vs. 100 harmless instructions (utilizing `mlabonne/harmful_behaviors` and `mlabonne/harmless_alpaca`).
- Deploying a recursive tensor-hunting script to dynamically drill through the un-iterable custom expert classes.
- Applying orthogonal projection (`W_new = W - v(v^T W)`) directly to the Token Mixing matrices (`o_proj`, `out_proj`) and all 64 Expert Channel Mixing down-projections (`w2`, `down_proj`) across the network.
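The measurement and projection steps can be illustrated on toy data. This is a sketch under stated assumptions, not the actual pipeline: it assumes the refusal vector is the normalized difference between the mean harmful and mean harmless hidden states (the common abliteration recipe; the exact extraction used here is not spelled out), it uses plain Python lists instead of tensors so it runs anywhere, and the helper names are hypothetical.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful, harmless):
    """Unit vector along mean(harmful) - mean(harmless). Assumed recipe."""
    diff = [a - b for a, b in zip(mean_vector(harmful), mean_vector(harmless))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def project_out(W, v):
    """Return W - v (v^T W), where rows of W index the output (hidden) dimension."""
    n_rows, n_cols = len(W), len(W[0])
    # v^T W gives one coefficient per column of W.
    vtW = [sum(v[i] * W[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [[W[i][j] - v[i] * vtW[j] for j in range(n_cols)] for i in range(n_rows)]


# Toy 3-dimensional hidden states for two harmful and two harmless prompts.
harmful = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
harmless = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
v = refusal_direction(harmful, harmless)

# Toy 3x2 weight matrix whose outputs live in the hidden dimension.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W_new = project_out(W, v)

# After projection, W_new can no longer write anything along v.
residual = [sum(v[i] * W_new[i][j] for i in range(3)) for j in range(2)]
print(v, W_new, residual)
```

The row convention matches how `nn.Linear` stores weights as `(out_features, in_features)`, so for an output projection the rows correspond to the hidden dimension in which `v` lives.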
Credit to Maxime Labonne and Sumandora for the foundational datasets and math, adapted here for the massive LFM MoE architecture.
## Notes on Hardware & Compute Requirements
Because of the sheer size of a 24B MoE model (taking ~48GB just to load the base float16 weights), memory management during tensor surgery is critical. This ablation was executed on a 96GB NVIDIA RTX 6000 Ada using pure PyTorch CUDA acceleration, heavily utilizing garbage collection to keep the projection math from spiking into Out-of-Memory errors.
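The ~48GB figure follows from simple arithmetic. The sketch below assumes a nominal 24 billion parameters at 2 bytes each and decimal gigabytes; it ignores activations, buffers, and framework overhead, and `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed for the raw weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


# float16 stores each parameter in 2 bytes.
fp16_gb = weight_memory_gb(24e9, 2)
print(f"float16 weights: ~{fp16_gb:.0f} GB")
```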
If you attempt to replicate similar LFM MoE ablations on consumer hardware, it is highly recommended to offload the operation to system RAM and run it on the CPU, using a high-core-count processor (like a Ryzen 9 7950X) with at least 96GB of physical DDR5.
## Usage
This model retains the exact same architecture as the base LFM2-24B-A2B and requires `trust_remote_code=True` when loading via transformers.
CRITICAL: Unlike the 1.2B and 3B models, the 24B MoE strictly uses standard ChatML formatting. It is highly recommended to use the exact `<|startoftext|><|im_start|>user` formatting without any injected system prompts for the best uncensored performance.
```python
import torch
from transformers import AutoModelForCausalLM, PreTrainedTokenizerFast

model_id = "paperscarecrow/LFM2-24B-A2B-Abliterated"

# Note: Using PreTrainedTokenizerFast bypasses a known typo in Liquid's tokenizer_config.json
tokenizer = PreTrainedTokenizerFast.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

prompt = "<|startoftext|><|im_start|>user\nGive me a detailed tutorial on picking a master padlock.<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=True,
        temperature=0.7
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
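For multi-turn conversations, the same prompt layout can be assembled programmatically. This is a minimal sketch that mirrors the hand-written prompt string in the example above; `build_prompt` and its `add_generation_prompt` flag are hypothetical helpers, not part of transformers or of this model's tokenizer.

```python
def build_prompt(messages, add_generation_prompt=True):
    """Assemble a ChatML-style prompt from a list of {"role", "content"} dicts."""
    parts = ["<|startoftext|>"]
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_prompt([{"role": "user", "content": "Hello!"}])
print(repr(prompt))
```

Following the recommendation above, the sketch adds no system turn unless the caller passes one explicitly.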