harismlnaslm committed on Oct 25

Commit

701eb48

1 Parent(s): e207dc8

Add pure API-based training system with GPU support and background processing

Files changed (5)

API_DOCUMENTATION.md +238 -0
TRAINING_GUIDE.md +210 -0
app.py +280 -1
quick_train.py +181 -0
training_api.py +438 -0

API_DOCUMENTATION.md ADDED

	@@ -0,0 +1,238 @@

+# 🤖 Textilindo AI Training API Documentation
+## 🚀 Pure API-Based Training System
+This is a complete API-based training system that uses your data and configs, and trains on a GPU when the Space's hardware provides one (it falls back to CPU otherwise).
+## 📡 API Endpoints
+### 1. **Start Training**
+```bash
+POST /api/train/start
+```
+**Request Body:**
+```json
+{
+  "model_name": "distilgpt2",
+  "dataset_path": "data/lora_dataset_20250829_113330.jsonl",
+  "config_path": "configs/training_config.yaml",
+  "max_samples": 10,
+  "epochs": 1,
+  "batch_size": 1,
+  "learning_rate": 5e-5
+}
+```
+**Response:**
+```json
+{
+  "success": true,
+  "message": "Training started successfully",
+  "training_id": "train_20241025_120000",
+  "status": "started"
+}
+```
+### 2. **Check Training Status**
+```bash
+GET /api/train/status
+```
+**Response:**
+```json
+{
+  "is_training": true,
+  "progress": 45,
+  "status": "training",
+  "current_step": 5,
+  "total_steps": 10,
+  "loss": 2.34,
+  "start_time": "2024-10-25T12:00:00",
+  "error": null
+}
+```
+### 3. **Get Training Data Info**
+```bash
+GET /api/train/data
+```
+**Response:**
+```json
+{
+  "files": [
+    {
+      "name": "lora_dataset_20250829_113330.jsonl",
+      "size": 12345,
+      "lines": 33
+    }
+  ],
+  "count": 4
+}
+```
+### 4. **Check GPU Availability**
+```bash
+GET /api/train/gpu
+```
+**Response:**
+```json
+{
+  "available": true,
+  "count": 1,
+  "name": "Tesla T4",
+  "memory_gb": 15.0
+}
+```
+### 5. **Test Trained Model**
+```bash
+POST /api/train/test
+```
+**Response:**
+```json
+{
+  "success": true,
+  "test_prompt": "Question: dimana lokasi textilindo? Answer:",
+  "response": "Question: dimana lokasi textilindo? Answer: Textilindo berkantor pusat di Jl. Raya Prancis No.39, Kosambi Tim., Kec. Kosambi, Kabupaten Tangerang, Banten 15213",
+  "model_path": "./models/textilindo-trained"
+}
+```
+## 🧪 Testing the API
+### 1. **Check GPU Availability**
+```bash
+curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/gpu"
+```
+### 2. **View Training Data**
+```bash
+curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/data"
+```
+### 3. **Start Training**
+```bash
+curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/api/train/start" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model_name": "distilgpt2",
+    "dataset_path": "data/lora_dataset_20250829_113330.jsonl",
+    "config_path": "configs/training_config.yaml",
+    "max_samples": 10,
+    "epochs": 1,
+    "batch_size": 1,
+    "learning_rate": 5e-5
+  }'
+```
+### 4. **Monitor Training Progress**
+```bash
+curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/status"
+```
+### 5. **Test Trained Model**
+```bash
+curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/api/train/test"
+```
+## 🔧 Training Configuration
+### Available Models:
+- `distilgpt2` (82M) - Small, fast, good for free tier
+- `gpt2` (124M) - Original GPT-2
+- `microsoft/DialoGPT-small` (117M) - Conversational
+### Training Parameters:
+- **max_samples**: Limit training data (10 for free tier)
+- **epochs**: Number of training epochs (1-3 recommended)
+- **batch_size**: Batch size (1 for free tier)
+- **learning_rate**: Learning rate (5e-5 recommended)
+## 🎯 Training Process
+1. **Start Training**: POST to `/api/train/start`
+2. **Monitor Progress**: GET `/api/train/status`
+3. **Check GPU Usage**: GET `/api/train/gpu`
+4. **Test Model**: POST `/api/train/test`
+## 📊 Training Status Values
+- `idle` - No training
+- `starting` - Training initialization
+- `training` - Active training
+- `completed` - Training finished
+- `failed` - Training error
+- `stopped` - Training stopped
+## ⚡ GPU Usage
+The API automatically detects and uses GPU if available:
+- **GPU Available**: Uses GPU with fp16 precision
+- **CPU Only**: Falls back to CPU training
+- **Memory Optimization**: Adjusts batch size based on available memory
+## 🔍 Error Handling
+### Common Errors:
+- `400` - Training already in progress
+- `404` - Dataset or config file not found
+- `500` - Training failed (check logs)
+### Error Response:
+```json
+{
+  "detail": "Training already in progress"
+}
+```
+## 📈 Training Monitoring
+### Real-time Status:
+- **Progress**: 0-100%
+- **Current Step**: Current training step
+- **Total Steps**: Total training steps
+- **Loss**: Current training loss
+- **GPU Info**: GPU name and total memory (from `/api/train/gpu`, not the status response)
+### Training Logs:
+Check the space logs for detailed training information.
+## 🚀 Quick Start Example
+```bash
+# 1. Check GPU
+curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/gpu"
+# 2. Start training
+curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/api/train/start" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model_name": "distilgpt2",
+    "dataset_path": "data/lora_dataset_20250829_113330.jsonl",
+    "max_samples": 5,
+    "epochs": 1
+  }'
+# 3. Monitor progress
+curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/status"
+# 4. Test when complete
+curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/api/train/test"
+```
+## 🎉 Success Indicators
+- ✅ Training starts without errors
+- ✅ GPU is detected and used
+- ✅ Progress increases over time
+- ✅ Model saves to `./models/textilindo-trained`
+- ✅ Test endpoint returns valid responses
+- ✅ Chat interface works with trained model
+---
+*Pure API training system - No HTML interfaces! 🚀*
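
The "Monitor Progress" step documented above comes down to polling `GET /api/train/status` until the reported `status` reaches one of the terminal values in the status list. Below is a minimal sketch of such a loop, assuming the response shape shown in the documentation. `wait_for_training`, `fetch_status_http`, and their parameters are hypothetical names introduced here for illustration, not part of this commit.

```python
import time
from typing import Any, Callable, Dict

# Status values from "Training Status Values" that mean the job is over.
TERMINAL_STATUSES = {"completed", "failed", "stopped"}


def wait_for_training(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    max_polls: int = 120,
) -> Dict[str, Any]:
    """Poll until the reported status is terminal, then return the last payload.

    `fetch_status` is a hypothetical callable returning the parsed JSON of
    GET /api/train/status; it is injected so the loop can be exercised offline.
    """
    for attempt in range(max_polls):
        payload = fetch_status()
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if attempt < max_polls - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"training still running after {max_polls} polls")


def fetch_status_http(base_url: str) -> Dict[str, Any]:
    """Fetch the status over HTTP using only the standard library."""
    import json
    from urllib.request import urlopen

    with urlopen(f"{base_url}/api/train/status", timeout=30) as resp:
        return json.load(resp)
```

A caller would pass something like `lambda: fetch_status_http("https://harismlnaslm-Textilindo-AI.hf.space")` as `fetch_status`, then inspect the `error` field of the returned payload if the status is `failed`.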

TRAINING_GUIDE.md ADDED

	@@ -0,0 +1,210 @@

+# 🤖 Textilindo AI Training Guide for Hugging Face Spaces
+## 🚀 Training Options on Hugging Face Spaces
+### Option 1: **Quick Training (Recommended for HF Spaces)**
+Use the lightweight training script designed for HF Spaces constraints.
+**Access Training Interface:**
+- Visit: `https://harismlnaslm-Textilindo-AI.hf.space/train`
+- Click "Start Lightweight Training"
+- Monitor progress in the training log
+**Manual Training:**
+```bash
+python quick_train.py
+```
+### Option 2: **Use Existing Scripts**
+Run the full training scripts (may be resource-intensive):
+```bash
+# Check if training is ready
+python scripts/check_training_ready.py
+# Run lightweight training
+python scripts/train_textilindo_ai_optimized.py
+# Test the trained model
+python scripts/test_textilindo_ai.py
+```
+### Option 3: **External Training + Upload**
+Train on external resources and upload the model:
+1. **Train locally or on cloud:**
+   ```bash
+   python scripts/train_textilindo_ai.py
+   ```
+2. **Upload trained model to HF Hub:**
+   ```bash
+   huggingface-cli upload your-username/textilindo-trained-model ./models/trained-model
+   ```
+3. **Use the uploaded model in your space**
+## 🔧 Training Configuration
+### For HF Spaces (Limited Resources):
+- **Model**: `distilgpt2` (small, fast)
+- **Batch Size**: 1
+- **Epochs**: 1
+- **Max Length**: 128 tokens
+- **Training Time**: ~5 minutes
+### For External Training (Full Resources):
+- **Model**: `meta-llama/Llama-3.1-8B-Instruct`
+- **Batch Size**: 4-8
+- **Epochs**: 3
+- **Max Length**: 2048 tokens
+- **Training Time**: Hours
+## 📊 Training Data
+Your space includes these training datasets:
+- `data/lora_dataset_20250829_113330.jsonl` (33 samples)
+- `data/lora_dataset_20250910_145055.jsonl`
+- `data/textilindo_training_data.jsonl`
+- `data/training_data.jsonl`
+## 🎯 Training Endpoints
+### Web Interface:
+- **Training UI**: `/train`
+- **Start Training**: `POST /train/start`
+- **Check Status**: `GET /train/status`
+- **View Data**: `GET /train/data`
+### API Usage:
+```bash
+# Start training
+curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/train/start"
+# Check resources
+curl "https://harismlnaslm-Textilindo-AI.hf.space/train/status"
+# View training data
+curl "https://harismlnaslm-Textilindo-AI.hf.space/train/data"
+```
+## ⚠️ Limitations of HF Spaces Training
+### Resource Constraints:
+- **CPU Only**: No GPU acceleration
+- **Memory**: Limited to ~4GB RAM
+- **Time**: The `/train/start` endpoint stops training after 5 minutes
+- **Storage**: Limited disk space
+### Recommended Approach:
+1. **Quick Demo Training**: Use `quick_train.py` for testing
+2. **Full Training**: Use external resources (Google Colab, AWS, etc.)
+3. **Model Upload**: Upload pre-trained models to HF Hub
+## 🚀 External Training Options
+### Google Colab (Free GPU):
+```python
+# Upload your training data
+# Run: python scripts/train_textilindo_ai.py
+# Download trained model
+# Upload to HF Hub
+```
+### Local Training:
+```bash
+# Setup environment
+python scripts/setup_textilindo_training.py
+# Download model
+python scripts/download_model.py
+# Run training
+python scripts/train_textilindo_ai.py
+# Test model
+python scripts/test_textilindo_ai.py
+```
+### Cloud Training (AWS/GCP):
+```bash
+# Use the monitoring script
+python scripts/train_with_monitoring.py
+```
+## 📈 Training Progress Monitoring
+### On HF Spaces:
+- Check the training log in the web interface
+- Use `/train/status` endpoint for resource monitoring
+### External Training:
+```bash
+# Use monitoring script
+python scripts/train_with_monitoring.py
+# Check logs
+tail -f logs/training.log
+```
+## 🧪 Testing Trained Models
+### Quick Test:
+```bash
+python quick_train.py  # Includes testing
+```
+### Full Testing:
+```bash
+python scripts/test_textilindo_ai.py
+python scripts/test_model.py
+```
+### API Testing:
+```bash
+# Test chat endpoint
+curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/chat" \
+  -H "Content-Type: application/json" \
+  -d '{"message": "dimana lokasi textilindo?"}'
+```
+## 🔧 Troubleshooting
+### Common Issues:
+1. **"Out of Memory"**
+   - Use smaller models (distilgpt2)
+   - Reduce batch size
+   - Use external training
+2. **"Training Timeout"**
+   - The `/train/start` endpoint enforces a 5-minute limit
+   - Use external resources for full training
+3. **"Model Not Found"**
+   - Check if model is downloaded
+   - Use `python scripts/download_model.py`
+4. **"Data Not Found"**
+   - Verify data files exist in `data/` directory
+   - Check file permissions
+## 📚 Next Steps
+1. **Start with Quick Training**: Test the setup with `quick_train.py`
+2. **Monitor Resources**: Use `/train/status` to check available resources
+3. **External Training**: For full training, use external resources
+4. **Model Upload**: Upload trained models to Hugging Face Hub
+5. **Integration**: Use uploaded models in your space
+## 🎉 Success Indicators
+- ✅ Training completes without errors
+- ✅ Model saves to `./models/` directory
+- ✅ Test responses are generated
+- ✅ Chat interface works with trained model
+- ✅ API endpoints respond correctly
+---
+*Happy Training! 🚀*
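
For the "Data Not Found" troubleshooting item above, it can help to check what a JSONL file actually contains before training. The training scripts in this commit read the `instruction` and `output` keys from each line, so a record missing either key still trains, but on an empty question or answer. The sketch below counts such records. `summarize_jsonl` is a hypothetical helper written for this note, not a script in the repository.

```python
import json
from pathlib import Path
from typing import Dict


def summarize_jsonl(path: str) -> Dict[str, int]:
    """Count usable, incomplete, and malformed records in a JSONL file."""
    summary = {"usable": 0, "incomplete": 0, "malformed": 0}
    with Path(path).open("r", encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # blank lines are skipped, as in the training scripts
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                summary["malformed"] += 1
                continue
            # A record is usable only if both keys are present and non-empty.
            if (
                isinstance(record, dict)
                and record.get("instruction")
                and record.get("output")
            ):
                summary["usable"] += 1
            else:
                summary["incomplete"] += 1
    return summary
```

Running it on each file under `data/` gives a quick view of how many samples would contribute a real question and answer pair.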

app.py CHANGED

@@ -9,7 +9,7 @@ import json
 import logging
 from pathlib import Path
 from typing import Optional, Dict, Any
-from fastapi import FastAPI, HTTPException, Request
+from fastapi import FastAPI, HTTPException, Request, BackgroundTasks
 from fastapi.responses import HTMLResponse, JSONResponse
 from fastapi.staticfiles import StaticFiles
 from fastapi.middleware.cors import CORSMiddleware
@@ -17,6 +17,7 @@ from pydantic import BaseModel
 import uvicorn
 from huggingface_hub import InferenceClient
 import requests
+from datetime import datetime
 # Setup logging
 logging.basicConfig(level=logging.INFO)
@@ -272,6 +273,284 @@ async def get_info():
         "client_initialized": bool(ai_assistant.client)
     }
+# Import training API
+from training_api import (
+    TrainingRequest, TrainingResponse, training_status,
+    train_model_async, load_training_config, load_training_data, check_gpu_availability
+)
+# Training API endpoints
+@app.post("/api/train/start", response_model=TrainingResponse)
+async def start_training_api(request: TrainingRequest, background_tasks: BackgroundTasks):
+    """Start training process via API"""
+    if training_status["is_training"]:
+        raise HTTPException(status_code=400, detail="Training already in progress")
+    # Validate inputs
+    if not Path(request.dataset_path).exists():
+        raise HTTPException(status_code=404, detail=f"Dataset not found: {request.dataset_path}")
+    if not Path(request.config_path).exists():
+        raise HTTPException(status_code=404, detail=f"Config not found: {request.config_path}")
+    # Start training in background
+    training_id = f"train_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+    background_tasks.add_task(
+        train_model_async,
+        request.model_name,
+        request.dataset_path,
+        request.config_path,
+        request.max_samples,
+        request.epochs,
+        request.batch_size,
+        request.learning_rate
+    )
+    return TrainingResponse(
+        success=True,
+        message="Training started successfully",
+        training_id=training_id,
+        status="started"
+    )
+@app.get("/api/train/status")
+async def get_training_status_api():
+    """Get current training status"""
+    return training_status
+@app.get("/api/train/data")
+async def get_training_data_info_api():
+    """Get information about available training data"""
+    data_dir = Path("data")
+    if not data_dir.exists():
+        return {"files": [], "count": 0}
+    jsonl_files = list(data_dir.glob("*.jsonl"))
+    files_info = []
+    for file in jsonl_files:
+        try:
+            with open(file, 'r', encoding='utf-8') as f:
+                lines = f.readlines()
+            files_info.append({
+                "name": file.name,
+                "size": file.stat().st_size,
+                "lines": len(lines)
+            })
+        except Exception as e:
+            files_info.append({
+                "name": file.name,
+                "error": str(e)
+            })
+    return {
+        "files": files_info,
+        "count": len(jsonl_files)
+    }
+@app.get("/api/train/gpu")
+async def get_gpu_info_api():
+    """Get GPU information"""
+    try:
+        import torch
+        gpu_available = torch.cuda.is_available()
+        if gpu_available:
+            gpu_count = torch.cuda.device_count()
+            gpu_name = torch.cuda.get_device_name(0)
+            gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3)
+            return {
+                "available": True,
+                "count": gpu_count,
+                "name": gpu_name,
+                "memory_gb": round(gpu_memory, 2)
+            }
+        else:
+            return {"available": False}
+    except Exception as e:
+        return {"error": str(e)}
+@app.post("/api/train/test")
+async def test_trained_model_api():
+    """Test the trained model"""
+    model_path = "./models/textilindo-trained"
+    if not Path(model_path).exists():
+        return {"error": "No trained model found"}
+    try:
+        from transformers import AutoTokenizer, AutoModelForCausalLM
+        import torch
+        tokenizer = AutoTokenizer.from_pretrained(model_path)
+        model = AutoModelForCausalLM.from_pretrained(model_path)
+        # Test prompt
+        test_prompt = "Question: dimana lokasi textilindo? Answer:"
+        inputs = tokenizer(test_prompt, return_tensors="pt")
+        with torch.no_grad():
+            outputs = model.generate(
+                **inputs,
+                max_length=inputs.input_ids.shape[1] + 30,
+                temperature=0.7,
+                do_sample=True,
+                pad_token_id=tokenizer.eos_token_id
+            )
+        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+        return {
+            "success": True,
+            "test_prompt": test_prompt,
+            "response": response,
+            "model_path": model_path
+        }
+    except Exception as e:
+        return {"error": str(e)}
+# Legacy training endpoints (for backward compatibility)
+@app.get("/train")
+async def training_interface():
+    """Training interface"""
+    try:
+        with open("templates/training.html", "r", encoding="utf-8") as f:
+            return HTMLResponse(content=f.read())
+    except FileNotFoundError:
+        return HTMLResponse(content="""
+        <!DOCTYPE html>
+        <html>
+        <head>
+            <title>Textilindo AI Training</title>
+            <meta charset="utf-8">
+            <style>
+                body { font-family: Arial, sans-serif; max-width: 800px; margin: 0 auto; padding: 20px; }
+                .container { background: #f5f5f5; padding: 20px; border-radius: 10px; margin: 20px 0; }
+                button { background: #2196f3; color: white; border: none; padding: 10px 20px; border-radius: 5px; cursor: pointer; }
+                button:hover { background: #1976d2; }
+                .log { background: #000; color: #0f0; padding: 10px; border-radius: 5px; font-family: monospace; height: 300px; overflow-y: auto; }
+            </style>
+        </head>
+        <body>
+            <h1>🤖 Textilindo AI Training Interface</h1>
+            <div class="container">
+                <h2>Training Options</h2>
+                <p>Choose your training method:</p>
+                <button onclick="startLightweightTraining()">Start Lightweight Training</button>
+                <button onclick="checkResources()">Check Resources</button>
+                <button onclick="viewData()">View Training Data</button>
+            </div>
+            <div class="container">
+                <h2>Training Log</h2>
+                <div id="log" class="log">Ready to start training...</div>
+            </div>
+            <script>
+                function addLog(message) {
+                    const log = document.getElementById('log');
+                    const timestamp = new Date().toLocaleTimeString();
+                    log.innerHTML += `[${timestamp}] ${message}\\n`;
+                    log.scrollTop = log.scrollHeight;
+                }
+                async function startLightweightTraining() {
+                    addLog('Starting lightweight training...');
+                    try {
+                        const response = await fetch('/train/start', {
+                            method: 'POST',
+                            headers: { 'Content-Type': 'application/json' }
+                        });
+                        const result = await response.json();
+                        addLog(`Training result: ${result.message}`);
+                    } catch (error) {
+                        addLog(`Error: ${error.message}`);
+                    }
+                }
+                async function checkResources() {
+                    addLog('Checking resources...');
+                    try {
+                        const response = await fetch('/train/status');
+                        const result = await response.json();
+                        addLog(`Resources: ${JSON.stringify(result, null, 2)}`);
+                    } catch (error) {
+                        addLog(`Error: ${error.message}`);
+                    }
+                }
+                async function viewData() {
+                    addLog('Loading training data...');
+                    try {
+                        const response = await fetch('/train/data');
+                        const result = await response.json();
+                        addLog(`Data files: ${result.files.join(', ')}`);
+                    } catch (error) {
+                        addLog(`Error: ${error.message}`);
+                    }
+                }
+            </script>
+        </body>
+        </html>
+        """)
+@app.post("/train/start")
+async def start_training():
+    """Start lightweight training"""
+    try:
+        # Import training script
+        import subprocess
+        import sys
+        # Run the training script
+        result = subprocess.run([
+            sys.executable, "train_on_space.py"
+        ], capture_output=True, text=True, timeout=300)  # 5 minute timeout
+        if result.returncode == 0:
+            return {"message": "Training completed successfully!", "output": result.stdout}
+        else:
+            return {"message": "Training failed", "error": result.stderr}
+    except subprocess.TimeoutExpired:
+        return {"message": "Training timed out (5 minutes limit)"}
+    except Exception as e:
+        return {"message": f"Training error: {str(e)}"}
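+# Named legacy_training_status so it does not shadow the training_status dict
+# imported from training_api, which the /api/train/* endpoints read.
+@app.get("/train/status")
+async def legacy_training_status():
+    """Get training status and resources"""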
+    try:
+        import psutil
+        return {
+            "status": "ready",
+            "cpu_count": psutil.cpu_count(),
+            "memory_total_gb": round(psutil.virtual_memory().total / (1024**3), 2),
+            "memory_available_gb": round(psutil.virtual_memory().available / (1024**3), 2),
+            "disk_free_gb": round(psutil.disk_usage('.').free / (1024**3), 2)
+        }
+    except Exception as e:
+        return {"status": "error", "message": str(e)}
+@app.get("/train/data")
+async def training_data():
+    """Get training data information"""
+    try:
+        data_dir = Path("data")
+        if data_dir.exists():
+            jsonl_files = list(data_dir.glob("*.jsonl"))
+            return {
+                "files": [f.name for f in jsonl_files],
+                "count": len(jsonl_files)
+            }
+        else:
+            return {"files": [], "count": 0}
+    except Exception as e:
+        return {"error": str(e)}
 # Mount static files if they exist
 if Path("static").exists():
     app.mount("/static", StaticFiles(directory="static"), name="static")
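
One point worth noting about `start_training_api` above: the handler checks `training_status["is_training"]`, but that flag is only set to true once `train_model_async` starts running as a background task. Two requests arriving close together could therefore both pass the check. A minimal sketch of an atomic check-and-set guard follows. `TrainingGate` and its methods are hypothetical names for illustration, not code from this commit.

```python
import threading


class TrainingGate:
    """Allow at most one training job at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._running = False

    def try_acquire(self) -> bool:
        """Atomically claim the slot; return False if a job already holds it."""
        with self._lock:
            if self._running:
                return False
            self._running = True
            return True

    def release(self) -> None:
        """Free the slot; intended to be called from the job's `finally` block."""
        with self._lock:
            self._running = False
```

Under this approach the handler would call `try_acquire()` before scheduling the task and return the 400 response when it yields `False`, while the training function would call `release()` when it finishes or fails.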

quick_train.py ADDED

	@@ -0,0 +1,181 @@

+#!/usr/bin/env python3
+"""
+Quick training script for Hugging Face Spaces
+Optimized for CPU-only training with limited resources
+"""
+import os
+import json
+import logging
+from pathlib import Path
+from datetime import datetime
+# Setup logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+def quick_training():
+    """Quick training suitable for HF Spaces"""
+    print("🚀 Starting Quick Training for Hugging Face Spaces")
+    print("=" * 60)
+    try:
+        # Import required libraries
+        from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
+        from datasets import Dataset
+        import torch
+        print("✅ Successfully imported training libraries")
+        # Use a very small model for HF Spaces
+        model_name = "distilgpt2"  # Small, fast model
+        print(f"📥 Loading model: {model_name}")
+        # Load tokenizer and model
+        tokenizer = AutoTokenizer.from_pretrained(model_name)
+        if tokenizer.pad_token is None:
+            tokenizer.pad_token = tokenizer.eos_token
+        model = AutoModelForCausalLM.from_pretrained(model_name)
+        print("✅ Model loaded successfully")
+        # Load training data (limit to small amount for HF Spaces)
+        data_file = Path("data/lora_dataset_20250829_113330.jsonl")
+        if not data_file.exists():
+            print("❌ Training data not found")
+            return False
+        # Load and prepare data
+        training_data = []
+        with open(data_file, 'r', encoding='utf-8') as f:
+            for i, line in enumerate(f):
+                if i >= 5:  # Limit to 5 samples for quick training
+                    break
+                if line.strip():
+                    data = json.loads(line)
+                    # Create simple training text
+                    text = f"Question: {data.get('instruction', '')} Answer: {data.get('output', '')}"
+                    training_data.append({"text": text})
+        print(f"✅ Loaded {len(training_data)} training samples")
+        if not training_data:
+            print("❌ No training data found")
+            return False
+        # Convert to dataset
+        dataset = Dataset.from_list(training_data)
+        def tokenize_function(examples):
+            return tokenizer(
+                examples["text"],
+                truncation=True,
+                padding=True,
+                max_length=128  # Short sequences for quick training
+            )
+        tokenized_dataset = dataset.map(tokenize_function, batched=True)
+        # Training arguments optimized for HF Spaces
+        training_args = TrainingArguments(
+            output_dir="./models/quick-trained",
+            num_train_epochs=1,  # Single epoch
+            per_device_train_batch_size=1,  # Small batch
+            gradient_accumulation_steps=2,
+            learning_rate=5e-5,
+            warmup_steps=2,
+            save_steps=10,
+            logging_steps=1,
+            save_total_limit=1,
+            prediction_loss_only=True,
+            remove_unused_columns=False,
+            fp16=False,  # Disable fp16 for CPU
+            dataloader_pin_memory=False,
+            report_to="none",  # Disable wandb/tensorboard reporting
+        )
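+        # Create trainer. The causal-LM collator copies input_ids into labels so the
+        # model returns a loss; the raw "text" column is dropped because it cannot be collated.
+        from transformers import DataCollatorForLanguageModeling
+        trainer = Trainer(
+            model=model,
+            args=training_args,
+            train_dataset=tokenized_dataset.remove_columns(["text"]),
+            tokenizer=tokenizer,
+            data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
+        )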
+        print("🚀 Starting training...")
+        print("⚠️  This is a quick demo training with limited data")
+        # Train
+        trainer.train()
+        # Save the model
+        model.save_pretrained("./models/quick-trained")
+        tokenizer.save_pretrained("./models/quick-trained")
+        print("✅ Quick training completed successfully!")
+        print("📁 Model saved to: ./models/quick-trained")
+        # Test the model
+        print("\n🧪 Testing the trained model...")
+        test_prompt = "Question: dimana lokasi textilindo? Answer:"
+        inputs = tokenizer(test_prompt, return_tensors="pt")
+        with torch.no_grad():
+            outputs = model.generate(
+                **inputs,
+                max_length=inputs.input_ids.shape[1] + 20,
+                temperature=0.7,
+                do_sample=True,
+                pad_token_id=tokenizer.eos_token_id
+            )
+        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+        print(f"📝 Test response: {response}")
+        return True
+    except ImportError as e:
+        print(f"❌ Missing required library: {e}")
+        print("💡 Install with: pip install transformers datasets torch")
+        return False
+    except Exception as e:
+        print(f"❌ Training failed: {e}")
+        return False
+def main():
+    """Main function"""
+    print("🤖 Textilindo AI - Quick Training on Hugging Face Spaces")
+    print("=" * 70)
+    # Check if we're on HF Spaces
+    if os.getenv('SPACE_ID'):
+        print("✅ Running on Hugging Face Spaces")
+    else:
+        print("⚠️  Not running on Hugging Face Spaces")
+    # Check available data
+    data_dir = Path("data")
+    if data_dir.exists():
+        jsonl_files = list(data_dir.glob("*.jsonl"))
+        print(f"📊 Found {len(jsonl_files)} training data files")
+        for file in jsonl_files:
+            print(f"  - {file.name}")
+    else:
+        print("❌ No data directory found")
+        return 1
+    # Run quick training
+    if quick_training():
+        print("\n🎉 Quick training completed successfully!")
+        print("📋 Next steps:")
+        print("1. Check the trained model in ./models/quick-trained/")
+        print("2. Test the model with your chat interface")
+        print("3. For full training, use external resources")
+        return 0
+    else:
+        print("\n❌ Quick training failed")
+        return 1
+if __name__ == "__main__":
+    import sys
+    sys.exit(main())
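
A causal language model only returns a training loss when the batch carries `labels`, and the tokenizer output in `quick_train.py` contains only `input_ids` and `attention_mask`. The sketch below illustrates, in plain Python, how labels are typically derived for causal LM fine-tuning: copy the input IDs and replace padding positions with `-100` so the loss ignores them. It is an illustration of the idea, not the library's implementation. `add_causal_lm_labels` is a hypothetical helper.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def add_causal_lm_labels(
    batch: Dict[str, List[List[int]]], pad_token_id: int
) -> Dict[str, List[List[int]]]:
    """Return a copy of `batch` with a `labels` entry derived from `input_ids`."""
    labels = [
        [IGNORE_INDEX if token == pad_token_id else token for token in row]
        for row in batch["input_ids"]
    ]
    return {**batch, "labels": labels}
```

Because the script reuses the end-of-sequence token as the padding token, masking by `pad_token_id` also hides genuine end-of-sequence positions from the loss. That is a known trade-off of this shortcut.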

training_api.py ADDED

	@@ -0,0 +1,438 @@

+#!/usr/bin/env python3
+"""
+Textilindo AI Training API
+Pure API-based training system for Hugging Face Spaces
+Uses a GPU when one is available, along with your training data/configs
+"""
+import os
+import json
+import yaml
+import logging
+import torch
+from pathlib import Path
+from datetime import datetime
+from typing import Dict, Any, Optional
+from fastapi import FastAPI, HTTPException, BackgroundTasks
+from pydantic import BaseModel
+import uvicorn
+# Setup logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+# Training API
+training_app = FastAPI(title="Textilindo AI Training API")
+# Training status storage
+training_status = {
+    "is_training": False,
+    "progress": 0,
+    "status": "idle",
+    "current_step": 0,
+    "total_steps": 0,
+    "loss": 0.0,
+    "start_time": None,
+    "end_time": None,
+    "error": None
+}
+class TrainingRequest(BaseModel):
+    model_name: str = "distilgpt2"  # Start with small model
+    dataset_path: str = "data/lora_dataset_20250829_113330.jsonl"
+    config_path: str = "configs/training_config.yaml"
+    max_samples: int = 10  # Limit for free tier
+    epochs: int = 1
+    batch_size: int = 1
+    learning_rate: float = 5e-5
+class TrainingResponse(BaseModel):
+    success: bool
+    message: str
+    training_id: str
+    status: str
+def load_training_config(config_path: str) -> Dict[str, Any]:
+    """Load training configuration"""
+    try:
+        with open(config_path, 'r') as f:
+            config = yaml.safe_load(f)
+        return config
+    except Exception as e:
+        logger.error(f"Error loading config: {e}")
+        return {}
+def load_training_data(dataset_path: str, max_samples: int = 10) -> list:
+    """Load training data from JSONL file"""
+    data = []
+    try:
+        with open(dataset_path, 'r', encoding='utf-8') as f:
+            for i, line in enumerate(f):
+                if i >= max_samples:
+                    break
+                if line.strip():
+                    item = json.loads(line)
+                    # Create training text
+                    instruction = item.get('instruction', '')
+                    output = item.get('output', '')
+                    text = f"Question: {instruction} Answer: {output}"
+                    data.append({"text": text})
+        logger.info(f"Loaded {len(data)} training samples")
+        return data
+    except Exception as e:
+        logger.error(f"Error loading data: {e}")
+        return []
+def check_gpu_availability() -> bool:
+    """Check if GPU is available"""
+    try:
+        if torch.cuda.is_available():
+            gpu_count = torch.cuda.device_count()
+            gpu_name = torch.cuda.get_device_name(0)
+            logger.info(f"GPU available: {gpu_name} (Count: {gpu_count})")
+            return True
+        else:
+            logger.info("No GPU available, using CPU")
+            return False
+    except Exception as e:
+        logger.error(f"Error checking GPU: {e}")
+        return False
+def train_model_async(
+    model_name: str,
+    dataset_path: str,
+    config_path: str,
+    max_samples: int,
+    epochs: int,
+    batch_size: int,
+    learning_rate: float
+):
+    """Run one training job. Despite the name, this is a regular blocking function meant to run as a FastAPI background task."""
+    global training_status
+    try:
+        training_status.update({
+            "is_training": True,
+            "status": "starting",
+            "progress": 0,
+            "start_time": datetime.now().isoformat(),
+            "error": None
+        })
+        logger.info("🚀 Starting training...")
+        # Import training libraries
+        from transformers import (
+            AutoTokenizer,
+            AutoModelForCausalLM,
+            TrainingArguments,
+            Trainer,
+            DataCollatorForLanguageModeling
+        )
+        from datasets import Dataset
+        # Check GPU
+        gpu_available = check_gpu_availability()
+        # Load model and tokenizer
+        logger.info(f"📥 Loading model: {model_name}")
+        tokenizer = AutoTokenizer.from_pretrained(model_name)
+        if tokenizer.pad_token is None:
+            tokenizer.pad_token = tokenizer.eos_token
+        # Load weights in full precision. The Trainer moves the model to the GPU and
+        # applies fp16 mixed precision itself (see fp16= below); loading fp16 weights
+        # here would make the gradient scaler fail with
+        # "Attempting to unscale FP16 gradients".
+        model = AutoModelForCausalLM.from_pretrained(model_name)
+        logger.info("✅ Model loaded successfully")
+        # Load training data
+        training_data = load_training_data(dataset_path, max_samples)
+        if not training_data:
+            raise Exception("No training data loaded")
+        # Convert to dataset
+        dataset = Dataset.from_list(training_data)
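+        def tokenize_function(examples):
+            # Return plain lists; the data collator pads each batch and builds tensors
+            return tokenizer(
+                examples["text"],
+                truncation=True,
+                max_length=256,
+            )
+        # Drop the raw "text" column so only tokenizer outputs reach the collator
+        tokenized_dataset = dataset.map(
+            tokenize_function, batched=True, remove_columns=["text"]
+        )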
+        # Training arguments
+        training_args = TrainingArguments(
+            output_dir="./models/textilindo-trained",
+            num_train_epochs=epochs,
+            per_device_train_batch_size=batch_size,
+            gradient_accumulation_steps=2,
+            learning_rate=learning_rate,
+            warmup_steps=5,
+            save_steps=10,
+            logging_steps=1,
+            save_total_limit=1,
+            prediction_loss_only=True,
+            remove_unused_columns=False,
+            fp16=gpu_available,  # Use fp16 only if GPU available
+            dataloader_pin_memory=gpu_available,
+            report_to="none",  # the string "none" disables external loggers
+        )
+        # Data collator
+        data_collator = DataCollatorForLanguageModeling(
+            tokenizer=tokenizer,
+            mlm=False,
+        )
+        # Create trainer
+        trainer = Trainer(
+            model=model,
+            args=training_args,
+            train_dataset=tokenized_dataset,
+            data_collator=data_collator,
+            tokenizer=tokenizer,
+        )
+        # Custom callback for progress tracking
+        from transformers import TrainerCallback
+        class ProgressCallback(TrainerCallback):
+            """Mirror Trainer progress into the shared training_status dict"""
+            def on_log(self, args, state, control, logs=None, **kwargs):
+                global training_status
+                if logs:
+                    # state.max_steps already accounts for batch size, gradient
+                    # accumulation and epochs; guard against division by zero
+                    total_steps = max(1, state.max_steps)
+                    training_status.update({
+                        "current_step": state.global_step,
+                        "total_steps": total_steps,
+                        "progress": min(100, round(state.global_step / total_steps * 100)),
+                        "loss": logs.get('loss', training_status.get("loss", 0.0)),
+                        "status": "training"
+                    })
+        # Add callback
+        trainer.add_callback(ProgressCallback())
+        # Start training
+        training_status["status"] = "training"
+        trainer.train()
+        # Save model
+        model.save_pretrained("./models/textilindo-trained")
+        tokenizer.save_pretrained("./models/textilindo-trained")
+        # Update status
+        training_status.update({
+            "is_training": False,
+            "status": "completed",
+            "progress": 100,
+            "end_time": datetime.now().isoformat()
+        })
+        logger.info("✅ Training completed successfully!")
+    except Exception as e:
+        logger.error(f"Training failed: {e}")
+        training_status.update({
+            "is_training": False,
+            "status": "failed",
+            "error": str(e),
+            "end_time": datetime.now().isoformat()
+        })
+# API Endpoints
+@training_app.post("/train/start", response_model=TrainingResponse)
+async def start_training(request: TrainingRequest, background_tasks: BackgroundTasks):
+    """Start training process"""
+    global training_status
+    if training_status["is_training"]:
+        raise HTTPException(status_code=400, detail="Training already in progress")
+    # Validate inputs
+    if not Path(request.dataset_path).exists():
+        raise HTTPException(status_code=404, detail=f"Dataset not found: {request.dataset_path}")
+    if not Path(request.config_path).exists():
+        raise HTTPException(status_code=404, detail=f"Config not found: {request.config_path}")
+    # Start training in background
+    training_id = f"train_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+    background_tasks.add_task(
+        train_model_async,
+        request.model_name,
+        request.dataset_path,
+        request.config_path,
+        request.max_samples,
+        request.epochs,
+        request.batch_size,
+        request.learning_rate
+    )
+    return TrainingResponse(
+        success=True,
+        message="Training started successfully",
+        training_id=training_id,
+        status="started"
+    )
+@training_app.get("/train/status")
+async def get_training_status():
+    """Get current training status"""
+    return training_status
+@training_app.get("/train/data")
+async def get_training_data_info():
+    """Get information about available training data"""
+    data_dir = Path("data")
+    if not data_dir.exists():
+        return {"files": [], "count": 0}
+    jsonl_files = list(data_dir.glob("*.jsonl"))
+    files_info = []
+    for file in jsonl_files:
+        try:
+            with open(file, 'r', encoding='utf-8') as f:
+                # Count lines lazily instead of loading the whole file into memory
+                line_count = sum(1 for _ in f)
+            files_info.append({
+                "name": file.name,
+                "size": file.stat().st_size,
+                "lines": line_count
+            })
+        except Exception as e:
+            files_info.append({
+                "name": file.name,
+                "error": str(e)
+            })
+    return {
+        "files": files_info,
+        "count": len(jsonl_files)
+    }
+@training_app.get("/train/config")
+async def get_training_config():
+    """Get current training configuration"""
+    config_path = "configs/training_config.yaml"
+    if not Path(config_path).exists():
+        return {"error": "Config file not found"}
+    try:
+        config = load_training_config(config_path)
+        return config
+    except Exception as e:
+        return {"error": str(e)}
+@training_app.get("/train/models")
+async def get_available_models():
+    """Get list of available models"""
+    return {
+        "models": [
+            {
+                "name": "distilgpt2",
+                "size": "82M",
+                "description": "Small, fast model for quick training"
+            },
+            {
+                "name": "gpt2",
+                "size": "124M",
+                "description": "Original GPT-2 model"
+            },
+            {
+                "name": "microsoft/DialoGPT-small",
+                "size": "117M",
+                "description": "Conversational model"
+            }
+        ]
+    }
+@training_app.get("/train/gpu")
+async def get_gpu_info():
+    """Get GPU information"""
+    try:
+        gpu_available = torch.cuda.is_available()
+        if gpu_available:
+            gpu_count = torch.cuda.device_count()
+            gpu_name = torch.cuda.get_device_name(0)
+            gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3)
+            return {
+                "available": True,
+                "count": gpu_count,
+                "name": gpu_name,
+                "memory_gb": round(gpu_memory, 2)
+            }
+        else:
+            return {"available": False}
+    except Exception as e:
+        return {"error": str(e)}
+@training_app.post("/train/stop")
+async def stop_training():
+    """Mark training as stopped (status flag only; a running Trainer is not interrupted)"""
+    global training_status
+    if not training_status["is_training"]:
+        return {"message": "No training in progress"}
+    training_status.update({
+        "is_training": False,
+        "status": "stopped",
+        "end_time": datetime.now().isoformat()
+    })
+    return {"message": "Training stopped"}
+@training_app.post("/train/test")
+async def test_trained_model():
+    """Test the trained model"""
+    model_path = "./models/textilindo-trained"
+    if not Path(model_path).exists():
+        return {"error": "No trained model found"}
+    try:
+        from transformers import AutoTokenizer, AutoModelForCausalLM
+        tokenizer = AutoTokenizer.from_pretrained(model_path)
+        model = AutoModelForCausalLM.from_pretrained(model_path)
+        # Test prompt
+        test_prompt = "Question: dimana lokasi textilindo? Answer:"
+        inputs = tokenizer(test_prompt, return_tensors="pt")
+        with torch.no_grad():
+            outputs = model.generate(
+                **inputs,
+                max_new_tokens=30,  # cap generated tokens independent of prompt length
+                temperature=0.7,
+                do_sample=True,
+                pad_token_id=tokenizer.eos_token_id
+            )
+        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+        return {
+            "success": True,
+            "test_prompt": test_prompt,
+            "response": response,
+            "model_path": model_path
+        }
+    except Exception as e:
+        return {"error": str(e)}
+if __name__ == "__main__":
+    uvicorn.run(training_app, host="0.0.0.0", port=7861)
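Because `/train/start` returns immediately and the work runs in a background task, a client has to poll `/train/status` to learn when training ends. The following is a minimal client-side sketch, not part of this commit. It assumes the training app is reachable at `http://localhost:7861` (the port passed to `uvicorn.run` above); if the app is mounted under `/api`, as the API documentation suggests, the base URL would need that prefix. The names `http_fetch_status`, `wait_for_training`, and `TERMINAL_STATES` are hypothetical and introduced only for illustration. The terminal states are the three values the code above writes to `training_status["status"]`.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Status values after which the server code above no longer updates progress
TERMINAL_STATES = {"completed", "failed", "stopped"}


def http_fetch_status(base_url: str = "http://localhost:7861") -> Dict[str, Any]:
    """GET /train/status and decode the JSON body (base_url is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/train/status", timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_training(
    fetch_status: Callable[[], Dict[str, Any]] = http_fetch_status,
    poll_interval: float = 5.0,
    max_polls: int = 720,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the status reaches a terminal state or max_polls is exhausted."""
    status: Dict[str, Any] = {}
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        sleep(poll_interval)
    raise TimeoutError(f"Training still running after {max_polls} polls: {status}")
```

Passing `fetch_status` and `sleep` as parameters keeps the loop testable without a running server. In normal use, `wait_for_training()` with the defaults would poll every five seconds for up to an hour.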