harismlnaslm committed on
Commit
701eb48
·
1 Parent(s): e207dc8

Add pure API-based training system with GPU support and background processing

Files changed (5)
  1. API_DOCUMENTATION.md +238 -0
  2. TRAINING_GUIDE.md +210 -0
  3. app.py +280 -1
  4. quick_train.py +181 -0
  5. training_api.py +438 -0
API_DOCUMENTATION.md ADDED
@@ -0,0 +1,238 @@
1
+ # 🤖 Textilindo AI Training API Documentation
2
+
3
+ ## 🚀 Pure API-Based Training System
4
+
5
+ This is a complete API-based training system that uses your data, configs, and the free GPU tier on Hugging Face Spaces.
6
+
7
+ ## 📡 API Endpoints
8
+
9
+ ### 1. **Start Training**
10
+ ```bash
11
+ POST /api/train/start
12
+ ```
13
+
14
+ **Request Body:**
15
+ ```json
16
+ {
17
+ "model_name": "distilgpt2",
18
+ "dataset_path": "data/lora_dataset_20250829_113330.jsonl",
19
+ "config_path": "configs/training_config.yaml",
20
+ "max_samples": 10,
21
+ "epochs": 1,
22
+ "batch_size": 1,
23
+ "learning_rate": 5e-5
24
+ }
25
+ ```
26
+
27
+ **Response:**
28
+ ```json
29
+ {
30
+ "success": true,
31
+ "message": "Training started successfully",
32
+ "training_id": "train_20241025_120000",
33
+ "status": "started"
34
+ }
35
+ ```
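A minimal sketch of building this request from Python with only the standard library. The helper name `build_start_request` and the `BASE_URL` constant are illustrative assumptions, not part of this commit; the field names and default values are copied from the request body above. The function only constructs the request so it can be inspected before anything is sent.

```python
import json
import urllib.request

# Assumption: the Space URL that the curl examples in this document use.
BASE_URL = "https://harismlnaslm-Textilindo-AI.hf.space"


def build_start_request(base_url=BASE_URL, **overrides):
    """Build (but do not send) a POST request for /api/train/start."""
    payload = {
        "model_name": "distilgpt2",
        "dataset_path": "data/lora_dataset_20250829_113330.jsonl",
        "config_path": "configs/training_config.yaml",
        "max_samples": 10,
        "epochs": 1,
        "batch_size": 1,
        "learning_rate": 5e-5,
    }
    # Reject field names the documented request body does not contain.
    unknown = set(overrides) - set(payload)
    if unknown:
        raise ValueError(f"unknown training fields: {sorted(unknown)}")
    payload.update(overrides)
    return urllib.request.Request(
        url=f"{base_url}/api/train/start",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, the returned object can be passed to `urllib.request.urlopen`; a successful call should return the JSON response shown above.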
36
+
37
+ ### 2. **Check Training Status**
38
+ ```bash
39
+ GET /api/train/status
40
+ ```
41
+
42
+ **Response:**
43
+ ```json
44
+ {
45
+ "is_training": true,
46
+ "progress": 45,
47
+ "status": "training",
48
+ "current_step": 5,
49
+ "total_steps": 10,
50
+ "loss": 2.34,
51
+ "start_time": "2024-10-25T12:00:00",
52
+ "error": null
53
+ }
54
+ ```
55
+
56
+ ### 3. **Get Training Data Info**
57
+ ```bash
58
+ GET /api/train/data
59
+ ```
60
+
61
+ **Response:**
62
+ ```json
63
+ {
64
+ "files": [
65
+ {
66
+ "name": "lora_dataset_20250829_113330.jsonl",
67
+ "size": 12345,
68
+ "lines": 33
69
+ }
70
+ ],
71
+ "count": 4
72
+ }
73
+ ```
74
+
75
+ ### 4. **Check GPU Availability**
76
+ ```bash
77
+ GET /api/train/gpu
78
+ ```
79
+
80
+ **Response:**
81
+ ```json
82
+ {
83
+ "available": true,
84
+ "count": 1,
85
+ "name": "Tesla T4",
86
+ "memory_gb": 15.0
87
+ }
88
+ ```
89
+
90
+ ### 5. **Test Trained Model**
91
+ ```bash
92
+ POST /api/train/test
93
+ ```
94
+
95
+ **Response:**
96
+ ```json
97
+ {
98
+ "success": true,
99
+ "test_prompt": "Question: dimana lokasi textilindo? Answer:",
100
+ "response": "Question: dimana lokasi textilindo? Answer: Textilindo berkantor pusat di Jl. Raya Prancis No.39, Kosambi Tim., Kec. Kosambi, Kabupaten Tangerang, Banten 15213",
101
+ "model_path": "./models/textilindo-trained"
102
+ }
103
+ ```
104
+
105
+ ## 🧪 Testing the API
106
+
107
+ ### 1. **Check GPU Availability**
108
+ ```bash
109
+ curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/gpu"
110
+ ```
111
+
112
+ ### 2. **View Training Data**
113
+ ```bash
114
+ curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/data"
115
+ ```
116
+
117
+ ### 3. **Start Training**
118
+ ```bash
119
+ curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/api/train/start" \
120
+ -H "Content-Type: application/json" \
121
+ -d '{
122
+ "model_name": "distilgpt2",
123
+ "dataset_path": "data/lora_dataset_20250829_113330.jsonl",
124
+ "config_path": "configs/training_config.yaml",
125
+ "max_samples": 10,
126
+ "epochs": 1,
127
+ "batch_size": 1,
128
+ "learning_rate": 5e-5
129
+ }'
130
+ ```
131
+
132
+ ### 4. **Monitor Training Progress**
133
+ ```bash
134
+ curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/status"
135
+ ```
136
+
137
+ ### 5. **Test Trained Model**
138
+ ```bash
139
+ curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/api/train/test"
140
+ ```
141
+
142
+ ## 🔧 Training Configuration
143
+
144
+ ### Available Models:
145
+ - `distilgpt2` (82M) - Small, fast, good for free tier
146
+ - `gpt2` (124M) - Original GPT-2
147
+ - `microsoft/DialoGPT-small` (117M) - Conversational
148
+
149
+ ### Training Parameters:
150
+ - **max_samples**: Limit training data (10 for free tier)
151
+ - **epochs**: Number of training epochs (1-3 recommended)
152
+ - **batch_size**: Batch size (1 for free tier)
153
+ - **learning_rate**: Learning rate (5e-5 recommended)
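The hints in this list can be turned into a small pre-flight check on the client side. This is a sketch only: `check_params` and its warning messages are hypothetical, and the limits simply restate the recommendations above. Nothing in this document shows the API enforcing them.

```python
def check_params(max_samples=10, epochs=1, batch_size=1, learning_rate=5e-5):
    """Return a list of warnings for values outside the documented hints."""
    warnings = []
    if max_samples > 10:
        warnings.append("max_samples above 10 may be too much for the free tier")
    if not 1 <= epochs <= 3:
        warnings.append("epochs outside the recommended 1-3 range")
    if batch_size != 1:
        warnings.append("batch_size other than 1 may not fit the free tier")
    if learning_rate != 5e-5:
        warnings.append("learning_rate differs from the recommended 5e-5")
    return warnings
```

An empty list means every value matches the recommendations; a non-empty list is advisory and does not have to block the request.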
154
+
155
+ ## 🎯 Training Process
156
+
157
+ 1. **Start Training**: POST to `/api/train/start`
158
+ 2. **Monitor Progress**: GET `/api/train/status`
159
+ 3. **Check GPU Usage**: GET `/api/train/gpu`
160
+ 4. **Test Model**: POST `/api/train/test`
161
+
162
+ ## 📊 Training Status Values
163
+
164
+ - `idle` - No training
165
+ - `starting` - Training initialization
166
+ - `training` - Active training
167
+ - `completed` - Training finished
168
+ - `failed` - Training error
169
+ - `stopped` - Training stopped
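Using the status values above, a client can poll until the run reaches a final state. The sketch below assumes that `completed`, `failed`, and `stopped` are the only terminal values; `wait_for_training` and its parameters are hypothetical names. The status fetcher is passed in as a callable so the loop can be exercised without a network connection.

```python
import time

# Status values treated as final, taken from the list above (an assumption).
TERMINAL_STATUSES = {"completed", "failed", "stopped"}


def wait_for_training(fetch_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until a terminal status is reported or max_polls is exhausted.

    fetch_status is any zero-argument callable that returns the decoded
    JSON body of GET /api/train/status as a dict.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status.get("status") in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("training did not reach a terminal status in time")
```

A real fetcher could wrap `urllib.request.urlopen` and `json.load` around the `/api/train/status` URL; the injected `sleep` exists mainly to keep tests fast.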
170
+
171
+ ## ⚡ GPU Usage
172
+
173
+ The API automatically detects and uses GPU if available:
174
+ - **GPU Available**: Uses GPU with fp16 precision
175
+ - **CPU Only**: Falls back to CPU training
176
+ - **Memory Optimization**: Adjusts batch size based on available memory
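As an illustration of the fallback rule above, the following sketch maps a `/api/train/gpu` response to device settings. It describes how a client might interpret the response, not the server-side logic, which is not shown in full in this excerpt; `pick_training_device` is a hypothetical helper.

```python
def pick_training_device(gpu_info):
    """Map a /api/train/gpu response dict to device settings.

    A response without a truthy "available" key (for example
    {"available": False} or {"error": "..."}) is treated as CPU only.
    """
    if gpu_info.get("available"):
        return {"device": "cuda", "fp16": True}
    return {"device": "cpu", "fp16": False}
```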
177
+
178
+ ## 🔍 Error Handling
179
+
180
+ ### Common Errors:
181
+ - `400` - Training already in progress
182
+ - `404` - Dataset or config file not found
183
+ - `500` - Training failed (check logs)
184
+
185
+ ### Error Response:
186
+ ```json
187
+ {
188
+ "detail": "Training already in progress"
189
+ }
190
+ ```
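A client can combine the status code with the `detail` field to produce a readable message. The sketch below is an assumption about how one might do that; `describe_error` and the fallback hint texts are hypothetical, while the codes and the `detail` key come from the section above.

```python
import json


def describe_error(status_code, body):
    """Turn an error response into a short human-readable message."""
    hints = {
        400: "a training run is already in progress",
        404: "dataset or config file not found",
        500: "training failed (check the Space logs)",
    }
    try:
        detail = json.loads(body).get("detail")
    except (ValueError, AttributeError, TypeError):
        # Body was not JSON, not a JSON object, or not text at all.
        detail = None
    hint = hints.get(status_code, "unexpected error")
    return f"HTTP {status_code}: {detail or hint}"
```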
191
+
192
+ ## 📈 Training Monitoring
193
+
194
+ ### Real-time Status:
195
+ - **Progress**: 0-100%
196
+ - **Current Step**: Current training step
197
+ - **Total Steps**: Total training steps
198
+ - **Loss**: Current training loss
199
+ - **GPU Usage**: GPU memory and utilization
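The fields above can be condensed into one log line per poll. This is a sketch with a hypothetical helper name, `summarize_status`; it only uses keys that appear in the status response documented earlier, so GPU usage is left out because that response does not include it.

```python
def summarize_status(status):
    """Format a /api/train/status response dict as a single log line."""
    line = (
        f"{status.get('status', 'unknown')}: "
        f"step {status.get('current_step', 0)}/{status.get('total_steps', 0)} "
        f"({status.get('progress', 0)}%), loss {status.get('loss', 0.0):.3f}"
    )
    if status.get("error"):
        line += f", error: {status['error']}"
    return line
```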
200
+
201
+ ### Training Logs:
202
+ Check the space logs for detailed training information.
203
+
204
+ ## 🚀 Quick Start Example
205
+
206
+ ```bash
207
+ # 1. Check GPU
208
+ curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/gpu"
209
+
210
+ # 2. Start training
211
+ curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/api/train/start" \
212
+ -H "Content-Type: application/json" \
213
+ -d '{
214
+ "model_name": "distilgpt2",
215
+ "dataset_path": "data/lora_dataset_20250829_113330.jsonl",
216
+ "max_samples": 5,
217
+ "epochs": 1
218
+ }'
219
+
220
+ # 3. Monitor progress
221
+ curl "https://harismlnaslm-Textilindo-AI.hf.space/api/train/status"
222
+
223
+ # 4. Test when complete
224
+ curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/api/train/test"
225
+ ```
226
+
227
+ ## 🎉 Success Indicators
228
+
229
+ - ✅ Training starts without errors
230
+ - ✅ GPU is detected and used
231
+ - ✅ Progress increases over time
232
+ - ✅ Model saves to `./models/textilindo-trained`
233
+ - ✅ Test endpoint returns valid responses
234
+ - ✅ Chat interface works with trained model
235
+
236
+ ---
237
+
238
+ *Pure API training system - No HTML interfaces! 🚀*
TRAINING_GUIDE.md ADDED
@@ -0,0 +1,210 @@
1
+ # 🤖 Textilindo AI Training Guide for Hugging Face Spaces
2
+
3
+ ## 🚀 Training Options on Hugging Face Spaces
4
+
5
+ ### Option 1: **Quick Training (Recommended for HF Spaces)**
6
+ Use the lightweight training script designed for HF Spaces constraints.
7
+
8
+ **Access Training Interface:**
9
+ - Visit: `https://harismlnaslm-Textilindo-AI.hf.space/train`
10
+ - Click "Start Lightweight Training"
11
+ - Monitor progress in the training log
12
+
13
+ **Manual Training:**
14
+ ```bash
15
+ python quick_train.py
16
+ ```
17
+
18
+ ### Option 2: **Use Existing Scripts**
19
+ Run the full training scripts (may be resource-intensive):
20
+
21
+ ```bash
22
+ # Check if training is ready
23
+ python scripts/check_training_ready.py
24
+
25
+ # Run lightweight training
26
+ python scripts/train_textilindo_ai_optimized.py
27
+
28
+ # Test the trained model
29
+ python scripts/test_textilindo_ai.py
30
+ ```
31
+
32
+ ### Option 3: **External Training + Upload**
33
+ Train on external resources and upload the model:
34
+
35
+ 1. **Train locally or on cloud:**
36
+ ```bash
37
+ python scripts/train_textilindo_ai.py
38
+ ```
39
+
40
+ 2. **Upload trained model to HF Hub:**
41
+ ```bash
42
+ huggingface-cli upload your-username/textilindo-trained-model ./models/trained-model
43
+ ```
44
+
45
+ 3. **Use the uploaded model in your space**
46
+
47
+ ## 🔧 Training Configuration
48
+
49
+ ### For HF Spaces (Limited Resources):
50
+ - **Model**: `distilgpt2` (small, fast)
51
+ - **Batch Size**: 1
52
+ - **Epochs**: 1
53
+ - **Max Length**: 128 tokens
54
+ - **Training Time**: ~5 minutes
55
+
56
+ ### For External Training (Full Resources):
57
+ - **Model**: `meta-llama/Llama-3.1-8B-Instruct`
58
+ - **Batch Size**: 4-8
59
+ - **Epochs**: 3
60
+ - **Max Length**: 2048 tokens
61
+ - **Training Time**: Hours
62
+
63
+ ## 📊 Training Data
64
+
65
+ Your space includes these training datasets:
66
+ - `data/lora_dataset_20250829_113330.jsonl` (33 samples)
67
+ - `data/lora_dataset_20250910_145055.jsonl`
68
+ - `data/textilindo_training_data.jsonl`
69
+ - `data/training_data.jsonl`
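Each line of these JSONL files is turned into a single training string by the scripts in this commit. The sketch below mirrors that loading loop in a self-contained form; the function name `jsonl_to_texts` is hypothetical, while the `instruction` and `output` keys and the `Question: ... Answer: ...` template are the ones the scripts use. As in the scripts, blank lines count toward the sample limit.

```python
import json


def jsonl_to_texts(lines, max_samples=10):
    """Convert JSONL records into 'Question: ... Answer: ...' strings."""
    texts = []
    for i, line in enumerate(lines):
        if i >= max_samples:
            break
        if line.strip():
            item = json.loads(line)
            texts.append(
                f"Question: {item.get('instruction', '')} "
                f"Answer: {item.get('output', '')}"
            )
    return texts
```

Passing an open file object works as well as a list of strings, since both yield one line at a time.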
70
+
71
+ ## 🎯 Training Endpoints
72
+
73
+ ### Web Interface:
74
+ - **Training UI**: `/train`
75
+ - **Start Training**: `POST /train/start`
76
+ - **Check Status**: `GET /train/status`
77
+ - **View Data**: `GET /train/data`
78
+
79
+ ### API Usage:
80
+ ```bash
81
+ # Start training
82
+ curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/train/start"
83
+
84
+ # Check resources
85
+ curl "https://harismlnaslm-Textilindo-AI.hf.space/train/status"
86
+
87
+ # View training data
88
+ curl "https://harismlnaslm-Textilindo-AI.hf.space/train/data"
89
+ ```
90
+
91
+ ## ⚠️ Limitations of HF Spaces Training
92
+
93
+ ### Resource Constraints:
94
+ - **CPU Only**: No GPU acceleration
95
+ - **Memory**: Limited to ~4GB RAM
96
+ - **Time**: 5-minute timeout for training
97
+ - **Storage**: Limited disk space
98
+
99
+ ### Recommended Approach:
100
+ 1. **Quick Demo Training**: Use `quick_train.py` for testing
101
+ 2. **Full Training**: Use external resources (Google Colab, AWS, etc.)
102
+ 3. **Model Upload**: Upload pre-trained models to HF Hub
103
+
104
+ ## 🚀 External Training Options
105
+
106
+ ### Google Colab (Free GPU):
107
+ ```python
108
+ # Upload your training data
109
+ # Run: python scripts/train_textilindo_ai.py
110
+ # Download trained model
111
+ # Upload to HF Hub
112
+ ```
113
+
114
+ ### Local Training:
115
+ ```bash
116
+ # Setup environment
117
+ python scripts/setup_textilindo_training.py
118
+
119
+ # Download model
120
+ python scripts/download_model.py
121
+
122
+ # Run training
123
+ python scripts/train_textilindo_ai.py
124
+
125
+ # Test model
126
+ python scripts/test_textilindo_ai.py
127
+ ```
128
+
129
+ ### Cloud Training (AWS/GCP):
130
+ ```bash
131
+ # Use the monitoring script
132
+ python scripts/train_with_monitoring.py
133
+ ```
134
+
135
+ ## 📈 Training Progress Monitoring
136
+
137
+ ### On HF Spaces:
138
+ - Check the training log in the web interface
139
+ - Use `/train/status` endpoint for resource monitoring
140
+
141
+ ### External Training:
142
+ ```bash
143
+ # Use monitoring script
144
+ python scripts/train_with_monitoring.py
145
+
146
+ # Check logs
147
+ tail -f logs/training.log
148
+ ```
149
+
150
+ ## 🧪 Testing Trained Models
151
+
152
+ ### Quick Test:
153
+ ```bash
154
+ python quick_train.py # Includes testing
155
+ ```
156
+
157
+ ### Full Testing:
158
+ ```bash
159
+ python scripts/test_textilindo_ai.py
160
+ python scripts/test_model.py
161
+ ```
162
+
163
+ ### API Testing:
164
+ ```bash
165
+ # Test chat endpoint
166
+ curl -X POST "https://harismlnaslm-Textilindo-AI.hf.space/chat" \
167
+ -H "Content-Type: application/json" \
168
+ -d '{"message": "dimana lokasi textilindo?"}'
169
+ ```
170
+
171
+ ## 🔧 Troubleshooting
172
+
173
+ ### Common Issues:
174
+
175
+ 1. **"Out of Memory"**
176
+ - Use smaller models (distilgpt2)
177
+ - Reduce batch size
178
+ - Use external training
179
+
180
+ 2. **"Training Timeout"**
181
+ - HF Spaces has 5-minute limit
182
+ - Use external resources for full training
183
+
184
+ 3. **"Model Not Found"**
185
+ - Check if model is downloaded
186
+ - Use `python scripts/download_model.py`
187
+
188
+ 4. **"Data Not Found"**
189
+ - Verify data files exist in `data/` directory
190
+ - Check file permissions
191
+
192
+ ## 📚 Next Steps
193
+
194
+ 1. **Start with Quick Training**: Test the setup with `quick_train.py`
195
+ 2. **Monitor Resources**: Use `/train/status` to check available resources
196
+ 3. **External Training**: For full training, use external resources
197
+ 4. **Model Upload**: Upload trained models to Hugging Face Hub
198
+ 5. **Integration**: Use uploaded models in your space
199
+
200
+ ## 🎉 Success Indicators
201
+
202
+ - ✅ Training completes without errors
203
+ - ✅ Model saves to `./models/` directory
204
+ - ✅ Test responses are generated
205
+ - ✅ Chat interface works with trained model
206
+ - ✅ API endpoints respond correctly
207
+
208
+ ---
209
+
210
+ *Happy Training! 🚀*
app.py CHANGED
@@ -9,7 +9,7 @@ import json
9
  import logging
10
  from pathlib import Path
11
  from typing import Optional, Dict, Any
12
- from fastapi import FastAPI, HTTPException, Request
13
  from fastapi.responses import HTMLResponse, JSONResponse
14
  from fastapi.staticfiles import StaticFiles
15
  from fastapi.middleware.cors import CORSMiddleware
@@ -17,6 +17,7 @@ from pydantic import BaseModel
17
  import uvicorn
18
  from huggingface_hub import InferenceClient
19
  import requests
 
20
 
21
  # Setup logging
22
  logging.basicConfig(level=logging.INFO)
@@ -272,6 +273,284 @@ async def get_info():
272
  "client_initialized": bool(ai_assistant.client)
273
  }
274
 
275
  # Mount static files if they exist
276
  if Path("static").exists():
277
  app.mount("/static", StaticFiles(directory="static"), name="static")
 
9
  import logging
10
  from pathlib import Path
11
  from typing import Optional, Dict, Any
12
+ from fastapi import FastAPI, HTTPException, Request, BackgroundTasks
13
  from fastapi.responses import HTMLResponse, JSONResponse
14
  from fastapi.staticfiles import StaticFiles
15
  from fastapi.middleware.cors import CORSMiddleware
 
17
  import uvicorn
18
  from huggingface_hub import InferenceClient
19
  import requests
20
+ from datetime import datetime
21
 
22
  # Setup logging
23
  logging.basicConfig(level=logging.INFO)
 
273
  "client_initialized": bool(ai_assistant.client)
274
  }
275
 
276
+ # Import training API
277
+ from training_api import (
278
+ TrainingRequest, TrainingResponse, training_status,
279
+ train_model_async, load_training_config, load_training_data, check_gpu_availability
280
+ )
281
+
282
+ # Training API endpoints
283
+ @app.post("/api/train/start", response_model=TrainingResponse)
284
+ async def start_training_api(request: TrainingRequest, background_tasks: BackgroundTasks):
285
+ """Start training process via API"""
286
+ if training_status["is_training"]:
287
+ raise HTTPException(status_code=400, detail="Training already in progress")
288
+
289
+ # Validate inputs
290
+ if not Path(request.dataset_path).exists():
291
+ raise HTTPException(status_code=404, detail=f"Dataset not found: {request.dataset_path}")
292
+
293
+ if not Path(request.config_path).exists():
294
+ raise HTTPException(status_code=404, detail=f"Config not found: {request.config_path}")
295
+
296
+ # Start training in background
297
+ training_id = f"train_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
298
+
299
+ background_tasks.add_task(
300
+ train_model_async,
301
+ request.model_name,
302
+ request.dataset_path,
303
+ request.config_path,
304
+ request.max_samples,
305
+ request.epochs,
306
+ request.batch_size,
307
+ request.learning_rate
308
+ )
309
+
310
+ return TrainingResponse(
311
+ success=True,
312
+ message="Training started successfully",
313
+ training_id=training_id,
314
+ status="started"
315
+ )
316
+
317
+ @app.get("/api/train/status")
318
+ async def get_training_status_api():
319
+ """Get current training status"""
320
+ return training_status
321
+
322
+ @app.get("/api/train/data")
323
+ async def get_training_data_info_api():
324
+ """Get information about available training data"""
325
+ data_dir = Path("data")
326
+ if not data_dir.exists():
327
+ return {"files": [], "count": 0}
328
+
329
+ jsonl_files = list(data_dir.glob("*.jsonl"))
330
+ files_info = []
331
+
332
+ for file in jsonl_files:
333
+ try:
334
+ with open(file, 'r', encoding='utf-8') as f:
335
+ lines = f.readlines()
336
+ files_info.append({
337
+ "name": file.name,
338
+ "size": file.stat().st_size,
339
+ "lines": len(lines)
340
+ })
341
+ except Exception as e:
342
+ files_info.append({
343
+ "name": file.name,
344
+ "error": str(e)
345
+ })
346
+
347
+ return {
348
+ "files": files_info,
349
+ "count": len(jsonl_files)
350
+ }
351
+
352
+ @app.get("/api/train/gpu")
353
+ async def get_gpu_info_api():
354
+ """Get GPU information"""
355
+ try:
356
+ import torch
357
+ gpu_available = torch.cuda.is_available()
358
+ if gpu_available:
359
+ gpu_count = torch.cuda.device_count()
360
+ gpu_name = torch.cuda.get_device_name(0)
361
+ gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3)
362
+ return {
363
+ "available": True,
364
+ "count": gpu_count,
365
+ "name": gpu_name,
366
+ "memory_gb": round(gpu_memory, 2)
367
+ }
368
+ else:
369
+ return {"available": False}
370
+ except Exception as e:
371
+ return {"error": str(e)}
372
+
373
+ @app.post("/api/train/test")
374
+ async def test_trained_model_api():
375
+ """Test the trained model"""
376
+ model_path = "./models/textilindo-trained"
377
+ if not Path(model_path).exists():
378
+ return {"error": "No trained model found"}
379
+
380
+ try:
381
+ from transformers import AutoTokenizer, AutoModelForCausalLM
382
+ import torch
383
+
384
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
385
+ model = AutoModelForCausalLM.from_pretrained(model_path)
386
+
387
+ # Test prompt
388
+ test_prompt = "Question: dimana lokasi textilindo? Answer:"
389
+ inputs = tokenizer(test_prompt, return_tensors="pt")
390
+
391
+ with torch.no_grad():
392
+ outputs = model.generate(
393
+ **inputs,
394
+ max_length=inputs.input_ids.shape[1] + 30,
395
+ temperature=0.7,
396
+ do_sample=True,
397
+ pad_token_id=tokenizer.eos_token_id
398
+ )
399
+
400
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
401
+
402
+ return {
403
+ "success": True,
404
+ "test_prompt": test_prompt,
405
+ "response": response,
406
+ "model_path": model_path
407
+ }
408
+
409
+ except Exception as e:
410
+ return {"error": str(e)}
411
+
412
+ # Legacy training endpoints (for backward compatibility)
413
+ @app.get("/train")
414
+ async def training_interface():
415
+ """Training interface"""
416
+ try:
417
+ with open("templates/training.html", "r", encoding="utf-8") as f:
418
+ return HTMLResponse(content=f.read())
419
+ except FileNotFoundError:
420
+ return HTMLResponse(content="""
421
+ <!DOCTYPE html>
422
+ <html>
423
+ <head>
424
+ <title>Textilindo AI Training</title>
425
+ <meta charset="utf-8">
426
+ <style>
427
+ body { font-family: Arial, sans-serif; max-width: 800px; margin: 0 auto; padding: 20px; }
428
+ .container { background: #f5f5f5; padding: 20px; border-radius: 10px; margin: 20px 0; }
429
+ button { background: #2196f3; color: white; border: none; padding: 10px 20px; border-radius: 5px; cursor: pointer; }
430
+ button:hover { background: #1976d2; }
431
+ .log { background: #000; color: #0f0; padding: 10px; border-radius: 5px; font-family: monospace; height: 300px; overflow-y: auto; }
432
+ </style>
433
+ </head>
434
+ <body>
435
+ <h1>🤖 Textilindo AI Training Interface</h1>
436
+
437
+ <div class="container">
438
+ <h2>Training Options</h2>
439
+ <p>Choose your training method:</p>
440
+
441
+ <button onclick="startLightweightTraining()">Start Lightweight Training</button>
442
+ <button onclick="checkResources()">Check Resources</button>
443
+ <button onclick="viewData()">View Training Data</button>
444
+ </div>
445
+
446
+ <div class="container">
447
+ <h2>Training Log</h2>
448
+ <div id="log" class="log">Ready to start training...</div>
449
+ </div>
450
+
451
+ <script>
452
+ function addLog(message) {
453
+ const log = document.getElementById('log');
454
+ const timestamp = new Date().toLocaleTimeString();
455
+ log.innerHTML += `[${timestamp}] ${message}\\n`;
456
+ log.scrollTop = log.scrollHeight;
457
+ }
458
+
459
+ async function startLightweightTraining() {
460
+ addLog('Starting lightweight training...');
461
+ try {
462
+ const response = await fetch('/train/start', {
463
+ method: 'POST',
464
+ headers: { 'Content-Type': 'application/json' }
465
+ });
466
+ const result = await response.json();
467
+ addLog(`Training result: ${result.message}`);
468
+ } catch (error) {
469
+ addLog(`Error: ${error.message}`);
470
+ }
471
+ }
472
+
473
+ async function checkResources() {
474
+ addLog('Checking resources...');
475
+ try {
476
+ const response = await fetch('/train/status');
477
+ const result = await response.json();
478
+ addLog(`Resources: ${JSON.stringify(result, null, 2)}`);
479
+ } catch (error) {
480
+ addLog(`Error: ${error.message}`);
481
+ }
482
+ }
483
+
484
+ async function viewData() {
485
+ addLog('Loading training data...');
486
+ try {
487
+ const response = await fetch('/train/data');
488
+ const result = await response.json();
489
+ addLog(`Data files: ${result.files.join(', ')}`);
490
+ } catch (error) {
491
+ addLog(`Error: ${error.message}`);
492
+ }
493
+ }
494
+ </script>
495
+ </body>
496
+ </html>
497
+ """)
498
+
499
+ @app.post("/train/start")
500
+ async def start_training():
501
+ """Start lightweight training"""
502
+ try:
503
+ # Import training script
504
+ import subprocess
505
+ import sys
506
+
507
+ # Run the training script
508
+ result = subprocess.run([
509
+ sys.executable, "train_on_space.py"
510
+ ], capture_output=True, text=True, timeout=300) # 5 minute timeout
511
+
512
+ if result.returncode == 0:
513
+ return {"message": "Training completed successfully!", "output": result.stdout}
514
+ else:
515
+ return {"message": "Training failed", "error": result.stderr}
516
+
517
+ except subprocess.TimeoutExpired:
518
+ return {"message": "Training timed out (5 minutes limit)"}
519
+ except Exception as e:
520
+ return {"message": f"Training error: {str(e)}"}
521
+
522
+ @app.get("/train/status")
523
+ async def legacy_training_status(): # renamed so it does not shadow the training_status dict imported from training_api
524
+ """Get training status and resources"""
525
+ try:
526
+ import psutil
527
+
528
+ return {
529
+ "status": "ready",
530
+ "cpu_count": psutil.cpu_count(),
531
+ "memory_total_gb": round(psutil.virtual_memory().total / (1024**3), 2),
532
+ "memory_available_gb": round(psutil.virtual_memory().available / (1024**3), 2),
533
+ "disk_free_gb": round(psutil.disk_usage('.').free / (1024**3), 2)
534
+ }
535
+ except Exception as e:
536
+ return {"status": "error", "message": str(e)}
537
+
538
+ @app.get("/train/data")
539
+ async def training_data():
540
+ """Get training data information"""
541
+ try:
542
+ data_dir = Path("data")
543
+ if data_dir.exists():
544
+ jsonl_files = list(data_dir.glob("*.jsonl"))
545
+ return {
546
+ "files": [f.name for f in jsonl_files],
547
+ "count": len(jsonl_files)
548
+ }
549
+ else:
550
+ return {"files": [], "count": 0}
551
+ except Exception as e:
552
+ return {"error": str(e)}
553
+
554
  # Mount static files if they exist
555
  if Path("static").exists():
556
  app.mount("/static", StaticFiles(directory="static"), name="static")
quick_train.py ADDED
@@ -0,0 +1,181 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Quick training script for Hugging Face Spaces
4
+ Optimized for CPU-only training with limited resources
5
+ """
6
+
7
+ import os
8
+ import json
9
+ import logging
10
+ from pathlib import Path
11
+ from datetime import datetime
12
+
13
+ # Setup logging
14
+ logging.basicConfig(level=logging.INFO)
15
+ logger = logging.getLogger(__name__)
16
+
17
+ def quick_training():
18
+ """Quick training suitable for HF Spaces"""
19
+ print("🚀 Starting Quick Training for Hugging Face Spaces")
20
+ print("=" * 60)
21
+
22
+ try:
23
+ # Import required libraries
24
+ from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer
25
+ from datasets import Dataset
26
+ import torch
27
+
28
+ print("✅ Successfully imported training libraries")
29
+
30
+ # Use a very small model for HF Spaces
31
+ model_name = "distilgpt2" # Small, fast model
32
+ print(f"📥 Loading model: {model_name}")
33
+
34
+ # Load tokenizer and model
35
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
36
+ if tokenizer.pad_token is None:
37
+ tokenizer.pad_token = tokenizer.eos_token
38
+
39
+ model = AutoModelForCausalLM.from_pretrained(model_name)
40
+ print("✅ Model loaded successfully")
41
+
42
+ # Load training data (limit to small amount for HF Spaces)
43
+ data_file = Path("data/lora_dataset_20250829_113330.jsonl")
44
+ if not data_file.exists():
45
+ print("❌ Training data not found")
46
+ return False
47
+
48
+ # Load and prepare data
49
+ training_data = []
50
+ with open(data_file, 'r', encoding='utf-8') as f:
51
+ for i, line in enumerate(f):
52
+ if i >= 5: # Limit to 5 samples for quick training
53
+ break
54
+ if line.strip():
55
+ data = json.loads(line)
56
+ # Create simple training text
57
+ text = f"Question: {data.get('instruction', '')} Answer: {data.get('output', '')}"
58
+ training_data.append({"text": text})
59
+
60
+ print(f"✅ Loaded {len(training_data)} training samples")
61
+
62
+ if not training_data:
63
+ print("❌ No training data found")
64
+ return False
65
+
66
+ # Convert to dataset
67
+ dataset = Dataset.from_list(training_data)
68
+
69
+ def tokenize_function(examples):
70
+ return tokenizer(
71
+ examples["text"],
72
+ truncation=True,
73
+ padding=True,
74
+ max_length=128 # Short sequences for quick training
75
+ )
76
+
77
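+ tokenized_dataset = dataset.map(tokenize_function, batched=True, remove_columns=["text"]).map(lambda ex: {"labels": ex["input_ids"]}) # drop raw text and add causal-LM labels so the Trainer can compute a loss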
78
+
79
+ # Training arguments optimized for HF Spaces
80
+ training_args = TrainingArguments(
81
+ output_dir="./models/quick-trained",
82
+ num_train_epochs=1, # Single epoch
83
+ per_device_train_batch_size=1, # Small batch
84
+ gradient_accumulation_steps=2,
85
+ learning_rate=5e-5,
86
+ warmup_steps=2,
87
+ save_steps=10,
88
+ logging_steps=1,
89
+ save_total_limit=1,
90
+ prediction_loss_only=True,
91
+ remove_unused_columns=False,
92
+ fp16=False, # Disable fp16 for CPU
93
+ dataloader_pin_memory=False,
94
+ report_to="none", # Disable wandb/tensorboard
95
+ )
96
+
97
+ # Create trainer
98
+ trainer = Trainer(
99
+ model=model,
100
+ args=training_args,
101
+ train_dataset=tokenized_dataset,
102
+ tokenizer=tokenizer,
103
+ )
104
+
105
+ print("🚀 Starting training...")
106
+ print("⚠️ This is a quick demo training with limited data")
107
+
108
+ # Train
109
+ trainer.train()
110
+
111
+ # Save the model
112
+ model.save_pretrained("./models/quick-trained")
113
+ tokenizer.save_pretrained("./models/quick-trained")
114
+
115
+ print("✅ Quick training completed successfully!")
116
+ print("📁 Model saved to: ./models/quick-trained")
117
+
118
+ # Test the model
119
+ print("\n🧪 Testing the trained model...")
120
+ test_prompt = "Question: dimana lokasi textilindo? Answer:"
121
+ inputs = tokenizer(test_prompt, return_tensors="pt")
122
+
123
+ with torch.no_grad():
124
+ outputs = model.generate(
125
+ **inputs,
126
+ max_length=inputs.input_ids.shape[1] + 20,
127
+ temperature=0.7,
128
+ do_sample=True,
129
+ pad_token_id=tokenizer.eos_token_id
130
+ )
131
+
132
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
133
+ print(f"📝 Test response: {response}")
134
+
135
+ return True
136
+
137
+ except ImportError as e:
138
+ print(f"❌ Missing required library: {e}")
139
+ print("💡 Install with: pip install transformers datasets torch")
140
+ return False
141
+ except Exception as e:
142
+ print(f"❌ Training failed: {e}")
143
+ return False
144
+
145
+ def main():
146
+ """Main function"""
147
+ print("🤖 Textilindo AI - Quick Training on Hugging Face Spaces")
148
+ print("=" * 70)
149
+
150
+ # Check if we're on HF Spaces
151
+ if os.getenv('SPACE_ID'):
152
+ print("✅ Running on Hugging Face Spaces")
153
+ else:
154
+ print("⚠️ Not running on Hugging Face Spaces")
155
+
156
+ # Check available data
157
+ data_dir = Path("data")
158
+ if data_dir.exists():
159
+ jsonl_files = list(data_dir.glob("*.jsonl"))
160
+ print(f"📊 Found {len(jsonl_files)} training data files")
161
+ for file in jsonl_files:
162
+ print(f" - {file.name}")
163
+ else:
164
+ print("❌ No data directory found")
165
+ return 1
166
+
167
+ # Run quick training
168
+ if quick_training():
169
+ print("\n🎉 Quick training completed successfully!")
170
+ print("📋 Next steps:")
171
+ print("1. Check the trained model in ./models/quick-trained/")
172
+ print("2. Test the model with your chat interface")
173
+ print("3. For full training, use external resources")
174
+ return 0
175
+ else:
176
+ print("\n❌ Quick training failed")
177
+ return 1
178
+
179
+ if __name__ == "__main__":
180
+ import sys
181
+ sys.exit(main())
training_api.py ADDED
@@ -0,0 +1,438 @@
+ #!/usr/bin/env python3
+ """
+ Textilindo AI Training API
+ Pure API-based training system for Hugging Face Spaces
+ Uses free GPU tier and your training data/configs
+ """
+
+ import os
+ import json
+ import yaml
+ import logging
+ import torch
+ from pathlib import Path
+ from datetime import datetime
+ from typing import Dict, Any, Optional
+ from fastapi import FastAPI, HTTPException, BackgroundTasks
+ from pydantic import BaseModel
+ import uvicorn
+
+ # Setup logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Training API
+ training_app = FastAPI(title="Textilindo AI Training API")
+
+ # Training status storage (in-memory, shared by all endpoints)
+ training_status = {
+     "is_training": False,
+     "progress": 0,
+     "status": "idle",
+     "current_step": 0,
+     "total_steps": 0,
+     "loss": 0.0,
+     "start_time": None,
+     "end_time": None,
+     "error": None
+ }
+
+ class TrainingRequest(BaseModel):
+     model_name: str = "distilgpt2"  # Start with small model
+     dataset_path: str = "data/lora_dataset_20250829_113330.jsonl"
+     config_path: str = "configs/training_config.yaml"
+     max_samples: int = 10  # Limit for free tier
+     epochs: int = 1
+     batch_size: int = 1
+     learning_rate: float = 5e-5
+
+ class TrainingResponse(BaseModel):
+     success: bool
+     message: str
+     training_id: str
+     status: str
+
+ def load_training_config(config_path: str) -> Dict[str, Any]:
+     """Load training configuration from a YAML file (returns {} on error)"""
+     try:
+         with open(config_path, 'r') as f:
+             config = yaml.safe_load(f)
+         return config
+     except Exception as e:
+         logger.error(f"Error loading config: {e}")
+         return {}
+
+ def load_training_data(dataset_path: str, max_samples: int = 10) -> list:
+     """Load up to max_samples training examples from a JSONL file"""
+     data = []
+     try:
+         with open(dataset_path, 'r', encoding='utf-8') as f:
+             for line in f:
+                 # Count loaded samples, not raw lines, so blank lines
+                 # do not eat into the max_samples budget
+                 if len(data) >= max_samples:
+                     break
+                 if line.strip():
+                     item = json.loads(line)
+                     # Create training text
+                     instruction = item.get('instruction', '')
+                     output = item.get('output', '')
+                     text = f"Question: {instruction} Answer: {output}"
+                     data.append({"text": text})
+         logger.info(f"Loaded {len(data)} training samples")
+         return data
+     except Exception as e:
+         logger.error(f"Error loading data: {e}")
+         return []
+
+ def check_gpu_availability() -> bool:
+     """Check if GPU is available"""
+     try:
+         if torch.cuda.is_available():
+             gpu_count = torch.cuda.device_count()
+             gpu_name = torch.cuda.get_device_name(0)
+             logger.info(f"GPU available: {gpu_name} (Count: {gpu_count})")
+             return True
+         else:
+             logger.info("No GPU available, using CPU")
+             return False
+     except Exception as e:
+         logger.error(f"Error checking GPU: {e}")
+         return False
+
+ def train_model_async(
+     model_name: str,
+     dataset_path: str,
+     config_path: str,
+     max_samples: int,
+     epochs: int,
+     batch_size: int,
+     learning_rate: float
+ ):
+     """Run training as a background task and record progress in training_status.
+
+     Note: this is a regular (blocking) function that FastAPI runs in the
+     background. config_path is accepted but not applied to the run here.
+     """
+     global training_status
+
+     try:
+         training_status.update({
+             "is_training": True,
+             "status": "starting",
+             "progress": 0,
+             "start_time": datetime.now().isoformat(),
+             "error": None
+         })
+
+         logger.info("🚀 Starting training...")
+
+         # Import training libraries
+         from transformers import (
+             AutoTokenizer,
+             AutoModelForCausalLM,
+             TrainingArguments,
+             Trainer,
+             DataCollatorForLanguageModeling
+         )
+         from datasets import Dataset
+
+         # Check GPU
+         gpu_available = check_gpu_availability()
+
+         # Load model and tokenizer
+         logger.info(f"📥 Loading model: {model_name}")
+         tokenizer = AutoTokenizer.from_pretrained(model_name)
+         if tokenizer.pad_token is None:
+             # GPT-2 style tokenizers ship without a pad token
+             tokenizer.pad_token = tokenizer.eos_token
+
+         # Load the model in full precision. When a GPU is present, Trainer
+         # moves the model to it and fp16=True enables mixed precision; loading
+         # the weights directly in float16 makes the fp16 gradient scaler fail.
+         model = AutoModelForCausalLM.from_pretrained(model_name)
+
+         logger.info("✅ Model loaded successfully")
+
+         # Load training data
+         training_data = load_training_data(dataset_path, max_samples)
+         if not training_data:
+             raise RuntimeError("No training data loaded")
+
+         # Convert to dataset
+         dataset = Dataset.from_list(training_data)
+
+         def tokenize_function(examples):
+             return tokenizer(
+                 examples["text"],
+                 truncation=True,
+                 padding=True,
+                 max_length=256
+             )
+
+         # Drop the raw "text" column: the data collator cannot turn strings
+         # into tensors, and remove_unused_columns=False would keep it around
+         tokenized_dataset = dataset.map(
+             tokenize_function,
+             batched=True,
+             remove_columns=["text"]
+         )
+
+         # Training arguments
+         training_args = TrainingArguments(
+             output_dir="./models/textilindo-trained",
+             num_train_epochs=epochs,
+             per_device_train_batch_size=batch_size,
+             gradient_accumulation_steps=2,
+             learning_rate=learning_rate,
+             warmup_steps=5,
+             save_steps=10,
+             logging_steps=1,
+             save_total_limit=1,
+             prediction_loss_only=True,
+             remove_unused_columns=False,
+             fp16=gpu_available,  # Use fp16 only if GPU available
+             dataloader_pin_memory=gpu_available,
+             report_to="none",  # the string "none" disables logging integrations
+         )
+
+         # Data collator (mlm=False selects causal language modeling)
+         data_collator = DataCollatorForLanguageModeling(
+             tokenizer=tokenizer,
+             mlm=False,
+         )
+
+         # Create trainer
+         trainer = Trainer(
+             model=model,
+             args=training_args,
+             train_dataset=tokenized_dataset,
+             data_collator=data_collator,
+             tokenizer=tokenizer,
+         )
+
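+         # Custom callback for progress tracking. Trainer callbacks must
+         # subclass TrainerCallback so that every event hook exists.
+         from transformers import TrainerCallback
+
+         class ProgressCallback(TrainerCallback):
+             def on_log(self, args, state, control, logs=None, **kwargs):
+                 if not logs:
+                     return
+                 # state.max_steps already accounts for batch size and
+                 # gradient accumulation, unlike len(dataset) * epochs
+                 total_steps = max(state.max_steps, 1)
+                 update = {
+                     "current_step": state.global_step,
+                     "total_steps": state.max_steps,
+                     "progress": min(100, round(state.global_step / total_steps * 100, 1)),
+                     # The final log entry has no 'loss' key; keep the last value
+                     "loss": logs.get('loss', training_status["loss"]),
+                 }
+                 if training_status["status"] != "stopped":
+                     update["status"] = "training"
+                 training_status.update(update)
+
+             def on_step_end(self, args, state, control, **kwargs):
+                 # Honor a stop requested through POST /train/stop
+                 if training_status["status"] == "stopped":
+                     control.should_training_stop = True
+                 return control
+
+         # Add callback
+         trainer.add_callback(ProgressCallback())
+
+         # Start training
+         training_status["status"] = "training"
+         trainer.train()
+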
+         # Save model
+         model.save_pretrained("./models/textilindo-trained")
+         tokenizer.save_pretrained("./models/textilindo-trained")
+
+         # Update status (keep "stopped" if a stop was requested meanwhile)
+         stopped = training_status["status"] == "stopped"
+         training_status.update({
+             "is_training": False,
+             "status": "stopped" if stopped else "completed",
+             "progress": training_status["progress"] if stopped else 100,
+             "end_time": datetime.now().isoformat()
+         })
+
+         logger.info("✅ Training finished and model saved")
+
+     except Exception as e:
+         logger.error(f"Training failed: {e}")
+         training_status.update({
+             "is_training": False,
+             "status": "failed",
+             "error": str(e),
+             "end_time": datetime.now().isoformat()
+         })
+
+ # API Endpoints
+
+ @training_app.post("/train/start", response_model=TrainingResponse)
+ async def start_training(request: TrainingRequest, background_tasks: BackgroundTasks):
+     """Start training process"""
+     global training_status
+
+     if training_status["is_training"]:
+         raise HTTPException(status_code=400, detail="Training already in progress")
+
+     # Validate inputs
+     if not Path(request.dataset_path).exists():
+         raise HTTPException(status_code=404, detail=f"Dataset not found: {request.dataset_path}")
+
+     if not Path(request.config_path).exists():
+         raise HTTPException(status_code=404, detail=f"Config not found: {request.config_path}")
+
+     # Start training in background
+     training_id = f"train_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
+
+     # Mark the run as active right away so a second request arriving before
+     # the background task starts is rejected
+     training_status.update({"is_training": True, "status": "starting"})
+
+     background_tasks.add_task(
+         train_model_async,
+         request.model_name,
+         request.dataset_path,
+         request.config_path,
+         request.max_samples,
+         request.epochs,
+         request.batch_size,
+         request.learning_rate
+     )
+
+     return TrainingResponse(
+         success=True,
+         message="Training started successfully",
+         training_id=training_id,
+         status="started"
+     )
+
+ @training_app.get("/train/status")
+ async def get_training_status():
+     """Get current training status"""
+     return training_status
+
+ @training_app.get("/train/data")
+ async def get_training_data_info():
+     """Get information about available training data"""
+     data_dir = Path("data")
+     if not data_dir.exists():
+         return {"files": [], "count": 0}
+
+     jsonl_files = list(data_dir.glob("*.jsonl"))
+     files_info = []
+
+     for file in jsonl_files:
+         try:
+             with open(file, 'r', encoding='utf-8') as f:
+                 lines = f.readlines()
+             files_info.append({
+                 "name": file.name,
+                 "size": file.stat().st_size,
+                 "lines": len(lines)
+             })
+         except Exception as e:
+             files_info.append({
+                 "name": file.name,
+                 "error": str(e)
+             })
+
+     return {
+         "files": files_info,
+         "count": len(jsonl_files)
+     }
+
+ @training_app.get("/train/config")
+ async def get_training_config():
+     """Get current training configuration"""
+     config_path = "configs/training_config.yaml"
+     if not Path(config_path).exists():
+         return {"error": "Config file not found"}
+
+     try:
+         config = load_training_config(config_path)
+         return config
+     except Exception as e:
+         return {"error": str(e)}
+
+ @training_app.get("/train/models")
+ async def get_available_models():
+     """Get list of available models"""
+     return {
+         "models": [
+             {
+                 "name": "distilgpt2",
+                 "size": "82M",
+                 "description": "Small, fast model for quick training"
+             },
+             {
+                 "name": "gpt2",
+                 "size": "124M",
+                 "description": "Original GPT-2 model"
+             },
+             {
+                 "name": "microsoft/DialoGPT-small",
+                 "size": "117M",
+                 "description": "Conversational model"
+             }
+         ]
+     }
+
+ @training_app.get("/train/gpu")
+ async def get_gpu_info():
+     """Get GPU information"""
+     try:
+         gpu_available = torch.cuda.is_available()
+         if gpu_available:
+             gpu_count = torch.cuda.device_count()
+             gpu_name = torch.cuda.get_device_name(0)
+             gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3)
+             return {
+                 "available": True,
+                 "count": gpu_count,
+                 "name": gpu_name,
+                 "memory_gb": round(gpu_memory, 2)
+             }
+         else:
+             return {"available": False}
+     except Exception as e:
+         return {"error": str(e)}
+
+ @training_app.post("/train/stop")
+ async def stop_training():
+     """Request a stop of the current training run.
+
+     This only updates the shared status flags; the training loop has to
+     observe them for the run to actually halt.
+     """
+     global training_status
+
+     if not training_status["is_training"]:
+         return {"message": "No training in progress"}
+
+     training_status.update({
+         "is_training": False,
+         "status": "stopped",
+         "end_time": datetime.now().isoformat()
+     })
+
+     return {"message": "Training stopped"}
+
+ @training_app.get("/train/test")
+ async def test_trained_model():
+     """Test the trained model"""
+     model_path = "./models/textilindo-trained"
+     if not Path(model_path).exists():
+         return {"error": "No trained model found"}
+
+     try:
+         from transformers import AutoTokenizer, AutoModelForCausalLM
+
+         tokenizer = AutoTokenizer.from_pretrained(model_path)
+         model = AutoModelForCausalLM.from_pretrained(model_path)
+
+         # Test prompt (Indonesian: "Where is Textilindo located?")
+         test_prompt = "Question: dimana lokasi textilindo? Answer:"
+         inputs = tokenizer(test_prompt, return_tensors="pt")
+
+         with torch.no_grad():
+             outputs = model.generate(
+                 **inputs,
+                 max_new_tokens=30,  # generate at most 30 tokens past the prompt
+                 temperature=0.7,
+                 do_sample=True,
+                 pad_token_id=tokenizer.eos_token_id
+             )
+
+         response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+         return {
+             "success": True,
+             "test_prompt": test_prompt,
+             "response": response,
+             "model_path": model_path
+         }
+
+     except Exception as e:
+         return {"error": str(e)}
+
+ if __name__ == "__main__":
+     uvicorn.run(training_app, host="0.0.0.0", port=7861)
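
A note on the progress numbers reported by `/train/status`: `train_model_async` sets `gradient_accumulation_steps=2`, and the Trainer's step counter (`state.global_step`) advances once per optimizer update, not once per sample. The sketch below is a rough, hedged estimate of how many optimizer steps a single-device run takes. The helper names `estimate_optimizer_steps` and `progress_percent` are hypothetical and not part of this commit, and the floor-division rule is an assumption about how steps are counted; some library versions may count a trailing partial accumulation as an extra step, so `state.max_steps` remains the authoritative value at runtime.

```python
import math


def estimate_optimizer_steps(num_samples, batch_size, grad_accum_steps, epochs):
    """Rough estimate of optimizer (global) steps for a single-device run.

    Assumption: one optimizer update per `grad_accum_steps` full batches,
    with at least one update per epoch.
    """
    batches_per_epoch = math.ceil(num_samples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return math.ceil(updates_per_epoch * epochs)


def progress_percent(current_step, total_steps):
    """Percentage in [0, 100], guarding against a zero denominator."""
    if total_steps <= 0:
        return 0.0
    return min(100.0, round(current_step / total_steps * 100, 1))


# Default request values: 10 samples, batch size 1, 1 epoch, accumulation 2
steps = estimate_optimizer_steps(num_samples=10, batch_size=1,
                                 grad_accum_steps=2, epochs=1)
print(steps)                            # 5
print(progress_percent(steps, steps))   # 100.0
print(progress_percent(steps, 10 * 1))  # 50.0 when dividing by samples * epochs
```

With the default request, the last step corresponds to 100% only when the denominator is the optimizer-step count; a denominator of samples × epochs would report 50% at the final step.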