to set up and configure a self-hosted large language model (Llama 3.1) on our Linux server infrastructure for automated report generation.
Primary Objective:
Deploy and configure Llama 3.1 8B (or an equivalent model) on our hosted, CPU-only Linux server, and create an API service that our SafetyNet Platform can call for AI-powered report generation.
Specific Deliverables:
Server Environment Setup
Configure Linux (Ubuntu 22.04) server environment
Install Python 3.11, dependencies, and required libraries
Set up virtual environment and security configurations
AI Model Installation & Configuration
Download and install Llama 3.1 8B Instruct model
Optimize model configuration for CPU-only inference
Implement quantization if needed for performance
Test model functionality and response quality
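To show why quantization may matter on a CPU-only server, here is a rough sketch of the memory needed just to hold the model weights at different precisions. It assumes a round figure of 8 billion parameters and ignores the KV cache, activations, and per-format overhead, so treat the numbers as lower bounds, not sizing guidance. The function name is illustrative only.

```python
def estimate_weight_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: ~8e9 parameters for an "8B" model (rounded).
N_PARAMS = 8e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{estimate_weight_gb(N_PARAMS, bits):.0f} GB of weights")
```

Under these assumptions, a 4-bit quantized model needs roughly a quarter of the RAM of the 16-bit weights, which is often the difference between fitting and not fitting on a modest server.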
API Service Development (Nice to have)
Create REST API service (Flask/FastAPI) for report generation
Implement secure endpoints for our SafetyNet Platform to call
Add error handling, logging, and health check endpoints
Configure service to auto-start on server reboot (systemd)
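As a rough illustration of the shape of the service described above, the sketch below exposes a health check and a report-generation endpoint with basic error handling. It uses only the Python standard library so it runs without dependencies; the actual deliverable would more likely use Flask or FastAPI as listed. The paths `/health` and `/generate`, the JSON field names, and `generate_report` (a placeholder for the real model call) are all assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def generate_report(prompt: str) -> str:
    """Hypothetical stand-in for the model call; returns a placeholder."""
    return f"[report for: {prompt}]"


class ReportHandler(BaseHTTPRequestHandler):
    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        # Health check endpoint for monitoring.
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/generate":
            self._send_json(404, {"error": "not found"})
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            data = json.loads(self.rfile.read(length) or b"{}")
            prompt = data["prompt"]
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or missing field: reply 400 instead of crashing.
            self._send_json(400, {"error": "expected JSON body with 'prompt'"})
            return
        self._send_json(200, {"report": generate_report(prompt)})

    def log_message(self, format, *args) -> None:
        # Silence default stderr logging; a real service would log properly.
        pass


def make_server(host: str = "127.0.0.1", port: int = 8000) -> ThreadingHTTPServer:
    """Build the server; call .serve_forever() on the result to run it."""
    return ThreadingHTTPServer((host, port), ReportHandler)
```

Calling `make_server().serve_forever()` would start the service locally; under systemd, that call would sit in the script the unit file launches.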
Security & Performance
Configure firewall rules (allow only our application server)
Implement authentication/API key system
Optimize for 30-60 second response times
Set up monitoring and logging
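For the API key requirement, one possible approach is a constant-time comparison of a key sent in a request header. This is a minimal sketch, not a full authentication system: the header name `X-API-Key` and the function name are assumptions for illustration, and in practice the expected key would come from an environment variable or secrets store, not from source code.

```python
import hmac


def is_authorized(headers: dict, expected_key: str) -> bool:
    """Return True only if the request carries the expected API key.

    Assumption: the client sends the key in an 'X-API-Key' header.
    """
    supplied = headers.get("X-API-Key", "")
    if not expected_key:
        # Refuse everything if no key is configured, rather than allow all.
        return False
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```

Rejecting requests when no key is configured is a deliberate choice here: a misconfigured server fails closed instead of silently accepting every caller.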
Documentation & Training
Comprehensive setup documentation
API usage guide with examples
Troubleshooting guide
2-hour knowledge transfer session with our development team
Testing & Validation
Generate 10+ test reports with sample data
Validate output quality and format
Performance testing under load
Integration testing with our platform (we'll provide API endpoints)
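Performance testing under load could start with something as simple as the sketch below, which times a number of concurrent calls and checks them against the 60-second upper bound mentioned earlier. The `call` argument is a hypothetical zero-argument function that would issue one request to the report service; the helper names and the default budget are assumptions for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def measure_latencies(call: Callable[[], object], n_requests: int, concurrency: int) -> List[float]:
    """Run `call` n_requests times across worker threads; return sorted latencies in seconds."""
    def timed(_: int) -> float:
        start = time.perf_counter()
        call()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sorted(pool.map(timed, range(n_requests)))


def within_budget(latencies: List[float], budget_seconds: float = 60.0) -> bool:
    """True if there is at least one sample and the slowest call met the budget."""
    return bool(latencies) and max(latencies) <= budget_seconds
```

On a CPU-only server, concurrent requests usually compete for the same cores, so it is worth comparing the results at a concurrency of 1 against higher values before settling on a target.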
Technical Requirements
Must Have:
3+ years of experience with Python and machine learning frameworks (PyTorch, Transformers)
Experience deploying and running large language models (Llama, GPT, Mistral, etc.)
Strong Linux system administration skills (Ubuntu/Debian)
Experience with API development (Flask, FastAPI, or similar)
Understanding of CPU-based ML inference and optimization
Experience with Hugging Face model hub
Knowledge of systemd service configuration
Security best practices for production systems
Nice to Have:
Experience with model quantization and optimization (bitsandbytes, ONNX)
DevOps experience (Docker, monitoring tools)
Previous work with government or healthcare systems (HIPAA/FERPA compliance)
Experience with justice system or social services applications