In this step, we’ll build a translation microservice using FastAPI, containerize it using Docker, and deploy it on Kubernetes for scalability. The translation service will use a Hugging Face model (like MarianMT) to perform real-time translations.
Set Up the Environment:
Install the fastapi and transformers libraries to work with the Hugging Face MarianMT model. You will also need torch for model inference and uvicorn to serve the app:
pip install fastapi transformers torch uvicorn
Implement the Translation Service in Python:
The following FastAPI service uses MarianMT from Hugging Face for translation.
Service Code:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from transformers import MarianMTModel, MarianTokenizer

# Initialize the FastAPI app
app = FastAPI()

# Load the MarianMT model and tokenizer
model_name = 'Helsinki-NLP/opus-mt-es-en'  # Spanish to English model, adjust as needed
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Define a request model for structured API requests
class TranslationRequest(BaseModel):
    text: str
    source_language: str
    target_language: str

# Define the translation endpoint
@app.post("/translate")
def translate(request: TranslationRequest):
    try:
        # Prepare input for the model
        inputs = tokenizer(request.text, return_tensors="pt", truncation=True)
        # Perform translation
        translated_tokens = model.generate(**inputs)
        translated_text = tokenizer.decode(translated_tokens[0], skip_special_tokens=True)
        return {"translated_text": translated_text}
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Translation error: {str(e)}")
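The service above hard-codes one language pair, so the source_language and target_language fields are accepted but not used. If you want the request fields to actually select a model, one hedged sketch is to derive the MarianMT checkpoint name from the pair and cache loaded models (the helpers model_name_for and load_model are illustrative names, not part of the original service):

```python
from functools import lru_cache

def model_name_for(source_language: str, target_language: str) -> str:
    """Map a language pair to a Helsinki-NLP MarianMT checkpoint name.

    Note: not every pair exists on the Hugging Face Hub; an unknown
    pair will fail at download time, not here.
    """
    return f"Helsinki-NLP/opus-mt-{source_language}-{target_language}"

@lru_cache(maxsize=4)
def load_model(source_language: str, target_language: str):
    """Load (and cache) the tokenizer and model for a language pair."""
    # Imported lazily so model_name_for stays dependency-free
    from transformers import MarianMTModel, MarianTokenizer
    name = model_name_for(source_language, target_language)
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)
```

Inside the endpoint you would then call load_model(request.source_language, request.target_language) instead of using the module-level model; the lru_cache keeps a few recently used pairs in memory rather than reloading on every request.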
Explanation:
MarianMTModel and MarianTokenizer are loaded from Hugging Face, specifically for Spanish-to-English translation in this example. Change model_name to target other languages.
The /translate endpoint accepts a POST request with the fields text, source_language, and target_language. It tokenizes the input text, performs translation, and returns the translated text. Note that with a single hard-coded model, source_language and target_language are accepted but not used; the loaded model fixes the language pair.
Test the Service Locally:
Run the FastAPI app to test locally:
uvicorn translation_service:app --host 0.0.0.0 --port 8000
Send a POST request to test:
curl -X POST "http://localhost:8000/translate" -H "Content-Type: application/json" -d '{"text": "Hola, ¿cómo estás?", "source_language": "es", "target_language": "en"}'
This should return the translated text: {"translated_text": "Hello, how are you?"}.
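If you prefer testing from Python instead of curl, here is a small standard-library client sketch. The helper names build_translate_request and translate_via_service are illustrative, and it assumes the service is listening on localhost:8000:

```python
import json
from urllib.request import Request, urlopen

def build_translate_request(base_url: str, text: str, source: str, target: str) -> Request:
    """Build a POST request matching the /translate endpoint's schema."""
    payload = json.dumps({
        "text": text,
        "source_language": source,
        "target_language": target,
    }).encode("utf-8")
    return Request(
        f"{base_url}/translate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def translate_via_service(base_url: str, text: str, source: str, target: str) -> dict:
    """Send the request and decode the JSON response (service must be running)."""
    with urlopen(build_translate_request(base_url, text, source, target)) as resp:
        return json.load(resp)
```

With the service running, translate_via_service("http://localhost:8000", "Hola, ¿cómo estás?", "es", "en") returns the same JSON body as the curl command above.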
Create a Dockerfile:
The Dockerfile describes how to build the container for the FastAPI service.
Dockerfile:
# Use a FastAPI-compatible base image
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8
# Set the working directory
WORKDIR /app
# Copy and install requirements
COPY ./requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
# Copy the application code
COPY . /app
# Start the service (the base image defaults to looking for main.py,
# so point it explicitly at translation_service.py)
CMD ["uvicorn", "translation_service:app", "--host", "0.0.0.0", "--port", "8000"]
Create requirements.txt for Python Dependencies:
requirements.txt:
fastapi
transformers
torch
uvicorn
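Unpinned dependencies can drift between your local environment and the container build. A safer requirements.txt pins versions; the exact pins below are illustrative, so pin whatever versions you actually tested with. Note also that MarianTokenizer depends on the sentencepiece package, which transformers does not install by default:

```
fastapi==0.110.0
transformers==4.40.0
torch==2.2.2
uvicorn==0.29.0
sentencepiece==0.2.0
```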
Build and Test the Docker Image Locally:
Build the Docker image:
docker build -t translation_service .
Run the container to ensure it works as expected:
docker run -p 8000:8000 translation_service
Access the translation service at http://localhost:8000/translate.
Push the Docker Image to a Container Registry:
If you are using Docker Hub, tag and push the image:
docker tag translation_service your_dockerhub_username/translation_service:latest
docker push your_dockerhub_username/translation_service:latest
Replace your_dockerhub_username with your actual Docker Hub username.
Set Up Kubernetes Deployment and Service Files:
Kubernetes uses YAML files to define resources. Create two YAML files: one for the deployment and one for the service.
Deployment YAML (deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: translation-service
spec:
  replicas: 3  # Number of pods
  selector:
    matchLabels:
      app: translation-service
  template:
    metadata:
      labels:
        app: translation-service
    spec:
      containers:
        - name: translation-service
          image: your_dockerhub_username/translation_service:latest
          ports:
            - containerPort: 8000
          resources:
            limits:
              memory: "512Mi"
              cpu: "500m"
Service YAML (service.yaml):
apiVersion: v1
kind: Service
metadata:
  name: translation-service
spec:
  type: LoadBalancer
  selector:
    app: translation-service
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8000
Explanation:
The Deployment runs three replicas of the container, each limited to 512Mi of memory and 500m of CPU. A loaded MarianMT model plus torch can exceed 512Mi, so raise the limits if pods are OOM-killed. The Service exposes the pods behind a LoadBalancer, mapping external port 80 to container port 8000.
Apply the YAML Files to Deploy on Kubernetes:
Run the following commands to deploy on your Kubernetes cluster:
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
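The fixed replicas: 3 in the Deployment can instead be adjusted automatically under load. A hedged sketch of a HorizontalPodAutoscaler follows; the file name hpa.yaml is arbitrary, the thresholds are illustrative, and it requires the Kubernetes metrics server to be running in the cluster. (Because the Deployment sets CPU limits without explicit requests, Kubernetes defaults requests to the limits, which the utilization target below is measured against.)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: translation-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: translation-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Apply it with kubectl apply -f hpa.yaml, then watch scaling decisions with kubectl get hpa translation-service.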
Verify the Deployment and Service:
Check if the pods are running:
kubectl get pods
Check the external IP for the translation service:
kubectl get service translation-service
Once you have the external IP, you can access the translation service via http://<EXTERNAL_IP>/translate.