Recipe Recommendation Engine: Deploying K-means Clustering in Production
Overview
This project is a recipe recommendation system that suggests recipes based on recipe complexity and preparation time. It uses K-means clustering for similarity matching and serves as a practical demonstration of deploying an ML model in production with MLOps practices.
Architecture
Data Architecture
- Data Pipeline: Extract-Transform-Load (ETL) pipeline for recipe datasets, with preprocessing filters that remove recipes taking longer than 60 minutes or using more than 20 ingredients.
- Feature Engineering: Complexity score computed as the number of steps multiplied by the number of ingredients, used as an input feature for the model.
- Model Storage: Serialized model artifacts (recipe_recommender.joblib and scaler.joblib) with versioning.
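As a rough illustration of the filtering and complexity-score steps listed above, here is a minimal pandas sketch. The column names `minutes`, `n_steps`, and `n_ingredients` are assumptions about the dataset schema, and `preprocess_recipes` is a hypothetical helper rather than the project's actual pipeline code.

```python
import pandas as pd

MAX_MINUTES = 60        # recipes longer than this are dropped
MAX_INGREDIENTS = 20    # recipes with more ingredients than this are dropped


def preprocess_recipes(df: pd.DataFrame) -> pd.DataFrame:
    """Filter recipes and add a complexity score (steps * ingredients)."""
    mask = (df["minutes"] <= MAX_MINUTES) & (df["n_ingredients"] <= MAX_INGREDIENTS)
    out = df.loc[mask].copy()
    out["complexity"] = out["n_steps"] * out["n_ingredients"]
    return out.reset_index(drop=True)


# Tiny made-up sample, only to show the shape of the data.
sample = pd.DataFrame(
    {
        "name": ["omelette", "slow roast", "paella"],
        "minutes": [10, 240, 55],
        "n_steps": [4, 6, 12],
        "n_ingredients": [3, 8, 25],
    }
)
cleaned = preprocess_recipes(sample)
```

In this sample only the omelette survives: the slow roast exceeds the time limit and the paella exceeds the ingredient limit.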
Application Architecture
- ML Service: Python backend with scikit-learn for clustering
- Frontend: Streamlit interface for user interactions
Technologies Used
- Backend:
- Python 3.9 for core functionality
- pandas for data manipulation
- scikit-learn for K-means implementation
- Frontend:
- Streamlit for interactive UI
- Bootstrap components for styling
- DevOps:
- Docker for containerization
- GitHub Actions for CI/CD pipelines
- Railway for cloud hosting
Features
- Ingredient-Based Search:
- Match available ingredients to possible recipes
- Rank recipes by ingredient coverage
- Handle substitutions and alternative ingredients
- Recommendation Engine:
- Complexity-based filtering
- Preparation time-based recommendations
- Top-rated recipes filtered based on rating and interactions
- User Experience:
- Intuitive ingredient selection interface
- Recipe cards with images and instructions
- Dietary restriction filtering
- System Features:
- Performance monitoring
- A/B testing infrastructure for model improvements
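To make the ingredient-coverage ranking above more concrete, the following is a small sketch in plain Python. It assumes coverage means the fraction of a recipe's ingredients the user already has; the function name `rank_by_coverage` and the sample recipes are illustrative only, and substitutions are not handled here.

```python
def rank_by_coverage(available, recipes):
    """Return (recipe_name, coverage) pairs, best coverage first.

    coverage = fraction of a recipe's ingredients the user already has.
    """
    have = {item.strip().lower() for item in available}
    ranked = []
    for name, ingredients in recipes.items():
        needed = {item.strip().lower() for item in ingredients}
        if not needed:
            continue  # skip recipes with no listed ingredients
        coverage = len(have & needed) / len(needed)
        ranked.append((name, coverage))
    # Sort by coverage (descending), then by name for a stable order.
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


recipes = {
    "pancakes": ["flour", "egg", "milk"],
    "omelette": ["egg", "butter"],
    "salad": ["lettuce", "tomato"],
}
ranking = rank_by_coverage(["Egg", "milk", "flour"], recipes)
```

Normalizing case and whitespace before comparing keeps "Egg" and "egg" from being treated as different ingredients.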
Development Process
Motivation and Evolution
This project began as an exploration of practical applications for clustering algorithms:
- The initial prototype used nearest-neighbor methods such as k-NN before moving to K-means clustering for better scalability.
- Added production-ready features like caching and monitoring.
Architecture Decisions
- K-means over Neural Models: Chose K-means for interpretability and deployment simplicity
- Streamlit over React: Selected Streamlit for rapid ML application development
- Railway Deployment: Utilized for simplified containerized deployment
Workflow
- Data collection and cleaning from public recipe datasets
- Feature engineering to convert text ingredients to numerical vectors, keeping only recipes with a complexity score ≤100 and a rating ≥4.
- K-means model training and optimal k determination using elbow and silhouette analysis.
- Streamlit frontend development with responsive design
- Containerization and deployment pipeline setup
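The choice of k mentioned in the workflow above could be explored along these lines. This is a sketch on synthetic data, not the project's training script: `evaluate_k` is a hypothetical helper, and the three artificial blobs stand in for real recipe vectors. Inertia gives the values for an elbow plot, while the silhouette score gives a single number to compare across candidates.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def evaluate_k(vectors, k_values):
    """Return {k: (inertia, silhouette)} for each candidate k."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(vectors)
        results[k] = (model.inertia_, silhouette_score(vectors, model.labels_))
    return results


# Synthetic stand-in data: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
vectors = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

scores = evaluate_k(vectors, range(2, 7))
best_k = max(scores, key=lambda k: scores[k][1])  # highest silhouette wins
```

On real recipe data the two criteria may disagree, so it is worth inspecting both rather than relying on the maximum silhouette alone.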
Key Advantages
- Lightweight model that can run in resource-constrained environments
- Easily explainable recommendations based on recipe complexity and preparation time
- Simple deployment and scaling through containerization
- Interactive UI that requires no ML knowledge from end-users
Implementation Details
Vectorization Approach
The system transforms recipe details into numerical vectors, including:
- Recipe complexity score.
- Normalized preparation time.
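One way the two features above might be scaled and the scaler persisted is sketched below. The use of `StandardScaler` is an assumption (the source only names a `scaler.joblib` artifact, not the scaler type), and the feature rows are made up for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows of [complexity_score, prep_minutes].
features = np.array([[12.0, 10.0], [40.0, 30.0], [90.0, 55.0], [24.0, 20.0]])

scaler = StandardScaler()
scaled = scaler.fit_transform(features)  # zero mean, unit variance per column

# Persist the fitted scaler so inference applies the same transformation.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scaler.joblib")
    joblib.dump(scaler, path)
    restored = joblib.load(path)
    restored_scaled = restored.transform(features)
```

Scaling matters for K-means because it relies on Euclidean distance: without it, whichever feature has the larger numeric range would dominate the clustering.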
Clustering Implementation
K-means clustering is implemented to find similar recipes:
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def train_kmeans(vectors, n_clusters=20):
    # Fit K-means on the recipe feature vectors
    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    kmeans.fit(vectors)
    return kmeans

def get_recommendations(user_ingredients, kmeans_model, recipe_vectors, recipes):
    # vectorize_ingredients and all_ingredients are assumed to be defined elsewhere in the project
    # Vectorize user ingredients
    user_vector = vectorize_ingredients(user_ingredients, all_ingredients)
    # Find closest cluster
    cluster = kmeans_model.predict([user_vector])[0]
    # Get recipes from same cluster, scored by cosine similarity to the user vector
    cluster_recipes = []
    for i, label in enumerate(kmeans_model.labels_):
        if label == cluster:
            similarity = cosine_similarity([user_vector], [recipe_vectors[i]])[0][0]
            cluster_recipes.append((recipes[i], similarity))
    # Sort by similarity, most similar first
    return sorted(cluster_recipes, key=lambda x: x[1], reverse=True)
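The listing above calls `vectorize_ingredients` without showing it. The following is one plausible shape for such a helper, assuming a simple multi-hot encoding against a fixed ingredient vocabulary. It is a hypothetical stand-in, not the project's implementation, and the vocabulary here is illustrative.

```python
def vectorize_ingredients(user_ingredients, all_ingredients):
    """Multi-hot encode ingredients against a fixed vocabulary.

    Hypothetical stand-in: position i is 1.0 if all_ingredients[i] was
    supplied by the user, otherwise 0.0. Unknown ingredients are ignored.
    """
    supplied = {item.strip().lower() for item in user_ingredients}
    return [1.0 if name.lower() in supplied else 0.0 for name in all_ingredients]


all_ingredients = ["egg", "flour", "milk", "tomato"]  # illustrative vocabulary
user_vector = vectorize_ingredients(["Milk", "egg", "saffron"], all_ingredients)
```

Whatever encoding is used, the resulting vector must have the same length and feature order as the vectors the K-means model was trained on, otherwise `predict` will raise an error.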
Deployment
The system is containerized with Docker and deployed to Railway:
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py"]
Railway deployment is configured with environment variables for API keys and resource allocation.
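A sketch of how the application might read such environment variables at startup is shown below. The variable names `PORT`, `MODEL_PATH`, and `RECIPE_API_KEY` and the `Settings` class are all assumptions made for illustration. The source does not list the actual variables.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int
    model_path: str
    api_key: str


def load_settings(env=None):
    """Build Settings from environment variables, with local-dev defaults."""
    env = os.environ if env is None else env
    return Settings(
        port=int(env.get("PORT", "8501")),  # 8501 matches EXPOSE in the Dockerfile
        model_path=env.get("MODEL_PATH", "recipe_recommender.joblib"),
        api_key=env.get("RECIPE_API_KEY", ""),  # hypothetical variable name
    )


settings = load_settings({"PORT": "9000"})
```

Passing the environment as an argument keeps the loader easy to test, and the defaults let the app run locally without any platform configuration.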
Lessons Learned
Throughout this project, I gained valuable insights:
- Data Quality Challenges: None. Whatsoever. This was a great dataset and I am thankful.
- Clustering Efficiency: Restricting similarity ranking to a single cluster means each query is compared against fewer recipes than a search over the full dataset.
- User Experience Design: ML applications require intuitive interfaces to be useful.
Future Improvements
Planned enhancements include:
- Expanding dataset beyond Food.com recipes.
- Adding additional filtering options (e.g., cuisine type, dietary restrictions).
- Exploring collaborative filtering to complement content-based recommendations.