A major resume-building service needed smart features — fast. The real challenge wasn’t the machine learning. It was getting three separate ML systems deployed reliably, at scale, without breaking the user experience for millions of active users.
Challenge
Good Models Are Easy. Production Is Hard.
A well-established online resume platform with over 13 million users approached us with a clear product goal: make resume creation faster and smarter. They had three features in mind — a real-time skill recommendation engine, an automated resume parser, and an AI-powered photo enhancement tool.
The challenge was never the algorithms. It was the infrastructure: each feature required a different deployment architecture and a different set of performance tradeoffs, and all three had to be production-grade from day one, reliable enough to serve millions of sessions without degradation or downtime.
The team had historical data and product direction. What they needed was an engineering partner who could take ML from research to reliable, containerized, cloud-deployed services — and hand off a fully operational system to their in-house team.
Before
- 0 ML features in production
- Manual skills entry, no suggestions
- Every resume built from scratch, for every user
Solution
Three Services. Three Architectures. One Deployment Standard.
We ran the engagement in weekly sprints with direct communication — no account managers in the middle. Each feature went through proof-of-concept validation before a single line of production code was written. Here’s how each system was built and deployed:
Skill Recommendation Engine. We trained a model on the platform’s historical resume data — millions of records, already structured and labeled. The algorithm builds an adjacency matrix from co-occurring skills and returns the most statistically relevant suggestions based on what a user has already entered. The service was packaged as a FastAPI microservice inside a Docker container, then deployed to AWS ECS (Elastic Container Service) for orchestrated, self-healing availability. Result: near-100% uptime with horizontal scaling built in.
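To make the co-occurrence approach concrete, here is a minimal sketch of that kind of recommender. The SkillRecommender class, its method names, and the top_k parameter are ours, not the client's API; the production service adds normalization, caching, and the FastAPI/Docker packaging described above.

```python
# Minimal sketch of a co-occurrence skill recommender (illustrative names;
# the deployed service layers persistence and a FastAPI interface on top).
from collections import defaultdict
from itertools import combinations

class SkillRecommender:
    def __init__(self):
        # adjacency[a][b] = number of resumes in which skills a and b co-occur
        self.adjacency = defaultdict(lambda: defaultdict(int))

    def fit(self, resumes):
        """resumes: iterable of skill lists, e.g. [["python", "sql"], ...]."""
        for skills in resumes:
            for a, b in combinations(sorted(set(skills)), 2):
                self.adjacency[a][b] += 1
                self.adjacency[b][a] += 1

    def recommend(self, entered, top_k=5):
        """Rank unseen skills by total co-occurrence with what the user typed."""
        entered = set(entered)
        scores = defaultdict(int)
        for skill in entered:
            for neighbor, count in self.adjacency[skill].items():
                if neighbor not in entered:
                    scores[neighbor] += count
        return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Calling recommend(["python"]) on a fitted instance returns the skills that most often co-occur with Python across the training resumes.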
Resume Parser (NER + Classification). Users upload a PDF in any format; the service returns structured JSON with all relevant fields pre-filled. Under the hood, we run two models in sequence: a Named Entity Recognition model for short entities (skills, job titles, locations) and a PyTorch text classifier for longer blocks such as work experience summaries and bios. Text extraction uses a production-grade PDF library, and the structured result is served through a clean REST endpoint. The training dataset consisted of 40,000+ labeled resumes, annotated programmatically.
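A rough sketch of how the two stages chain together. The case study doesn't name the PDF library, so pdfminer.six stands in here; ner_model and block_classifier are placeholder callables for the trained NER and PyTorch models, and the labels and output fields are illustrative.

```python
# Two-stage PDF-to-JSON pipeline sketch. pdfminer.six is a stand-in for the
# unnamed production PDF library; model callables are placeholders.
from pdfminer.high_level import extract_text

def parse_resume(pdf_path, ner_model, block_classifier):
    text = extract_text(pdf_path)  # raw text, regardless of PDF layout

    # Stage 1: NER for short entities (skills, job titles, locations)
    entities = ner_model(text)  # e.g. [("Python", "SKILL"), ("Berlin", "LOC")]

    # Stage 2: classify longer text blocks (experience summaries, bios);
    # splitting on blank lines is a simplification of the real segmentation
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    sections = {}
    for block in blocks:
        label = block_classifier(block)  # e.g. "experience", "bio", "other"
        sections.setdefault(label, []).append(block)

    # Assemble the structured payload the REST endpoint returns
    return {
        "skills": [e for e, tag in entities if tag == "SKILL"],
        "job_titles": [e for e, tag in entities if tag == "TITLE"],
        "locations": [e for e, tag in entities if tag == "LOC"],
        "experience": sections.get("experience", []),
        "bio": " ".join(sections.get("bio", [])),
    }
```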
Photo Enhancement Service. Rather than training a custom model from scratch, we deployed two open-source computer vision libraries — a pragmatic choice when pretrained models already perform well. The infrastructure work was the non-trivial part: we built a two-component service with an async task queue and batch manager on one side, and a dedicated inference worker with post-processing on the other. The system runs on CPU (no GPU cost), handles approximately 1 request per second under continuous load, and is fully containerized behind a FastAPI interface.
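A minimal sketch of that two-component design, assuming asyncio and FastAPI; enhance_image is a placeholder for the open-source enhancement models, and the real service's batch manager, persistence, and retry logic are omitted.

```python
# Sketch of the queue/worker split: an async intake queue in front of a
# dedicated CPU inference worker. enhance_image stands in for the CV models.
import asyncio
import uuid

from fastapi import FastAPI, UploadFile

app = FastAPI()
queue: asyncio.Queue = asyncio.Queue()
results: dict = {}

def enhance_image(data: bytes) -> bytes:
    return data  # placeholder: enhancement models + post-processing

async def inference_worker():
    """Pulls jobs off the queue and runs blocking inference in a thread."""
    while True:
        job_id, image_bytes = await queue.get()
        results[job_id] = await asyncio.to_thread(enhance_image, image_bytes)
        queue.task_done()

@app.on_event("startup")
async def start_worker():
    asyncio.create_task(inference_worker())

@app.post("/enhance")
async def submit(file: UploadFile):
    job_id = str(uuid.uuid4())
    await queue.put((job_id, await file.read()))
    return {"job_id": job_id}  # client polls for completion by id

@app.get("/status/{job_id}")
async def status(job_id: str):
    return {"done": job_id in results}
```

Decoupling intake from inference this way lets the API absorb bursts of uploads while the single CPU worker drains the queue at its steady ~1 RPS.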
A note on production vs. prototype: Writing model inference code is maybe 20% of the work. The other 80% is containerization, queue management, load testing, latency tuning, and making sure the service doesn’t fall over at 2am on a Tuesday. Every one of these services was stress-tested before handoff.
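As one example of what that load testing looks like in practice, here is a minimal locust script; locust is our tool of choice for the illustration rather than a confirmed part of this stack, and the /recommend endpoint and payload shape are hypothetical.

```python
# Hedged load-test sketch using locust: simulated users hammer the
# recommendation endpoint so latency and failure rates surface before launch.
from locust import HttpUser, between, task

class RecommendUser(HttpUser):
    wait_time = between(0.1, 0.5)  # short think time to keep pressure high

    @task
    def recommend(self):
        # endpoint and payload are illustrative, not the client's actual API
        self.client.post("/recommend", json={"skills": ["python", "sql"]})
```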
Result
3 Features Shipped. 4 Months. Zero Post-Launch Fires.
All three systems went live within a 4-month engagement. At handoff, the client’s internal engineering team received fully documented, containerized services running on AWS — ready to maintain and extend independently.
- Skill recommendation engine deployed to AWS ECS with near-100% availability and sub-100ms response time at scale
- Resume parser handling PDF-to-JSON conversion end-to-end, dramatically reducing time-to-complete-resume for new users
- Photo enhancement service processing ~1 RPS continuously on CPU — no expensive GPU instances required
- All three services containerized with Docker, independently deployable, and fully handed off to the client’s internal team
- Two additional prototypes delivered within the same timeline for future roadmap evaluation
- Internal engineering team hired and onboarded during the engagement — zero knowledge gap at transition
After
- 3 ML features live in production
- <100ms recommendation API latency
- 4 months from prototype to production
Bottom line
“ML projects don’t fail at the model level — they fail at deployment. The infrastructure around your models is what actually ships to users. Get that part right, and everything else follows.”
