Company: Recraft
Location: London, UK
Employment Type: Full time
Work Type: On-site
Department: Engineering
About Recraft
Founded in the US in 2022 and now based in London, UK, Recraft is an AI tool for professional designers, illustrators, and marketers, setting a new standard for excellence in image generation.
Recraft enables creators to quickly generate and iterate on original images, vector art, illustrations, icons, and 3D graphics using AI. With over 3 million users across 200 countries producing hundreds of millions of images, Recraft is just getting started.
Our mission is to make Recraft an essential daily tool for every designer by giving creators full control over their creative process through innovative AI-powered tools.
If you’re passionate about pushing the boundaries of AI, we want you on board.
Description
Recraft is building the next generation of generative models across images and text. We are seeking an ML Data Engineer to scale data pipelines for unstructured data, primarily images, and ensure training workflows remain fast, reliable, and repeatable.
You will design and operate high-throughput ingestion and preprocessing pipelines on Kubernetes, evolve internal data frameworks, and collaborate closely with ML engineers to deliver datasets that improve model quality.
Responsibilities
- Develop and maintain data ingestion pipelines for large-scale image datasets and, occasionally, text or HTML datasets
- Own the full data lifecycle from raw ingestion to training-ready artifacts
- Perform filtering, deduplication, validation, and quality checks at scale
- Operate and improve Kubernetes-based distributed data pipelines
- Optimize S3-style object storage for performance, lifecycle management, and cost efficiency
- Build tooling for pipeline observability, metrics, alerts, and monitoring
- Collaborate with ML engineers to align datasets with training and experimentation needs
Requirements
Must-Have
- Strong Python skills with clean, maintainable, production-ready code
- Hands-on experience with Kubernetes, including batch and distributed processing
- Experience working with large-scale unstructured data, especially images
- Proven experience ingesting data from publicly accessible sources
- Strong understanding of object storage systems such as S3
- Detail-oriented mindset with strong ownership and reliability focus
Nice-to-Have
- Familiarity with ML workflows and frameworks such as PyTorch
- Experience with image quality scoring, captioning, or image-to-text pipelines
- Experience with workflow visualization or pipeline UX tooling
- DevOps experience including Docker, CI/CD, and infrastructure automation
What We Offer
- Competitive salary and equity
- UK Skilled Worker visa sponsorship available for qualified candidates
- Direct impact on model quality and product innovation
- High ownership with strong engineering and ML support
- Modern technology stack including Python, Kubernetes, and S3
- A fast-moving environment where well-engineered systems are valued
