~/shohanursobuj.dev
Open to talks
$whoami --verbose
ML Engineer · MLOps · Köln, Germany

ML in production.
Trained, deployed, observed — shipped.

Machine Learning Engineer focused on MLOps and DevOps for AI — building scalable pipelines, cloud infrastructure, and production-grade LLM / multimodal systems. Research on the side.

🇩🇪 Köln, Germany
5+ years in production ML
225+ Scholar citations
13 AI products shipped
30+ freelance deliveries
Md Shohanur Islam Sobuj
ML Engineer · MLOps · LLMs · Köln
Now: building multimodal AI at Anymate Me · research · LLM-Mixer · MLOps · multimodal · Responds in 24h · CET
01 Selected work
Production systems, published research — and the blur between them.
01 Production
Anymate Me GmbH · 2024–Present
Agentic Slide-to-Video Engine
Prompt / PDF / PPTX → AI agent → lip-sync avatar video

Full multimodal pipeline: the user provides a prompt, uploads a PDF, or drops a PPTX. A RAG-backed AI agent rewrites and structures the slide content, then drives a speech-synthesis and avatar lip-sync system to render the final video. Also engineered a standalone PPTX content agent: give it a natural-language instruction and it edits text, layout, or data on any slide.

Prompt / PDF / PPTX input · RAG-backed slide rewriting · Avatar lip-sync at scale
MLOps · Multimodal · RAG · Agent · GCP · PyTorch
02 Production
Anchorblock Technology · 2022–23
Finix
AI-powered banking assistant

Conversational AI banking platform with voice interface, transaction intelligence, and personalised financial insights. Shipped to Google Play with Docker + GitHub Actions CI/CD.

Google Play shipped · Voice + text interface
Conversational AI · FastAPI · Docker · CI/CD
03 Production
Business Automation Ltd · 2023–24
CDC Data Pipeline
Real-time change-data-capture for document intelligence

Production Kafka/Debezium CDC pipeline with downstream OCR validation and NLP analyzers processing business documents at scale.

Real-time CDC · OCR + NLP downstream
Kafka · Debezium · OCR · NLP · Python
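A minimal sketch of how a downstream consumer might unpack one Debezium change event before handing the row to OCR or NLP analyzers. The `before` / `after` / `op` envelope is Debezium's documented JSON shape; the `documents`-style row, its columns, and the `parse_change_event` helper are hypothetical and only illustrate the idea, not the production code.

```python
import json

# Debezium "op" codes: c = create, u = update, d = delete, r = snapshot read.
OPS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}

def parse_change_event(raw):
    """Return (operation, row) for one Debezium JSON message, or None if there is nothing to process."""
    if raw is None:
        return None  # tombstone record that can follow a delete
    event = json.loads(raw)
    # With schemas enabled the envelope sits under "payload"; otherwise it is the top level.
    payload = event.get("payload", event)
    op = OPS.get(payload.get("op"))
    if op is None:
        return None
    # Deletes carry the last row state in "before"; other operations carry it in "after".
    row = payload["before"] if op == "delete" else payload["after"]
    return op, row

# Hypothetical event; column names are illustrative only.
sample = json.dumps({
    "payload": {
        "before": None,
        "after": {"id": 42, "status": "uploaded"},
        "op": "c",
        "ts_ms": 1700000000000,
    }
}).encode()

result = parse_change_event(sample)
```

In a real consumer the `raw` bytes would come from the Kafka message value; the parsing step stays the same.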
LLM-Mixer
Multiscale mixing in LLMs for time-series forecasting

Treats time-series patches as language tokens; multiscale mixing on a frozen LLM backbone achieves SOTA on 4 of 7 benchmarks with 30% fewer FLOPs.

SOTA on 4/7 datasets · 30% fewer FLOPs
Time-series · LLM · PEFT · PyTorch
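To make "patches as tokens" and "multiscale" concrete, here is a toy sketch in plain Python. It is not the paper's code: average pooling as the downsampling step, the factors `(1, 2, 4)`, and the patch length are all assumptions chosen for illustration. Each resulting patch is what would later be embedded and fed to the frozen backbone as a token.

```python
def downsample(series, factor):
    """Average-pool a 1-D series over non-overlapping windows of size `factor`."""
    return [
        sum(series[i:i + factor]) / factor
        for i in range(0, len(series) - factor + 1, factor)
    ]

def to_patches(series, patch_len):
    """Split a series into non-overlapping patches, dropping any remainder."""
    return [
        series[i:i + patch_len]
        for i in range(0, len(series) - patch_len + 1, patch_len)
    ]

def multiscale_patches(series, factors=(1, 2, 4), patch_len=2):
    """Build one patch sequence per resolution, keyed by downsampling factor."""
    return {f: to_patches(downsample(series, f), patch_len) for f in factors}

series = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
scales = multiscale_patches(series)
# scales[1] keeps the fine detail, scales[4] keeps only the coarse trend.
```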
OCR Tamper-Proof Signing
Document integrity verification

OCR-enhanced digital signature system that detects in-place document tampering with 99.4% accuracy at sub-second verification speed. Published at IEEE QPAIN 2025.

99.4% tamper detection · Sub-second verify
OCR · Security · IEEE
06 Research
ICLR 2024 · Tiny Papers
L-TUNING
Synchronized label tuning for LLMs

Novel prompt/prefix-tuning method with synchronized label optimization — +3.7% on GLUE with 92% fewer trainable parameters than full fine-tuning.

+3.7% acc on GLUE · 92% fewer params
PEFT · LLMs · NLP · PyTorch
02 Experience

Six years.
Four chapters.

Anymate Me GmbH
Machine Learning Engineer
Dec 2024 – Present · Now · Köln, Germany · Full-Time
  • Developing and deploying comprehensive MLOps pipelines for PPTX→video generation at scale.
  • Building multimodal stack: speech synthesis, avatar lip-sync, slide intelligence, RAG.
  • Owning end-to-end model lifecycle from training through canary deploys to production.
Python · PyTorch · MLOps · GCP · Docker
Business Automation Ltd.
Machine Learning Engineer
Nov 2023 – Oct 2024 · Dhaka, Bangladesh · Full-Time
  • Architected CDC pipelines (MySQL · Debezium · Kafka · Zookeeper) feeding live ML services.
  • Shipped SmartRemarks — an intelligent text-analysis system for internal operations.
  • Delivered a high-accuracy OCR service for TIN-certificate validation (>99% precision).
  • Built and maintained scalable ML microservices infrastructure.
Kafka · Debezium · MySQL · OCR · NLP · Docker
Anchorblock Technology LLC
Machine Learning Engineer
May 2022 – Oct 2023 · Dhaka, Bangladesh · Full-Time
  • Led design, development and delivery of 13 AI/ML products across conversational AI, NLP, CV and FinTech.
  • Architected LLM-powered enterprise knowledge bots using advanced RAG techniques.
  • Configured and managed scalable AWS infra (EC2, ECS, S3) and CI/CD via GitHub Actions.
LLMs · RAG · Amazon Lex · AWS · FastAPI · CI/CD · Docker
Self-Employed
Freelance ML Engineer
Jan 2019 – Apr 2022 · Remote · Freelance
  • Partnered with international clients to design and deploy custom ML solutions.
  • Delivered 30+ projects across computer vision, NLP and predictive analytics.
  • Built production systems for image classification, real-time object detection and ASR.
PyTorch · TensorFlow · OpenCV · FastAPI · AWS · Docker
03 Research & publications

Selected papers,
research on the side.

Google Scholar ↗
2025 · NeurIPS Workshop
LLM-Mixer: Multiscale Mixing in LLMs for Time Series Forecasting
15 citations
2025 · arXiv
PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models
14 citations
2025 · QPAIN
OCR-Enhanced Digital Signatures for Tamper-Proof Document Integrity Verification
2024 · IEEE Access
Securing Electric Vehicle Performance: Machine Learning-Driven Fault Detection and Classification
60 citations
2024 · Scientific Reports
Parameter-Efficient Fine-Tuning of Large Language Models Using Semantic Knowledge Tuning
36 citations
2024 · iCACCESS
Leveraging Pre-trained CNNs for Efficient Feature Extraction in Rice Leaf Disease Classification
11 citations
2024 · arXiv / ICLR
L-TUNING: Synchronized Label Tuning for Prompt and Prefix Tuning in LLMs
8 citations
2023 · EMNLP Workshop
Contrastive Learning for Universal Zero-Shot NLI with Cross-Lingual Sentence Embeddings
3 citations
2022 · Applied Sciences
An Enhanced Neural Word Embedding Model for Transfer Learning
43 citations
2021 · ICECIT
A Classical Approach to Handcrafted Feature Extraction for Bangla Handwritten Digit Recognition
23 citations
2021 · ICIRCA
BanglaLM: Data Mining Based Bangla Corpus for Language Model Research
9 citations
2021 · Res. Comput. Lang.
An Efficient Approach on Sentiment Analysis of Bangla Social Media Data Using FastText
2 citations
04 Stack

The graph I think in.

Research up top, engineering in the middle, cloud underneath. The fun is in the edges.

serving
Docker · Kubernetes · FastAPI · Kafka · MLflow · Debezium
cloud
AWS (EC2/ECS/S3) · GCP · GitHub Actions · Terraform
core
Python · PyTorch · TensorFlow · Transformers
domains
LLMs / RAG · NLP · Computer Vision · OCR · Time-series
skills.graph · 19 nodes · 26 edges
05 ai studio · 2025–26

What I'm
shipping now.

Three active workstreams — agentic systems, generative UGC pipelines, and workflow automation. All in production, all instrumented with evals and cost telemetry.

agentic
Chatbot Agents
Tool-using LLMs in prod.

Multi-step agents with RAG, function-calling, structured outputs and refusal logic. LangGraph + custom eval gates before every release.

<400 ms p95 first token
94% tool-routing accuracy
LangGraph · Claude · OpenAI · pgvector · Redis
ugc
AI UGC Pipeline
Script → avatar → video.

PPTX / PDF / prompt to narrated video: script gen, multilingual TTS, lip-sync avatars, branded templates. Live at Anymate Me.

60+ languages
5 min to first video
Diffusion · TTS · FFmpeg · Whisper · S3
automation
Workflow Automation
Agents that do the work.

Doc classification, OCR validation, routing — built as idempotent event workers with DLQs, retry logic and full audit trails.

99.2% OCR accuracy
68% fewer manual reviews
Kafka · Debezium · Temporal · FastAPI · Postgres
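What "idempotent event workers with DLQs, retry logic and full audit trails" can look like in miniature. This is a sketch under assumptions, not the deployed worker: the in-memory `processed` set, `dlq` list and `audit` list stand in for a durable store, a dead-letter topic and an audit log, and `Worker` and `classify` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Worker:
    handler: Callable[[dict], None]
    max_attempts: int = 3
    processed: set = field(default_factory=set)  # stand-in for a durable idempotency store
    dlq: list = field(default_factory=list)      # stand-in for a dead-letter queue
    audit: list = field(default_factory=list)    # stand-in for an audit trail

    def handle(self, event_id, payload):
        # Idempotency: a redelivered event that already succeeded is skipped.
        if event_id in self.processed:
            self.audit.append((event_id, "skipped_duplicate"))
            return
        # Retry: try the handler up to max_attempts times.
        for attempt in range(1, self.max_attempts + 1):
            try:
                self.handler(payload)
            except Exception as exc:
                self.audit.append((event_id, f"attempt_{attempt}_failed: {exc}"))
            else:
                self.processed.add(event_id)
                self.audit.append((event_id, "ok"))
                return
        # DLQ: park the event for inspection instead of blocking the stream.
        self.dlq.append((event_id, payload))
        self.audit.append((event_id, "dead_lettered"))

def classify(doc):
    """Hypothetical handler: rejects documents that have no text."""
    if "text" not in doc:
        raise ValueError("missing text")

worker = Worker(handler=classify)
worker.handle("evt-1", {"text": "invoice"})
worker.handle("evt-1", {"text": "invoice"})  # redelivery, skipped
worker.handle("evt-2", {})                   # fails every attempt, dead-lettered
```

A production version would add backoff between attempts and persist all three stores; the control flow is the part worth keeping.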
06 case study
Production · Multimodal AI

From prompt to
lip-sync video.

A user uploads a PDF, PPTX, or types a prompt. A RAG-backed content agent parses the input, rewrites and structures slide content, then hands off to a TTS engine for voiceover. An avatar renderer synchronises lip movement and gesture, and the final video is encoded and delivered — all on a managed GCP pipeline with quality gates at every stage.

< 3 min p95 end-to-end (prompt → video)
96.2% slide groundedness (RAG eval)
99.4% pipeline uptime (GCP)
< 4% content rewrite error rate
PyTorch · RAG · TTS · Avatar Lip-sync · FastAPI · GCP · MLflow · Docker
fig. 1: slide-to-video pipeline (live)
Stages: INPUT → PARSE → AGENT → RENDER → OBSERVE
Nodes: Prompt · PDF / PPTX · API / URL · doc parser (slide extractor) · RAG retrieval (dense + rerank) · context (chunked · top-k) · PPTX content agent (plan · rewrite · validate · JSON-schema output) · TTS engine (speech synthesis) · avatar render (lip-sync · gesture) · video encoder (GCP · CDN delivery) · MLflow · Prometheus · quality gate (groundedness · latency SLO · video quality score)
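The "quality gates at every stage" idea, reduced to a toy: each stage is a step plus a gate, and the run stops at the first gate that fails. The stage names echo the pipeline above, but the steps and gates here are placeholder lambdas on plain strings, assumed purely for illustration.

```python
def run_pipeline(payload, stages):
    """Run (name, step, gate) stages in order; stop at the first failed gate."""
    for name, step, gate in stages:
        payload = step(payload)
        if not gate(payload):
            return {"status": "rejected", "failed_stage": name, "output": payload}
    return {"status": "delivered", "failed_stage": None, "output": payload}

# Placeholder stages: real ones would call the parser, the content agent, TTS and so on.
stages = [
    ("parse",
     lambda text: [s.strip() for s in text.split("\n\n")],  # one slide per blank-line block
     lambda slides: len(slides) > 0),
    ("rewrite",
     lambda slides: [s.capitalize() for s in slides],
     lambda slides: all(slides)),                            # no empty slide may pass
]

good = run_pipeline("intro to rag\n\nretrieval step", stages)
bad = run_pipeline("intro\n\n\n\n", stages)
```

Returning the failed stage name, rather than raising, keeps the rejection itself observable as a metric.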
07 how I work

How I ship
AI in 2026.

01 Evals first
Golden set before a line of code runs.
LLM-as-judge + human spot-checks. Catch regressions before prod, not after a customer complaint.
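A stripped-down version of a golden-set gate. Exact-match scoring stands in for the LLM-as-judge step, and the examples, route names and `tolerance` value are invented for the sketch.

```python
def evaluate(predict, golden_set):
    """Fraction of golden examples where the output matches the expected answer."""
    hits = sum(1 for ex in golden_set if predict(ex["input"]) == ex["expected"])
    return hits / len(golden_set)

def release_gate(candidate_score, baseline_score, tolerance=0.01):
    """Pass only if the candidate does not regress past the tolerance."""
    return candidate_score >= baseline_score - tolerance

# Hypothetical golden set for a routing task.
golden_set = [
    {"input": "refund status", "expected": "lookup_order"},
    {"input": "reset password", "expected": "account_help"},
    {"input": "talk to a human", "expected": "handoff"},
    {"input": "opening hours", "expected": "faq"},
]

def candidate(text):
    """Hypothetical system under test: a lookup table with a fallback."""
    routes = {
        "refund status": "lookup_order",
        "reset password": "account_help",
        "talk to a human": "handoff",
    }
    return routes.get(text, "fallback")

score = evaluate(candidate, golden_set)
passed = release_gate(score, baseline_score=0.75)
```

Wired into CI, a `False` from `release_gate` would block the release before it reaches prod.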
02 Build
Prompt → RAG → PEFT — in that order.
Pull the cheapest lever first. LoRA or QLoRA when the task demands it, agents and tool-calling only where they carry real weight.
03 Ship
Reproducible, observable, rollback-able.
Docker, MLflow, CI/CD on GCP. Token-level tracing, latency SLOs, canary before blue/green. Cost is a first-class metric.
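One way a latency SLO can gate a canary: compute p95 over the canary's samples and compare it with the budget. The nearest-rank percentile, the sample values and the 400 ms budget are assumptions for this sketch.

```python
def p95(samples):
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]

def canary_ok(canary_ms, slo_ms):
    """Promote the canary only if its p95 latency stays within the SLO."""
    return p95(canary_ms) <= slo_ms

# Hypothetical canary latencies in milliseconds; one slow outlier.
latencies = [120, 135, 150, 160, 180, 200, 210, 230, 250, 900]
promote = canary_ok(latencies, slo_ms=400)
```

With only ten samples the single outlier is the p95, so the canary is held back; larger windows make the check less jumpy.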
04 Tune
Smaller, faster, grounded.
PEFT over full fine-tuning. Quantization, prompt caching, guardrails. Red-team evals and continuous A/B — never ship and forget.
08 Playground

Rather ask than read?
There's a small model for that.

Ask the model anything · grounded on cv.json
A small chat agent grounded on Shohanur's CV. Try: "What did you build at Anchorblock?"
10 Contact

Let's talk.
Research, production, or both.

Always happy to talk ML, research, or interesting engineering problems. Based in Köln — working across DE & EU.

Open to conversations
