Md Shohanur Islam Sobuj

Machine Learning Engineer · MLOps · LLMs · Multimodal AI

Professional Summary

Machine Learning Engineer with 6+ years delivering production ML systems across MLOps, LLMs, RAG, multimodal AI, NLP, and computer vision. Currently at Anymate Me GmbH in Köln, building an agentic slide-to-video pipeline (PDF/PPTX → RAG agent → TTS → avatar lip-sync) with p95 latency under 3 minutes and 99.4% uptime. Published researcher with 12 peer-reviewed papers across Nature, IEEE, ACL, and NeurIPS venues, with 225+ Google Scholar citations and an h-index of 8.

Work Experience

Machine Learning Engineer

Anymate Me GmbH

Dec 2024 – Present
Köln, Germany · Full-Time
  • Architected end-to-end agentic slide-to-video pipeline: PDF/PPTX → RAG content agent → TTS → avatar lip-sync, achieving p95 latency < 3 min for 10-slide decks
  • Built PPTX content rewriting agent using LLM + RAG, enabling users to restructure and localise presentation content without manual editing
  • Implemented groundedness validation (DeBERTa-v3 NLI) on generated scripts, maintaining 96.2% auto-approval rate in production
  • Designed MLOps pipelines on GCP with model versioning in MLflow, canary deploys, and automated rollback on metric regression
  • Reduced pipeline error rate to < 4% through stage-level observability (Grafana + Loki) and per-job retry logic
Python · PyTorch · RAG · LLMs · PEFT · TTS · GCP · MLflow · Docker · FastAPI

Machine Learning Engineer

Business Automation Ltd.

Nov 2023 – Oct 2024
Dhaka, Bangladesh · Full-Time
  • Implemented a Change Data Capture (CDC) pipeline with MySQL, Debezium, Apache Kafka, and ZooKeeper, enabling real-time data sync across microservices
  • Developed SmartRemarks NLP system for automated content analysis and sentiment classification, handling 10K+ requests/day in production
  • Built OCR-based TIN certificate validation system using computer vision, achieving 95%+ extraction accuracy and reducing manual processing time by 80%
  • Designed event-driven microservices architecture for real-time ML inference, standardising deployment workflows across environments
  • Led migration of ML-backed services to containerised deployments (Docker + Kubernetes)
Python · Apache Kafka · Debezium · Docker · Kubernetes · MySQL · OCR · NLP · FastAPI

Machine Learning Engineer

Anchorblock Technology LLC

May 2022 – Oct 2023
Dhaka, Bangladesh · Full-Time
  • Built distributed ML infrastructure handling millions of data points daily with auto-scaling on AWS
  • Implemented CI/CD pipelines for ML model deployment using GitHub Actions, reducing deployment time by 60%
  • Developed conversational AI systems and LLM-powered chatbots for enterprise clients using RAG architecture
  • Designed and maintained REST APIs for seamless ML model integration into client products
  • Led technical architecture decisions for distributed computing systems across 3 concurrent client projects
Python · AWS · Docker · GitHub Actions · LLMs · RAG · FastAPI · CI/CD · MLflow

Machine Learning Engineer

Fiverr (Freelance)

2020 – 2022
Remote · Freelance
  • Completed 50+ ML projects for clients across Europe, North America, and Asia
  • Specialised in NLP (text classification, sentiment analysis, named entity recognition) and computer vision (object detection, image classification)
  • Built end-to-end data pipelines and predictive models for e-commerce, healthcare, and finance clients
Python · TensorFlow · PyTorch · Scikit-learn · NLP · Computer Vision · Data Analysis

Education

B. Sc. (Engineering) in Electrical and Electronic Engineering

Hajee Mohammad Danesh Science and Technology University (HSTU)

Dinajpur, Bangladesh
  • Research focus: Natural Language Processing, Deep Learning, Bangla language models
  • Published 4 research papers during undergraduate studies

Technical Skills

LLMs & Generative AI
RAG · PEFT · LoRA · QLoRA · Prompt Engineering · LangChain · Hugging Face Transformers · Fine-Tuning · Embeddings · Vector Search
MLOps & Infrastructure
MLflow · Docker · Kubernetes · GCP · AWS · GitHub Actions · CI/CD · Apache Kafka · Debezium · Apache Airflow
Machine Learning & Deep Learning
PyTorch · TensorFlow · Scikit-learn · Transformers · Computer Vision · NLP · OCR · Time-Series Forecasting · TTS · Avatar Lip-sync
Programming & APIs
Python · SQL · FastAPI · Flask · REST APIs · JavaScript · R
Databases & Storage
PostgreSQL · MySQL · MongoDB · Redis · Qdrant · Elasticsearch

Publications & Research

12 papers · 225+ citations · h-index 8 (Google Scholar)
LLM-Mixer: Multiscale Mixing in LLMs for Time Series Forecasting

Md Kowsher, Md Shohanur Islam Sobuj, Nusrat Jahan Prottasha, et al.

NeurIPS Workshop · 2025 · 15 citations
PEFT A2Z: Parameter-Efficient Fine-Tuning Survey for Large Language and Vision Models

Nusrat Jahan Prottasha, ..., Md Shohanur Islam Sobuj, ..., Md Kowsher, et al.

arXiv · 2025 · 14 citations
OCR-Enhanced Digital Signatures for Tamper-Proof Document Integrity Verification

Mohammad Majbah Uddin, Md Shohanur Islam Sobuj

QPAIN 2025 (IEEE) · 2025
Securing Electric Vehicle Performance: Machine Learning-Driven Fault Detection and Classification

Md Shohanur Islam Sobuj, et al.

IEEE Access (Q1) · 2024 · 60 citations
Parameter-Efficient Fine-Tuning of Large Language Models Using Semantic Knowledge Tuning

Nusrat Jahan Prottasha, Asif Mahmud, Md Shohanur Islam Sobuj, et al.

Scientific Reports (Nature Publishing Group) · 2024 · 36 citations
Leveraging Pre-trained CNNs for Efficient Feature Extraction in Rice Leaf Disease Classification

Md Shohanur Islam Sobuj, Md Imran Hossen, Md Foysal Mahmud, Mahbub Ul Islam Khan

iCACCESS 2024 (IEEE) · 2024 · 11 citations
L-TUNING: Synchronized Label Tuning for Prompt and Prefix Tuning in LLMs

Md Kowsher, Md Shohanur Islam Sobuj, Asif Mahmud, Nusrat Jahan Prottasha, Prakash Bhat

Tiny Papers @ ICLR 2024 · 2024 · 8 citations
Contrastive Learning for Universal Zero-Shot NLI with Cross-Lingual Sentence Embeddings

Md Kowsher, Md Shohanur Islam Sobuj, Nusrat Jahan Prottasha, Mohammad Shamsul Arefin, Yasuhiko Morimoto

Findings of EMNLP 2023 · 2023 · 3 citations
An Enhanced Neural Word Embedding Model for Transfer Learning

Md Kowsher, Md Shohanur Islam Sobuj, et al.

Applied Sciences (MDPI) · 2022 · 43 citations
A Classical Approach to Handcrafted Feature Extraction for Bangla Handwritten Digit Recognition

Md Ferdous Wahid, Md Fahim Shahriar, Md Shohanur Islam Sobuj

ICECIT 2021 (IEEE) · 2021 · 23 citations
BanglaLM: Data Mining Based Bangla Corpus for Language Model Research

Md Shohanur Islam Sobuj, et al.

ICIRCA 2021 (IEEE) · 2021 · 9 citations
An Efficient Approach on Sentiment Analysis of Bangla Social Media Data Using FastText

Md Shohanur Islam Sobuj, et al.

Research on Computational Language · 2021 · 2 citations

Languages

English · Professional Working (C1)
German · Basic (A1)
Bengali · Native