Rohith Kumar Reddipogula — AI & NLP Engineer

// ABOUT

I research how retrieval actually works, then ship it. Most hybrid RAG tutorials assume sparse and dense retrieval should be weighted 50/50. My thesis tested that assumption directly — eleven tracked experiments later, the real optimum was alpha=0.70, validated with a paired t-test (p=0.002). That single number improved Recall@10 by 11.4% over the naive baseline.

After graduating I kept building instead of stopping at the thesis: fine-tuning LLMs with QLoRA on a free Colab GPU, orchestrating four-agent LangGraph pipelines, deploying to AWS EC2 with systemd, running Kubernetes with zero-downtime rollouts, and tracking every experiment publicly on MLflow.

I'm now looking for an ML Engineer, NLP Engineer, or AI Engineer role in Berlin. Job Seeker Visa holder, EU Blue Card eligible, available immediately.

Core ML & NLP

Python · PyTorch · HuggingFace · RAG · FAISS · BM25 · E5-base-v2

LLM stack

LangChain · LangGraph · QLoRA · RAGAS · Gemini · Groq · Unsloth

Infrastructure

Docker · Kubernetes · AWS EC2 · FastAPI · systemd · Band SDK

MLOps

MLflow · DagsHub · HF Spaces · Git

Live systems

8 results, sorted by relevance

// query: "production-grade AI systems built solo" — 0.04s

01 · thesis

Hybrid RAG System · live

BM25 sparse search fused with Microsoft E5 dense embeddings across 8.84M MS MARCO passages. The fusion weight (alpha=0.70) was discovered through eleven tracked experiments, not assumed.

93% Recall@10 · +11.4% vs baseline · MRR=1.0

FAISS · BM25 · E5 · FastAPI · Docker

Demo → API → GitHub →

relevance: 0.98
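Recall@10 and MRR are the two headline metrics on this card. A minimal sketch of how they are commonly computed, assuming each query has a set of relevant passage IDs and a ranked list of retrieved IDs; the function names and toy data are hypothetical and not taken from the project's code.

```python
from statistics import mean


def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Fraction of the relevant passages that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """1 / rank of the first relevant passage, or 0.0 if none was retrieved."""
    for rank, pid in enumerate(ranked_ids, start=1):
        if pid in relevant_ids:
            return 1.0 / rank
    return 0.0


def evaluate(runs: dict[str, list[str]], qrels: dict[str, set[str]], k: int = 10) -> dict[str, float]:
    """Average Recall@k and MRR over every query that has relevance judgements."""
    qids = [q for q in runs if q in qrels]
    return {
        f"recall@{k}": mean(recall_at_k(runs[q], qrels[q], k) for q in qids),
        "mrr": mean(reciprocal_rank(runs[q], qrels[q]) for q in qids),
    }


# Toy example with two queries (hypothetical passage IDs).
runs = {"q1": ["p3", "p7", "p1"], "q2": ["p9", "p2"]}
qrels = {"q1": {"p7"}, "q2": {"p4"}}
print(evaluate(runs, qrels))  # → {'recall@10': 0.5, 'mrr': 0.25}
```

MS MARCO passage ranking conventionally reports MRR@10, which would mean truncating `ranked_ids` to the top 10 before calling `reciprocal_rank`.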

02 · agent

Multi-Agent Research Pipeline · live

Four specialised LangGraph agents — Search (DuckDuckGo), Summarise (temp 0.3), Fact-Check (temp 0.1), Writer (temp 0.4) — each tuned for one job in the pipeline.

4 agents · stateful LangGraph · 3 distinct temperatures

LangGraph · Gemini · DuckDuckGo · Docker

Demo → GitHub →

relevance: 0.95
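The pipeline idea on this card is a shared state handed from agent to agent, with each LLM-backed agent sampling at its own temperature. The sketch below is a plain-Python stand-in, not LangGraph's API: `fake_llm` is a hypothetical stub for the Gemini call and the search step returns canned snippets instead of querying DuckDuckGo.

```python
from dataclasses import dataclass, field
from typing import Callable

# Per-agent sampling temperatures as listed on the card; Search calls a tool, not an LLM.
TEMPERATURES = {"summarise": 0.3, "fact_check": 0.1, "writer": 0.4}


@dataclass
class ResearchState:
    """Shared state that every agent reads from and writes to."""
    question: str
    search_results: list[str] = field(default_factory=list)
    summary: str = ""
    checked_summary: str = ""
    report: str = ""


def fake_llm(prompt: str, temperature: float) -> str:
    """Hypothetical stand-in for an LLM call; echoes its inputs so the flow is visible."""
    return f"[t={temperature}] {prompt}"


def search_agent(state: ResearchState) -> ResearchState:
    # Placeholder for a web search; returns a canned snippet.
    state.search_results = [f"snippet about {state.question}"]
    return state


def summarise_agent(state: ResearchState) -> ResearchState:
    state.summary = fake_llm("summarise: " + " ".join(state.search_results), TEMPERATURES["summarise"])
    return state


def fact_check_agent(state: ResearchState) -> ResearchState:
    state.checked_summary = fake_llm("fact-check: " + state.summary, TEMPERATURES["fact_check"])
    return state


def writer_agent(state: ResearchState) -> ResearchState:
    state.report = fake_llm("write report: " + state.checked_summary, TEMPERATURES["writer"])
    return state


PIPELINE: list[Callable[[ResearchState], ResearchState]] = [
    search_agent, summarise_agent, fact_check_agent, writer_agent,
]


def run_pipeline(question: str) -> ResearchState:
    """Run the agents in order, threading one state object through all of them."""
    state = ResearchState(question=question)
    for agent in PIPELINE:
        state = agent(state)
    return state


print(run_pipeline("hybrid RAG").report)
```

In LangGraph each of these functions would typically be registered as a node on a `StateGraph` and chained with edges. The low fact-check temperature favours repeatable verification, while the writer gets slightly more freedom.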

03 · agent

ReAct AI Agent · live

A LangGraph ReAct agent holding three tools — web search, calculator, RAG retrieval — and reasoning about which one a given question actually needs.

3 tools · autonomous tool selection

LangGraph · Gemini · Streamlit

Demo → GitHub →

relevance: 0.93
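A ReAct agent alternates a reasoning step (pick a tool) with an action (call it) and an observation (read the result) until it can answer. A minimal sketch of that loop, assuming a hypothetical rule-based `choose_action` in place of the LLM's reasoning and stub `web_search` / `rag_retrieve` tools over a tiny in-memory corpus; only the loop structure is meant to carry over.

```python
import ast
import operator

# Arithmetic operators the toy calculator accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

DOCS = [  # hypothetical mini-corpus standing in for the RAG index
    "The thesis found an optimal fusion weight of alpha=0.70.",
    "BM25 is a sparse lexical ranking function.",
]


def calculator(expression: str) -> str:
    """Evaluate +, -, *, / arithmetic by walking the AST instead of calling eval()."""
    def ev(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    return str(ev(ast.parse(expression, mode="eval")))


def web_search(query: str) -> str:
    return f"(stub) top web result for: {query}"  # placeholder for a real search API


def rag_retrieve(query: str) -> str:
    """Return the document that shares the most words with the query."""
    words = set(query.lower().split())
    return max(DOCS, key=lambda doc: len(words & set(doc.lower().split())))


TOOLS = {"calculator": calculator, "web_search": web_search, "rag_retrieve": rag_retrieve}


def choose_action(question: str, observations: list[str]) -> tuple[str, str]:
    """Hypothetical rule-based stand-in for the LLM's reasoning step."""
    if observations:  # this toy policy answers after a single observation
        return "finish", observations[-1]
    q = question.lower()
    if q.startswith("calculate "):
        return "calculator", question[len("calculate "):]
    if "thesis" in q or "alpha" in q:
        return "rag_retrieve", question
    return "web_search", question


def run_agent(question: str, max_steps: int = 3) -> tuple[str, list[tuple[str, str, str]]]:
    """Loop reason -> act -> observe until the policy finishes or the step budget runs out."""
    observations: list[str] = []
    trace: list[tuple[str, str, str]] = []
    for _ in range(max_steps):
        action, arg = choose_action(question, observations)
        if action == "finish":
            return arg, trace
        observation = TOOLS[action](arg)
        trace.append((action, arg, observation))
        observations.append(observation)
    return "no answer within the step budget", trace


print(run_agent("calculate 12 * (3 + 4)"))
```

The step budget matters in practice: an LLM policy can keep calling tools, so capping iterations keeps a bad reasoning chain from looping forever.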

04 · tuning

QLoRA Fine-Tuning: TinyLlama 1.1B · live

4-bit quantisation with LoRA adapters on a free Colab T4 GPU. Fine-tuned on NLP domain-knowledge data and published openly on the Hugging Face Hub.

0.089% of params trained · loss 2.47 → 0.89

QLoRA · Unsloth · TinyLlama

Model → GitHub →

relevance: 0.91
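The tiny trainable fraction comes from LoRA's structure: a frozen weight of shape d_out x d_in gains two small matrices, B (d_out x r) and A (r x d_in), so only r·(d_in + d_out) parameters train per adapted matrix. A sketch of that arithmetic; the rank, target modules and layer shapes below are illustrative assumptions rather than this project's configuration, so the result lands near, not exactly on, 0.089%.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one frozen d_out x d_in weight: B (d_out x r) + A (r x d_in)."""
    return rank * (d_in + d_out)


def trainable_fraction(shapes: list[tuple[int, int]], rank: int, n_layers: int,
                       total_params: int) -> tuple[int, float]:
    """Total adapter parameters and their share (in %) of the full model."""
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in shapes)
    trainable = per_layer * n_layers
    return trainable, 100.0 * trainable / total_params


# Hypothetical configuration: rank 8 on a 2048 -> 2048 query projection and a
# 2048 -> 256 value projection in each of 22 layers, against a nominal 1.1B parameters.
trainable, pct = trainable_fraction([(2048, 2048), (2048, 256)], rank=8, n_layers=22,
                                    total_params=1_100_000_000)
print(trainable, round(pct, 3))  # → 1126400 0.102
```

A different rank or set of target modules shifts the percentage. Under QLoRA the frozen base weights are held in 4-bit NF4 while only these adapters are trained, which is what keeps a 1.1B model within a T4's memory.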

05 · eval

LLM Evaluation Dashboard · live

RAGAS metrics with Gemini-as-judge across ten NLP test questions, rendered on a live radar chart.

Faithfulness 0.909 · overall RAGAS 0.877

RAGAS · Gemini · Plotly · Streamlit

Dashboard → GitHub →

relevance: 0.89
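In RAGAS, faithfulness is the share of claims in a generated answer that the judge model can support from the retrieved context. A minimal sketch of that ratio and of rolling per-question scores up into an overall number, assuming the judge has already returned one supported/unsupported verdict per claim. How the dashboard's overall 0.877 was combined is not stated, so both the arithmetic and harmonic options here are assumptions, and the verdicts and scores are toy values.

```python
from statistics import harmonic_mean, mean


def faithfulness(claim_verdicts: list[bool]) -> float:
    """Share of an answer's claims the judge marked as supported by the retrieved context."""
    if not claim_verdicts:
        return 0.0
    return sum(claim_verdicts) / len(claim_verdicts)


def aggregate(per_metric: dict[str, list[float]], how: str = "mean") -> dict[str, float]:
    """Average each metric over the questions, then combine the averages into one overall score."""
    averages = {name: mean(scores) for name, scores in per_metric.items()}
    combine = harmonic_mean if how == "harmonic" else mean
    averages["overall"] = combine(list(averages.values()))
    return averages


# Toy judge verdicts for three questions (hypothetical, not the dashboard's data).
verdicts = [[True, True, False], [True, True], [True, False]]
scores = {
    "faithfulness": [faithfulness(v) for v in verdicts],
    "answer_relevancy": [0.9, 0.8, 0.7],
}
print(aggregate(scores))
print(aggregate(scores, how="harmonic"))
```

A harmonic mean punishes a single weak metric more than an arithmetic mean does, which is why the choice of combiner is worth stating next to any overall score.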

06 · infra

AWS EC2 Deployment

FastAPI RAG inference API on a t3.micro instance in the Frankfurt region, with systemd restarting the service after a crash or reboot, and SSH key authentication.

eu-central-1 · 3 REST endpoints · OpenAPI spec

AWS EC2 · Ubuntu · systemd · FastAPI

Live API → GitHub →

relevance: 0.86
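The crash-and-reboot behaviour on this card comes from two systemd settings: `Restart=always` relaunches the process whenever it exits, and `WantedBy=multi-user.target` (activated with `systemctl enable`) starts it at boot. A sketch that renders such a unit file from Python; the service name, paths, user and uvicorn command line are hypothetical placeholders, not the deployed configuration.

```python
def render_unit(name: str, workdir: str, exec_start: str, user: str = "ubuntu") -> str:
    """Render a systemd service unit that restarts the API whenever it exits."""
    sections = {
        "Unit": [
            ("Description", f"{name} FastAPI service"),
            ("After", "network-online.target"),   # wait for networking before starting
            ("Wants", "network-online.target"),
        ],
        "Service": [
            ("User", user),
            ("WorkingDirectory", workdir),
            ("ExecStart", exec_start),
            ("Restart", "always"),                 # relaunch after any exit, including crashes
            ("RestartSec", "5"),                   # short pause between restart attempts
        ],
        "Install": [
            ("WantedBy", "multi-user.target"),     # start at boot once the unit is enabled
        ],
    }
    lines: list[str] = []
    for section, pairs in sections.items():
        lines.append(f"[{section}]")
        lines.extend(f"{key}={value}" for key, value in pairs)
        lines.append("")
    return "\n".join(lines)


# Hypothetical paths and command line for illustration.
unit = render_unit(
    "rag-api",
    "/home/ubuntu/rag-api",
    "/home/ubuntu/rag-api/venv/bin/uvicorn main:app --host 0.0.0.0 --port 8000",
)
print(unit)
```

Saved as, for example, `/etc/systemd/system/rag-api.service`, the unit would be activated with `sudo systemctl daemon-reload` followed by `sudo systemctl enable --now rag-api`.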

07 · infra

Kubernetes Orchestration

The same RAG API running with two replicas under a RollingUpdate strategy — liveness and readiness probes, resource limits, zero-downtime deploys.

2 replicas · RollingUpdate · health probes

Kubernetes · Docker · kubectl

GitHub →

relevance: 0.83
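Zero-downtime rollouts with two replicas hinge on the Deployment's update strategy and probes: with `maxUnavailable: 0`, Kubernetes keeps old pods serving until a new pod passes its readiness probe, and the liveness probe restarts pods that hang. A sketch that builds such a manifest as a Python dict and emits JSON, which `kubectl apply -f` accepts; the image name, port, `/health` path and resource figures are hypothetical.

```python
import json


def rag_deployment(image: str, replicas: int = 2, port: int = 8000) -> dict:
    """Build an apps/v1 Deployment with rolling updates, health probes and resource limits."""
    labels = {"app": "rag-api"}
    probe = {"httpGet": {"path": "/health", "port": port}}  # hypothetical health endpoint
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "rag-api", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Never go below the desired replica count; add one new pod at a time.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "rag-api",
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # Readiness gates traffic; liveness restarts a stuck container.
                        "readinessProbe": {**probe, "initialDelaySeconds": 5, "periodSeconds": 10},
                        "livenessProbe": {**probe, "initialDelaySeconds": 15, "periodSeconds": 20},
                        "resources": {
                            "requests": {"cpu": "250m", "memory": "512Mi"},
                            "limits": {"cpu": "500m", "memory": "1Gi"},
                        },
                    }],
                },
            },
        },
    }


manifest = rag_deployment("rag-api:latest")
print(json.dumps(manifest, indent=2))
```

The selector must match the pod template's labels or the API server rejects the Deployment, which is why both reuse the same `labels` dict here.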

08 · mlops

MLflow Experiment Tracking

Three tracked experiment families — the RAG alpha search (11 runs), the QLoRA loss curve (16 steps), and the RAGAS evaluation sweep — all on a public dashboard.

11 alpha runs · public DagsHub dashboard

MLflow · DagsHub · Python

Dashboard → GitHub →

relevance: 0.81

The fusion weight

why alpha=0.70 matters

MSc thesis finding

Sparse and dense retrieval aren't equal partners — and assuming they are costs you accuracy.

Most hybrid RAG implementations weight sparse (BM25) and dense (embedding) retrieval 50/50 by default. I ran eleven tracked experiments sweeping the fusion weight and found that the optimum on this 8.84M-passage corpus sits at alpha=0.70 in favour of dense retrieval, confirmed with a paired t-test (p=0.002). That one calibration choice is worth +11.4% Recall@10 over the naive 50/50 split.

Corpus size: 8.84M passages
Dense model: Microsoft E5-base-v2
Sparse method: BM25
Experiments run: 11 tracked (MLflow)
Statistical test: paired t-test
Optimal alpha: 0.70 (p=0.002)
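A minimal sketch of weighted score fusion, assuming per-query min-max normalisation of each retriever's scores and alpha weighting the dense side (matching "alpha=0.70 in favour of dense retrieval"). The normalisation scheme, function names, toy scores and the 0.0 to 1.0 grid in steps of 0.1 are assumptions for illustration, not the thesis implementation. A passage missing from one retriever's list contributes 0 for that retriever.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one retriever's scores for a single query into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:  # all scores equal: treat every candidate as equally strong
        return {pid: 1.0 for pid in scores}
    return {pid: (s - lo) / (hi - lo) for pid, s in scores.items()}


def hybrid_rank(bm25: dict[str, float], dense: dict[str, float], alpha: float, k: int = 10) -> list[str]:
    """Fuse as alpha * dense + (1 - alpha) * sparse and return the top-k passage IDs."""
    b, d = min_max(bm25), min_max(dense)
    fused = {
        pid: alpha * d.get(pid, 0.0) + (1 - alpha) * b.get(pid, 0.0)
        for pid in b.keys() | d.keys()
    }
    # Sort by fused score, breaking ties by ID so results are deterministic.
    return sorted(fused, key=lambda pid: (-fused[pid], pid))[:k]


# Toy scores for one query (hypothetical): raw BM25 scores and cosine similarities.
bm25 = {"p1": 12.0, "p2": 9.0, "p3": 3.0}
dense = {"p2": 0.91, "p3": 0.89, "p4": 0.80}

# Eleven evenly spaced weights from 0.0 to 1.0 (an assumed grid).
alphas = [round(i * 0.1, 1) for i in range(11)]
for alpha in alphas:
    print(alpha, hybrid_rank(bm25, dense, alpha, k=3))
```

In a real sweep each alpha would be scored by Recall@10 over the full query set and logged as one run; normalisation matters because raw BM25 scores and cosine similarities live on very different scales.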

Background

education & experience

2026 — Present

Independent AI Engineer

Berlin, Germany — Job Seeker Visa

Built and deployed eight production AI systems covering RAG, LLM fine-tuning, multi-agent architectures, LLM evaluation, AWS EC2, Kubernetes, and MLflow tracking — all public and documented on GitHub.

Mar 2024 — Mar 2026

MSc Data Science

University of Europe for Applied Sciences — Potsdam, Germany

Thesis: hybrid RAG combining BM25 sparse retrieval with Microsoft E5-base-v2 dense embeddings. 93% Recall@10 on 8.84M MS MARCO passages. Optimal fusion weight alpha=0.70, validated with a paired t-test, p=0.002. Research informed by, and consistent with, published findings on hybrid fusion weighting in the retrieval-augmented generation literature.
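A paired t-test fits here because both fusion settings are evaluated on the same queries, so the test looks at per-query differences rather than two independent samples. A stdlib sketch of the t statistic, assuming per-query Recall@10 scores for a baseline and a tuned configuration; the scores are hypothetical toy values and the function name is illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(baseline: list[float], tuned: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for per-query scores from two systems."""
    if len(baseline) != len(tuned) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [t - b for b, t in zip(baseline, tuned)]
    sd = stdev(diffs)  # sample standard deviation, n - 1 denominator
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    # Mean difference divided by its standard error.
    return mean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1


# Hypothetical per-query Recall@10 for a 50/50 baseline and a tuned alpha.
baseline = [0.8, 0.7, 0.9, 0.6]
tuned = [0.9, 0.8, 0.9, 0.8]
t_stat, df = paired_t(baseline, tuned)
print(round(t_stat, 3), df)
```

With SciPy installed, `scipy.stats.ttest_rel(tuned, baseline)` returns the same statistic together with a two-sided p-value.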

I build systems that find the right answer among millions.

Let's build something worth retrieving.
