§ Methodology & context

A Databricks-first market index, scored on twelve public signals.

Lakehouse100 was developed through a structured market scan of UK-based Databricks and modern data platform practitioners. It combines certification-led research, public records, candidate materials, external validation, and editorial review into a four-tier classification. Every practitioner consents to their profile before publication.

§ 01 · Pipeline

From a thousand profiles down to one hundred confirmed names.

A four-stage funnel. We're currently at Stage 03 — practitioner outreach and consent.

selection_pipeline.run()

in progress · ETA 01 aug 2026 · stage 03/04

Stage 01 · Longlist

1,067

Public profiles assembled from open signals and reader nominations across the UK Databricks market.

complete · Mar 2026

Stage 02 · Score

250

Evidence-scored against twelve public signals across delivery, contribution, depth, and peer recognition.

complete · Apr 2026

Stage 03 · Invite & consent

120

Invited for consent and profile confirmation. We never publish without explicit consent.

in progress · May–Aug

Stage 04 · Publish

100

Confirmed, published with full citations, segment classification, and tier badges. One hundred names, four tiers.

upcoming · 01 aug 2026
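The funnel counts above imply a pass-through rate at each stage. The following is a minimal sketch of that arithmetic, with the counts copied from the four stages. The Stage 04 figure is the publication target rather than a result, and the function name is illustrative.

```python
# Stage names and counts copied from the four-stage funnel above.
STAGES = [
    ("Longlist", 1067),
    ("Score", 250),
    ("Invite & consent", 120),
    ("Publish", 100),  # target count, not yet a result
]


def pass_through(stages):
    """Return (stage name, share of the previous stage that reaches it)."""
    return [
        (name, count / prev_count)
        for (_, prev_count), (name, count) in zip(stages, stages[1:])
    ]


for name, rate in pass_through(STAGES):
    print(f"{name}: {rate:.1%} of the previous stage")
```

On these numbers roughly a quarter of the longlist is scored, about half of the scored profiles are invited, and the invite stage leaves headroom of twenty names for declined or unconfirmed consent.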

§ 02 · Twelve evidence signals

What gets scored — and roughly how much it counts.

No single signal carries the index. Strong credential signals are moderated against hands-on evidence, market experience, public validation, and client relevance.

S-01 · weight 1.25

Production delivery

Live Databricks systems shipped under their name — programmes, lakehouse migrations, named clients.

Delivery

S-02 · weight 1.10

Senior judgment

Architecture decisions, programme leadership, mentoring engineers in production environments.

Delivery

S-03 · weight 1.00

Client outcomes

Evidence the work moved a real metric — cost, latency, model performance, time-to-insight.

Delivery

S-04 · weight 0.95

Open-source contribution

Commits, maintained packages, libraries serving the wider Databricks/Spark ecosystem.

Contribution

S-05 · weight 0.90

Conference talks

DAIS, Spark Summit, regional meetups, technical workshops. Published talks count more than panel appearances.

Contribution

S-06 · weight 0.85

Technical writing

Blog posts, Substack issues, white papers, books — sustained, technical, public.

Contribution

S-07 · weight 1.15

Specialist depth

A single Databricks segment they own. Generic data engineering experience alone is not sufficient.

Depth

S-08 · weight 0.80

Certifications & credentials

Databricks Architect, Engineer Professional, ML Engineer — moderated against hands-on evidence.

Depth

S-09 · weight 0.75

Tenure on platform

Years working with Databricks, depth of platform exposure, breadth of features touched in anger.

Depth

S-10 · weight 1.05

Peer endorsement

Other practitioners cite, recommend, or learn from their work. Not reciprocal LinkedIn endorsements.

Peer

S-11 · weight 0.85

Teaching & mentorship

Internal academies, public courses, sustained coaching relationships, community organising.

Peer

S-12 · weight 0.70

Market visibility

Recognition in client searches, named in RFPs, asked for by reputation. A quieter signal that moderates the rest.

Peer
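The page gives a weight for each signal but does not say how they are combined. The sketch below assumes one plausible rule: each signal is rated between 0 and 1 and the composite is the weighted mean. The weights are copied from S-01 to S-12; the dictionary keys, the 0 to 1 rating scale, and the combination rule are all assumptions made for illustration.

```python
# Weights copied from signals S-01 to S-12; the key names are hypothetical.
WEIGHTS = {
    "production_delivery": 1.25,
    "senior_judgment": 1.10,
    "client_outcomes": 1.00,
    "open_source": 0.95,
    "conference_talks": 0.90,
    "technical_writing": 0.85,
    "specialist_depth": 1.15,
    "certifications": 0.80,
    "tenure": 0.75,
    "peer_endorsement": 1.05,
    "teaching": 0.85,
    "market_visibility": 0.70,
}


def weighted_score(ratings):
    """Weighted mean of per-signal ratings in [0, 1] (assumed rule).

    Signals missing from `ratings` count as 0.
    """
    unknown = set(ratings) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown signals: {sorted(unknown)}")
    if any(not 0.0 <= value <= 1.0 for value in ratings.values()):
        raise ValueError("ratings must lie in [0, 1]")
    total = sum(weight * ratings.get(name, 0.0) for name, weight in WEIGHTS.items())
    return total / sum(WEIGHTS.values())
```

Under this assumed rule, a profile whose only evidence is a perfect certification rating scores lower than one whose only evidence is a perfect production delivery rating, which is consistent with the statement that credential signals are moderated against hands-on evidence.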

§ 03 · Four tiers

One hundred names. Four named tiers.

Tiers reflect different combinations of breadth, specialist depth, and verified delivery evidence — not a strict ranking from “best” to “worst”.

Tier I

Vanguard 10

Top overall · 10 names

Highest-signal practitioners combining Databricks proof, senior judgment, delivery depth, and client usefulness. Often visible on multiple signals at once — they ship, they teach, they contribute, they're cited.

Minimum bar

✓ Strong on 4+ signals

✓ Production delivery evidence

✓ Public contribution presence

Tier II

Pathfinder 15

Specialist depth · 15 names

Specialist and senior profiles with strong platform, migration, governance, AI, engineering, or delivery signals. A practitioner you'd ask first for the specific thing they're known for.

Minimum bar

✓ Top-quartile on specialist depth

✓ Public evidence in their segment

✓ Peer-cited within the niche

Tier III

Operator 25

Delivery-heavy · 25 names

Proven delivery-heavy practitioners who can help organisations move Lakehouse work into production. Less public-facing — but the people who reliably get things built and live.

Minimum bar

✓ Multiple named programmes

✓ Verified client outcomes

✓ Senior judgment in role

Tier IV

Field 50

High signal · 50 names

Credible high-signal practitioners with Databricks relevance and future movement potential. Often earlier-career or newly-public — names worth tracking into the next edition.

Minimum bar

✓ Clear Databricks-first relevance

✓ At least one strong signal

✓ Verifiable public record
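A minimum bar like the ones above can be read as a simple predicate over a practitioner's signal ratings. The sketch below covers the Tier I bar only. It assumes ratings between 0 and 1, treats a rating of 0.7 or higher as "strong", and treats any non-zero rating as "evidence" or "presence". The threshold, the key names, and the grouping of contribution signals are hypothetical; the page does not define them.

```python
# Hypothetical keys for the three signals categorised as "Contribution".
CONTRIBUTION_SIGNALS = {"open_source", "conference_talks", "technical_writing"}


def meets_vanguard_bar(ratings, strong=0.7):
    """Check the three Tier I minimum-bar items against assumed thresholds."""
    # "Strong on 4+ signals": count ratings at or above the assumed threshold.
    strong_count = sum(1 for value in ratings.values() if value >= strong)
    # "Production delivery evidence": any non-zero delivery rating.
    has_delivery = ratings.get("production_delivery", 0.0) > 0.0
    # "Public contribution presence": any non-zero contribution rating.
    has_contribution = any(
        ratings.get(name, 0.0) > 0.0 for name in CONTRIBUTION_SIGNALS
    )
    return strong_count >= 4 and has_delivery and has_contribution
```

Meeting the bar would make a profile eligible for consideration, not place it in the tier: Tier I holds ten names, so editorial review still has to choose among everyone who qualifies.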

§ 04 · Capability segments

Eight segments. Each scored independently.

A practitioner can be strong across multiple segments — but we publish their primary one alongside up to two secondary classifications.

SEG_01

Financial Services

Banks, insurers, hedge funds, asset managers. Risk, fraud, compliance, market data.

risk · fraud · capital markets

SEG_02

Data Engineering

Pipelines & performance. Delta Lake, Spark optimisation, ingestion patterns, streaming.

delta · spark · streaming

SEG_03

Unity Catalog

Governance & access. Lineage, fine-grained access, multi-workspace at scale.

governance · lineage · access

SEG_04

Platform DevOps & FinOps

Deployment & cost. Terraform, asset bundles, cluster policies, spend governance.

terraform · bundles · finops

SEG_05

MLflow & MLOps

Model lifecycle & serving. Registry, model serving, feature stores in production.

mlflow · serving · feature_store

SEG_06

GenAI & Agents

LLMs & agentic systems. RAG, vector search, agent frameworks, evals.

rag · agents · evals

SEG_07

Public Sector & Health

NHS, central government, regulators. Sensitive data, compliance-heavy lakehouse work.

nhs · gov · regulated

SEG_08

Retail & Industrial

Retail, manufacturing, energy, logistics. Demand, supply chain, IoT-scale Databricks.

demand · supply · iot
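The rule of one primary segment plus up to two secondary classifications can be sketched as a ranking over independent per-segment scores. The score scale, the `min_score` cut-off, and the tie-break by segment ID are assumptions; the page states only that segments are scored independently and that at most three are published.

```python
from typing import Dict, List, Optional, Tuple


def classify_segments(
    scores: Dict[str, float],
    min_score: float = 0.5,  # hypothetical cut-off for a publishable segment
    max_secondary: int = 2,  # "up to two secondary classifications"
) -> Tuple[Optional[str], List[str]]:
    """Return (primary segment, secondary segments) from per-segment scores."""
    eligible = [(seg, score) for seg, score in scores.items() if score >= min_score]
    # Highest score first; ties broken by segment ID so the result is stable.
    ranked = sorted(eligible, key=lambda item: (-item[1], item[0]))
    if not ranked:
        return None, []
    primary = ranked[0][0]
    secondary = [seg for seg, _ in ranked[1 : 1 + max_secondary]]
    return primary, secondary
```

For example, scores of 0.9 for SEG_02, 0.7 for SEG_03, 0.6 for SEG_04, and 0.55 for SEG_05 would publish SEG_02 as primary with SEG_03 and SEG_04 as secondary. SEG_05 clears the cut-off but is dropped by the two-secondary limit.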

§ 05 · Exclusions

Who isn't eligible — and why.

A short list. Independence matters more than coverage.

EX·01

Current Databricks employees

Anyone presently employed by Databricks is excluded from the index, regardless of seniority or public visibility.

Why

The index recognises practitioners using Databricks in client environments — not the people who build it. Former Databricks employees now in practitioner or client roles remain eligible.

EX·02

Paid placements

No vendor or sponsor pays to appear in Lakehouse100. Marketing budget does not interact with editorial.

Why

The list's only value is independence. The moment placement is purchasable, it isn't an index — it's a directory.

EX·03

Generic data engineering profiles

Strong generic data engineering experience without specific Databricks-first evidence is not sufficient on its own.

Why

We're an index of Databricks practitioners — not a general data engineer ranking. Adjacent ecosystems are excellent; they're just not the scope.

EX·04

Non-UK practitioners

The H1 2026 edition is UK-only. Practitioners based outside the UK are reviewed for future regional editions.

Why

Regional editions stay credible. We'd rather publish a deeply-verified UK list than a thin global one.
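The four exclusions amount to an eligibility filter applied before scoring. A minimal sketch follows, assuming a hypothetical `Candidate` record. The field names, the "GB" country code, and the employer string match are illustrative and not a description of the index's actual tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    # All field names are hypothetical.
    name: str
    country: str  # assumed ISO-style code, "GB" for UK-based
    employer: str
    databricks_first_evidence: bool
    paid_placement: bool = False


def exclusion_reasons(candidate: Candidate) -> List[str]:
    """Return every exclusion rule the candidate trips; empty means eligible."""
    reasons = []
    if candidate.employer.strip().lower() == "databricks":
        reasons.append("EX·01 current Databricks employee")
    if candidate.paid_placement:
        reasons.append("EX·02 paid placement")
    if not candidate.databricks_first_evidence:
        reasons.append("EX·03 no Databricks-first evidence")
    if candidate.country != "GB":
        reasons.append("EX·04 based outside the UK")
    return reasons
```

Returning all matching reasons, rather than stopping at the first, keeps the record useful later: a non-UK practitioner with strong evidence can be carried forward for a future regional edition, as EX·04 describes.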

§ 06 · Publication standard

What we publish — and what we never do.

Public profiles appear only after consent and admin approval. The published profile is a strict subset of what we hold.

+ Shown publicly

Name · name
Tier badge · tier
Approved citation · citation
Display title · display_title
Current company · current_company
LinkedIn URL (confirmed) · linkedin
Approved public links · links[]
Photo, only if consented · photo*
Capability segments · segments[]
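Publishing a strict subset of the held record can be sketched as an allow-list projection gated on consent and approval. The public field keys are copied from the list above. The gating flags (`consented`, `admin_approved`, `photo_consented`) and the function name are hypothetical.

```python
from typing import Optional

# Field keys copied from the "Shown publicly" list; photo is handled separately.
PUBLIC_FIELDS = (
    "name",
    "tier",
    "citation",
    "display_title",
    "current_company",
    "linkedin",
    "links",
    "segments",
)


def public_view(record: dict) -> Optional[dict]:
    """Return the publishable subset of a record, or None if not cleared."""
    # Hypothetical flags: nothing is published without consent and approval.
    if not (record.get("consented") and record.get("admin_approved")):
        return None
    # Allow-list projection: any field not named above is never copied.
    view = {key: record[key] for key in PUBLIC_FIELDS if key in record}
    # The photo needs its own, separate consent.
    if record.get("photo_consented") and "photo" in record:
        view["photo"] = record["photo"]
    return view
```

An allow-list is the safer design here: a field added to the internal record later stays private by default until someone deliberately adds it to `PUBLIC_FIELDS`.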

§ 07 · Refresh cadence

Twice a year. Same standard each time.

Lakehouse100 refreshes every six months. Every edition runs the same pipeline — longlist, score, invite, publish — so movement between editions is meaningful, and the bar stays stable.

MAR 2026

Longlist assembly · 1,067 profiles · UK Databricks market scan

done

APR 2026

Scoring & tier draft · 250 scored against 12 signals

done

MAY–AUG

Invite & consent · outreach in progress

now

17 JUL

Nominations close · final triage · final scoring

upcoming

01 AUG 2026

H1 2026 UK Index publishes · 100 confirmed names · four tiers

upcoming

Q3 2026

H2 2026 pipeline opens · same process · same bar

upcoming
