PRE-LAUNCH · VERIFICATION IN PROGRESS · RELEASE H1.2026.08 · UK EDITION
OPENS · 01 AUG 2026
§ Methodology & context

A Databricks-first market index, scored on twelve public signals.

Lakehouse100 was developed through a structured market scan of UK-based Databricks and modern data platform practitioners. It combines certification-led research, public records, candidate materials, external validation, and editorial review into a four-tier classification, and every practitioner consents before their profile is published.

§ 01 · Pipeline

From a thousand profiles down to one hundred confirmed names.

A four-stage funnel. We're currently at Stage 03 — practitioner outreach and consent.

selection_pipeline.run()
in progress · ETA 01 aug 2026 · stage 03/04
Stage 01 · Longlist
1,067
Public profiles assembled from open signals and reader nominations across the UK Databricks market.
complete · Mar 2026
Stage 02 · Score
250
Evidence-scored against twelve public signals across delivery, contribution, depth, and peer recognition.
complete · Apr 2026
Stage 03 · Invite & consent
120
Invited for consent and profile confirmation. We never publish without explicit consent.
in progress · May–Aug
Stage 04 · Publish
100
Confirmed, published with full citations, segment classification, and tier badges. One hundred names, four tiers.
upcoming · 01 aug 2026
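The four stage counts above imply how sharply each stage narrows the field. The following is a minimal sketch of that arithmetic using only the published counts; the `Stage` class and `retention` helper are illustrative names, not part of any Lakehouse100 tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    count: int


# Stage counts as published in the pipeline above.
STAGES = [
    Stage("Longlist", 1067),
    Stage("Score", 250),
    Stage("Invite & consent", 120),
    Stage("Publish", 100),
]


def retention(stages):
    """Return (stage name, share of the previous stage kept) for stages 2..n."""
    return [
        (cur.name, cur.count / prev.count)
        for prev, cur in zip(stages, stages[1:])
    ]


if __name__ == "__main__":
    for name, share in retention(STAGES):
        print(f"{name}: {share:.1%} of previous stage")
    overall = STAGES[-1].count / STAGES[0].count
    print(f"Overall: {overall:.1%} of the longlist is published")
```

On these counts, a little under a quarter of the longlist is scored, just under half of the scored profiles are invited, and roughly one longlisted profile in eleven reaches publication.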
§ 02 · Twelve evidence signals

What gets scored — and roughly how much it counts.

No single signal carries the index. Strong credential signals are moderated against hands-on evidence, market experience, public validation, and client relevance.

S-01 · weight 1.25

Production delivery

Live Databricks systems shipped under their name — programmes, lakehouse migrations, named clients.

Delivery
S-02 · weight 1.10

Senior judgment

Architecture decisions, programme leadership, mentoring engineers in production environments.

Delivery
S-03 · weight 1.00

Client outcomes

Evidence the work moved a real metric — cost, latency, model performance, time-to-insight.

Delivery
S-04 · weight 0.95

Open-source contribution

Commits, maintained packages, libraries serving the wider Databricks/Spark ecosystem.

Contribution
S-05 · weight 0.90

Conference talks

DAIS, Spark Summit, regional meetups, technical workshops. Published talks count more than panel appearances.

Contribution
S-06 · weight 0.85

Technical writing

Blog posts, Substack issues, white papers, books — sustained, technical, public.

Contribution
S-07 · weight 1.15

Specialist depth

A single Databricks segment they own. Generic data engineering experience alone is not sufficient.

Depth
S-08 · weight 0.80

Certifications & credentials

Databricks Architect, Engineer Professional, ML Engineer — moderated against hands-on evidence.

Depth
S-09 · weight 0.75

Tenure on platform

Years working with Databricks, depth of platform exposure, breadth of features touched in anger.

Depth
S-10 · weight 1.05

Peer endorsement

Other practitioners cite, recommend, or learn from their work. Not reciprocal LinkedIn endorsements.

Peer
S-11 · weight 0.85

Teaching & mentorship

Internal academies, public courses, sustained coaching relationships, community organising.

Peer
S-12 · weight 0.70

Market visibility

Recognition in client searches, named in RFPs, asked for by reputation. A quieter signal that moderates the rest.

Peer
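The methodology lists a weight for each signal but does not state how the weights are combined. One plausible reading is a weighted mean, sketched below. The 0-to-1 rating scale, the `weighted_score` function, and the weighted-mean formula itself are assumptions made for illustration; only the twelve weights come from the list above.

```python
# Weights exactly as listed for S-01 .. S-12 above.
WEIGHTS = {
    "S-01": 1.25,  # Production delivery
    "S-02": 1.10,  # Senior judgment
    "S-03": 1.00,  # Client outcomes
    "S-04": 0.95,  # Open-source contribution
    "S-05": 0.90,  # Conference talks
    "S-06": 0.85,  # Technical writing
    "S-07": 1.15,  # Specialist depth
    "S-08": 0.80,  # Certifications & credentials
    "S-09": 0.75,  # Tenure on platform
    "S-10": 1.05,  # Peer endorsement
    "S-11": 0.85,  # Teaching & mentorship
    "S-12": 0.70,  # Market visibility
}


def weighted_score(ratings):
    """Weighted mean of per-signal ratings.

    ASSUMPTION: each rating is a float in [0, 1], and signals missing
    from `ratings` count as 0. The real scoring formula is not published.
    """
    for signal, rating in ratings.items():
        if signal not in WEIGHTS:
            raise KeyError(f"unknown signal: {signal}")
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"rating for {signal} must be in [0, 1]")
    total = sum(w * ratings.get(s, 0.0) for s, w in WEIGHTS.items())
    return total / sum(WEIGHTS.values())


if __name__ == "__main__":
    # A profile with top certifications and nothing else scores low,
    # consistent with "no single signal carries the index".
    print(round(weighted_score({"S-08": 1.0}), 3))
```

Under this reading, a profile with perfect certifications and no other evidence would score about 0.07 out of 1, because the twelve weights sum to 11.35 and certifications contribute only 0.80 of that.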
§ 03 · Four tiers

One hundred names. Four named tiers.

Tiers reflect different combinations of breadth, specialist depth, and verified delivery evidence — not a strict ranking from “best” to “worst”.

Tier I

Vanguard 10

Top overall · 10 names
Highest-signal practitioners combining Databricks proof, senior judgment, delivery depth, and client usefulness. Often visible on multiple signals at once — they ship, they teach, they contribute, they're cited.
Minimum bar
Strong on 4+ signals
Production delivery evidence
Public contribution presence
Tier II

Pathfinder 15

Specialist depth · 15 names
Specialist and senior profiles with strong platform, migration, governance, AI, engineering, or delivery signals. A practitioner you'd ask first for the specific thing they're known for.
Minimum bar
Top-quartile on specialist depth
Public evidence in their segment
Peer-cited within the niche
Tier III

Operator 25

Delivery-heavy · 25 names
Proven delivery-heavy practitioners who can help organisations move Lakehouse work into production. Less public-facing — but the people who reliably get things built and live.
Minimum bar
Multiple named programmes
Verified client outcomes
Senior judgment in role
Tier IV

Field 50

High signal · 50 names
Credible high-signal practitioners with Databricks relevance and future movement potential. Often earlier-career or newly public: names worth tracking into the next edition.
Minimum bar
Clear Databricks-first relevance
At least one strong signal
Verifiable public record
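The tier sizes and minimum bars above can be checked mechanically once "strong" is given a number. The sketch below does this for the Tier I bar only. The 0.7 threshold, the mapping of "production delivery evidence" to S-01, and the mapping of "public contribution presence" to S-04 through S-06 are all assumptions for illustration; the methodology does not define them.

```python
# Tier sizes as published above.
TIER_SIZES = {"Vanguard": 10, "Pathfinder": 15, "Operator": 25, "Field": 50}
assert sum(TIER_SIZES.values()) == 100  # one hundred names in total

STRONG = 0.7  # ASSUMPTION: rating (0..1) at or above which a signal is "strong"
CONTRIBUTION_SIGNALS = ("S-04", "S-05", "S-06")  # ASSUMPTION: Contribution group


def meets_vanguard_bar(ratings):
    """Check the three Tier I minimum-bar items against per-signal ratings.

    `ratings` maps signal ids such as "S-01" to hypothetical 0..1 ratings.
    """
    strong_count = sum(1 for r in ratings.values() if r >= STRONG)
    has_delivery = ratings.get("S-01", 0.0) > 0.0
    has_contribution = any(
        ratings.get(s, 0.0) > 0.0 for s in CONTRIBUTION_SIGNALS
    )
    return strong_count >= 4 and has_delivery and has_contribution
```

Meeting the bar would make a profile eligible for the tier, not guarantee a place in it, since Tier I holds only ten names.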
§ 04 · Capability segments

Eight segments. Each scored independently.

A practitioner can be strong across multiple segments — but we publish their primary one alongside up to two secondary classifications.

SEG_01
Financial Services
Banks, insurers, hedge funds, asset managers. Risk, fraud, compliance, market data.
risk · fraud · capital markets
SEG_02
Data Engineering
Pipelines & performance. Delta Lake, Spark optimisation, ingestion patterns, streaming.
delta · spark · streaming
SEG_03
Unity Catalog
Governance & access. Lineage, fine-grained access, multi-workspace at scale.
governance · lineage · access
SEG_04
Platform DevOps & FinOps
Deployment & cost. Terraform, asset bundles, cluster policies, spend governance.
terraform · bundles · finops
SEG_05
MLflow & MLOps
Model lifecycle & serving. Registry, model serving, feature stores in production.
mlflow · serving · feature_store
SEG_06
GenAI & Agents
LLMs & agentic systems. RAG, vector search, agent frameworks, evals.
rag · agents · evals
SEG_07
Public Sector & Health
NHS, central government, regulators. Sensitive data, compliance-heavy lakehouse work.
nhs · gov · regulated
SEG_08
Retail & Industrial
Retail, manufacturing, energy, logistics. Demand, supply chain, IoT-scale Databricks.
demand · supply · iot
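The rule of one primary segment plus up to two secondary classifications can be sketched as a simple ranking over per-segment scores. The `classify` function, the 0-to-1 score scale, and the `min_secondary` cut-off are hypothetical; the methodology says only that each segment is scored independently.

```python
# Segment ids and names as listed above.
SEGMENTS = {
    "SEG_01": "Financial Services",
    "SEG_02": "Data Engineering",
    "SEG_03": "Unity Catalog",
    "SEG_04": "Platform DevOps & FinOps",
    "SEG_05": "MLflow & MLOps",
    "SEG_06": "GenAI & Agents",
    "SEG_07": "Public Sector & Health",
    "SEG_08": "Retail & Industrial",
}


def classify(segment_scores, min_secondary=0.5):
    """Return (primary, secondaries) from independent per-segment scores.

    ASSUMPTION: scores are floats in [0, 1]; a segment needs at least
    `min_secondary` to be listed as secondary. Ties break by segment id.
    """
    if not segment_scores:
        raise ValueError("at least one segment score is required")
    unknown = set(segment_scores) - set(SEGMENTS)
    if unknown:
        raise KeyError(f"unknown segments: {sorted(unknown)}")
    # Highest score first; segment id as a deterministic tie-breaker.
    ranked = sorted(segment_scores, key=lambda s: (-segment_scores[s], s))
    primary = ranked[0]
    secondaries = [
        s for s in ranked[1:] if segment_scores[s] >= min_secondary
    ][:2]  # never more than two secondary classifications
    return primary, secondaries
```

A practitioner who scores well in four segments would still be published with three at most: the strongest as primary and the next two as secondary.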
§ 05 · Exclusions

Who isn't eligible — and why.

A short list. Independence matters more than coverage.

EX·01

Current Databricks employees

Anyone presently employed by Databricks is excluded from the index, regardless of seniority or public visibility.

Why
The index recognises practitioners using Databricks in client environments — not the people who build it. Former Databricks employees now in practitioner or client roles remain eligible.
EX·02

Paid placements

No vendor or sponsor pays to appear in Lakehouse100. Marketing budgets have no influence on editorial decisions.

Why
The list's only value is independence. The moment placement is purchasable, it isn't an index — it's a directory.
EX·03

Generic data engineering profiles

Strong generic data engineering experience without specific Databricks-first evidence is not sufficient on its own.

Why
We're an index of Databricks practitioners — not a general data engineer ranking. Adjacent ecosystems are excellent; they're just not the scope.
EX·04

Non-UK practitioners

The H1 2026 edition is UK-only. Practitioners based outside the UK are reviewed for future regional editions.

Why
Scoping each edition to one region keeps it credible. We'd rather publish a deeply verified UK list than a thin global one.
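Three of the four exclusions describe properties of a candidate, so they can be expressed as a filter. EX·02 is a rule about the index itself and is left out. The sketch below is illustrative only: the `Candidate` fields, the exact-match employer check, and the use of the string "UK" for location are assumptions, not a description of how eligibility is actually verified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # All field names are hypothetical.
    name: str
    current_employer: str
    country: str  # ASSUMPTION: "UK" marks a UK-based practitioner
    has_databricks_first_evidence: bool


def exclusion_reasons(candidate):
    """Return the exclusion codes that apply; an empty list means eligible."""
    reasons = []
    # EX·01: current Databricks employees. Former employees stay eligible,
    # so only the current employer is checked.
    if candidate.current_employer.strip().lower() == "databricks":
        reasons.append("EX·01")
    # EX·03: generic data engineering without Databricks-first evidence.
    if not candidate.has_databricks_first_evidence:
        reasons.append("EX·03")
    # EX·04: the H1 2026 edition is UK-only.
    if candidate.country.strip().upper() != "UK":
        reasons.append("EX·04")
    return reasons
```

Returning every applicable code, rather than stopping at the first, makes it possible to tell a candidate held for a future regional edition (EX·04 only) from one who is out of scope altogether.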
§ 06 · Publication standard

What we publish — and what we never do.

Public profiles appear only after consent and admin approval. The published profile is a strict subset of what we hold.

Shown publicly

  • Name (name)
  • Tier badge (tier)
  • Approved citation (citation)
  • Display title (display_title)
  • Current company (current_company)
  • LinkedIn URL, confirmed (linkedin)
  • Approved public links (links[])
  • Photo, only if consented (photo)
  • Capability segments (segments[])
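The "strict subset" rule above amounts to an allowlist projection gated on consent and approval. The following is a minimal sketch of that idea. The public field names follow the list above, but the flags `consented`, `admin_approved`, and `photo_consented`, and the `public_profile` function, are hypothetical names chosen for illustration.

```python
# Fields shown publicly, as listed above. Photo is handled separately
# because it needs its own consent.
PUBLIC_FIELDS = (
    "name",
    "tier",
    "citation",
    "display_title",
    "current_company",
    "linkedin",
    "links",
    "segments",
)


def public_profile(record):
    """Project a held record onto its public subset.

    Returns None unless the practitioner has consented and an admin has
    approved. Anything not on the allowlist is dropped.
    """
    if not (record.get("consented") and record.get("admin_approved")):
        return None
    public = {k: record[k] for k in PUBLIC_FIELDS if k in record}
    # Photo appears only with explicit photo consent.
    if record.get("photo_consented") and "photo" in record:
        public["photo"] = record["photo"]
    return public
```

Using an allowlist rather than a blocklist means a field added to the held record later stays private by default until someone deliberately adds it to `PUBLIC_FIELDS`.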
§ 07 · Refresh cadence

Twice a year. Same standard each time.

Lakehouse100 refreshes every six months. Every edition runs the same pipeline — longlist, score, invite, publish — so movement between editions is meaningful, and the bar stays stable.

MAR 2026
Longlist assembly · 1,067 profiles · UK Databricks market scan
done
APR 2026
Scoring & tier draft · 250 scored against 12 signals
done
MAY–AUG
Invite & consent · outreach in progress
now
17 JUL
Nominations close · final triage · final scoring
upcoming
01 AUG 2026
H1 2026 UK Index publishes · 100 confirmed names · four tiers
upcoming
Q3 2026
H2 2026 pipeline opens · same process · same bar
upcoming