A Databricks-first market index, scored on twelve public signals.
Lakehouse100 was developed through a structured market scan of UK-based Databricks and modern data platform practitioners. It combines certification-led research, public records, candidate materials, external validation, and editorial review into a four-tier classification, and practitioners consent before anything is published.
From a thousand profiles down to one hundred confirmed names.
A four-stage funnel. We're currently at Stage 03 — practitioner outreach and consent.
What gets scored — and roughly how much it counts.
No single signal carries the index. Strong credential signals are moderated against hands-on evidence, market experience, public validation, and client relevance.
Production delivery (Delivery)
Live Databricks systems shipped under their name: programmes, lakehouse migrations, named clients.
Senior judgment (Delivery)
Architecture decisions, programme leadership, mentoring engineers in production environments.
Client outcomes (Delivery)
Evidence the work moved a real metric: cost, latency, model performance, time-to-insight.
Open-source contribution (Contribution)
Commits, maintained packages, libraries serving the wider Databricks/Spark ecosystem.
Conference talks (Contribution)
DAIS, Spark Summit, regional meetups, technical workshops. Published talks count more than panel appearances.
Technical writing (Contribution)
Blog posts, Substack issues, white papers, books. Sustained, technical, public.
Specialist depth (Depth)
A single Databricks segment they own. Generic data engineering experience alone is not sufficient.
Certifications & credentials (Depth)
Databricks Architect, Engineer Professional, ML Engineer, moderated against hands-on evidence.
Tenure on platform (Depth)
Years working with Databricks, depth of platform exposure, breadth of features touched in anger.
Peer endorsement (Peer)
Other practitioners cite, recommend, or learn from their work. Not reciprocal LinkedIn endorsements.
Teaching & mentorship (Peer)
Internal academies, public courses, sustained coaching relationships, community organising.
Market visibility (Peer)
Recognition in client searches, named in RFPs, asked for by reputation. A quieter signal that moderates the rest.
One hundred names. Four named tiers.
Tiers reflect different combinations of breadth, specialist depth, and verified delivery evidence — not a strict ranking from “best” to “worst”.
Vanguard 10
Pathfinder 15
Operator 25
Field 50
Eight segments. Each scored independently.
A practitioner can be strong across multiple segments — but we publish their primary one alongside up to two secondary classifications.
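As a rough illustration of the primary-plus-secondary rule, the sketch below picks one primary segment and up to two secondaries from independently scored segments. It assumes the primary is simply the highest-scoring segment, which the methodology does not state. The segment names, the `classify_segments` function, and the `min_score` cut-off are all hypothetical.

```python
from typing import Dict, List, Tuple

MAX_SECONDARY = 2  # "up to two secondary classifications" per the text


def classify_segments(
    segment_scores: Dict[str, float],
    min_score: float = 0.5,  # hypothetical cut-off, not taken from the methodology
) -> Tuple[str, List[str]]:
    """Return (primary, secondaries) from independently scored segments."""
    if not segment_scores:
        raise ValueError("at least one segment score is required")
    # Sort by score, highest first; ties are broken by name so the result is stable.
    ranked = sorted(segment_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Assumption: the top-scoring segment is treated as the primary one.
    primary = ranked[0][0]
    secondaries = [name for name, score in ranked[1:] if score >= min_score]
    return primary, secondaries[:MAX_SECONDARY]


# Placeholder segment names; the eight real segments are not listed here.
scores = {
    "segment_a": 0.9,
    "segment_b": 0.7,
    "segment_c": 0.6,
    "segment_d": 0.55,
    "segment_e": 0.2,
}
print(classify_segments(scores))  # ('segment_a', ['segment_b', 'segment_c'])
```

Capping the secondary list after filtering means a practitioner who clears the cut-off in four segments still shows only three classifications in total.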
Who isn't eligible — and why.
A short list. Independence matters more than coverage.
Current Databricks employees
Anyone presently employed by Databricks is excluded from the index, regardless of seniority or public visibility.
Paid placements
No vendor or sponsor pays to appear in Lakehouse100. Marketing budget does not interact with editorial.
Generic data engineering profiles
Strong generic data engineering experience without specific Databricks-first evidence is not sufficient on its own.
Non-UK practitioners
The H1 2026 edition is UK-only. Practitioners based outside the UK are reviewed for future regional editions.
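The four exclusions above could be expressed as a simple filter. This is a minimal sketch, not the index's actual tooling: the `Candidate` record, its field names, and the idea that each rule reduces to a single boolean or string check are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    # All field names are hypothetical; the source does not describe a data model.
    name: str
    current_employer: str
    country: str  # ISO 3166-1 alpha-2 code, e.g. "GB" for the United Kingdom
    is_paid_placement: bool
    has_databricks_evidence: bool  # specific Databricks-first evidence on file


def exclusion_reasons(c: Candidate) -> List[str]:
    """Return every exclusion rule the candidate trips; an empty list means eligible."""
    reasons = []
    if c.current_employer.strip().lower() == "databricks":
        reasons.append("Current Databricks employee")
    if c.is_paid_placement:
        reasons.append("Paid placement")
    if not c.has_databricks_evidence:
        reasons.append("Generic data engineering profile")
    if c.country.upper() != "GB":
        reasons.append("Non-UK practitioner")
    return reasons


candidate = Candidate("Example Person", "Acme Data Ltd", "GB", False, True)
print(exclusion_reasons(candidate))  # []
```

Returning all matching reasons, rather than stopping at the first, keeps a record of why a profile was set aside, which is useful when non-UK practitioners are revisited for future regional editions.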
What we publish — and what we never do.
Public profiles appear only after consent and admin approval. The published profile is a strict subset of what we hold.
Shown publicly
- Name (name)
- Tier badge (tier)
- Approved citation (citation)
- Display title (display_title)
- Current company (current_company)
- LinkedIn URL, confirmed (linkedin)
- Approved public links (links[])
- Photo, only if consented (photo*)
- Capability segments (segments[])
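One way to keep the published profile a strict subset of the held record is an explicit allow-list. The sketch below uses the public field names from the list above. The `consented`, `admin_approved`, and `photo_consented` flags, and the `to_public_profile` function itself, are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, Optional

# Field names taken from the "Shown publicly" list; photo is handled separately.
PUBLIC_FIELDS = (
    "name",
    "tier",
    "citation",
    "display_title",
    "current_company",
    "linkedin",
    "links",
    "segments",
)


def to_public_profile(record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Project a held record onto its public subset, or return None if unpublishable."""
    # Hypothetical flags: nothing is published without consent and admin approval.
    if not (record.get("consented") and record.get("admin_approved")):
        return None
    # Allow-list copy: any field not named in PUBLIC_FIELDS is silently dropped.
    public = {field: record[field] for field in PUBLIC_FIELDS if field in record}
    # The photo needs its own consent (hypothetical flag) on top of the profile consent.
    if record.get("photo_consented") and record.get("photo"):
        public["photo"] = record["photo"]
    return public


held = {
    "name": "Example Person",
    "tier": "Operator",
    "segments": ["segment_a"],
    "email": "person@example.com",  # held privately, never published
    "photo": "photo.jpg",
    "consented": True,
    "admin_approved": True,
    "photo_consented": False,
}
print(to_public_profile(held))
```

An allow-list fails safe: a new private field added to the held record stays unpublished unless someone deliberately adds it to `PUBLIC_FIELDS`.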
Twice a year. Same standard each time.
Lakehouse100 refreshes every six months. Every edition runs the same pipeline — longlist, score, invite, publish — so movement between editions is meaningful, and the bar stays stable.