A Series B fintech we worked with last year burned $280,000 on a data engineering hire who quit after four months. The salary was $165K. The recruiter fees were $33K. The real cost was the six months of pipeline work that had to be redone because they'd hired someone who could pass a LeetCode screen but couldn't design a system.
This happens constantly. Companies post for a Data Engineer when they need an Analytics Engineer. They screen for tool proficiency when they should screen for problem-solving. They run candidates through five rounds of interviews designed to make hiring managers feel thorough—not to surface good candidates.
This guide is designed to fix that. But we'll be honest upfront: sometimes hiring isn't the answer at all.
What You'll Learn
- How to diagnose which role you actually need (and when you don't need to hire)
- What to evaluate that predicts success—and what looks impressive but doesn't matter
- A process that's rigorous without wasting everyone's time
- When to hire vs. bring in outside help (we'll be honest about this)
The Uncomfortable Truth About Data Hiring
Most companies shouldn't be hiring their first data engineer.
There. We said it.
If you're a Series A company with fewer than 50 employees, your "data infrastructure problem" is probably a "we don't know what questions to ask yet" problem. Hiring a senior Data Engineer to build pipelines before you know what data matters is like hiring an architect before you've decided what city to live in.
We've seen this pattern repeatedly: Company raises Series A, decides they need "data infrastructure," hires a $180K Data Engineer, and six months later that engineer is either bored (because there's not enough infrastructure work) or frustrated (because nobody can tell them what to build). Meanwhile, the company still can't answer basic questions about customer retention.
The contrarian move: Start with an Analytics Engineer or a strong Analyst. Someone who can answer business questions with the data you have, and who can tell you what infrastructure you actually need. Build the plumbing after you know where the water needs to go.
Before You Interview: Getting the Role Right
The data field has a naming problem. "Data Engineer," "Analytics Engineer," and "Data Analyst" mean different things at different companies. Before you write a job description, get clear on what you're actually solving for.
The diagnostic question: What's the bottleneck?
"We have data, but we can't answer basic questions about the business."
You need a Data Analyst. Someone who can explore datasets, identify patterns, build dashboards, and translate findings for stakeholders. They think in business questions, not technical architecture.
"We have questions, but our data is a mess—inconsistent definitions, duplicated logic, models nobody trusts."
You need an Analytics Engineer. Someone who builds clean, tested, documented data models. They bridge raw data and business-ready datasets. They think in data contracts, testing, and maintainability.
"We have analytics, but our systems are unreliable—pipelines break, data arrives late, we can't scale."
You need a Data Engineer. Someone who builds and maintains the infrastructure that moves data reliably. They think in throughput, latency, fault tolerance, and orchestration.
Where Companies Go Wrong
Hiring a Data Engineer when you need an Analytics Engineer. This is the most common mistake we see. You have data landing in your warehouse, but it's a mess—no consistent naming, business logic scattered across dashboards, nobody agrees on how to calculate revenue. You post for a Data Engineer because "we need someone technical." But a Data Engineer's instinct is to build infrastructure, not to model data for analytics. You end up with better pipelines feeding the same messy tables.
What we learned the hard way: At a client last year, we inherited a data stack built by a talented DE who'd spent 8 months building a beautiful, over-engineered pipeline. Real-time streaming, perfect idempotency, the works. Problem: the business still couldn't calculate customer LTV consistently because nobody had built the models. They didn't need real-time data—they needed a revenue definition everyone agreed on.
Hiring an Analyst when you need an Analytics Engineer. Your analyst is spending 80% of their time cleaning data and 20% analyzing it. That's a sign your modeling layer is broken. Analysts should explore and interpret data, not fight with it. If your best analyst is writing the same SQL transformations repeatedly, you need someone to build the foundation they can stand on.
Hiring for seniority you don't need. A senior Data Engineer at a Series A startup will be frustrated if there's no infrastructure to architect. Match the role to your actual stage—sometimes you need someone excited to build from scratch, not someone who's managed teams of 10.
The Overlap Is Real
These roles share core skills—SQL, data modeling concepts, analytical thinking. A strong Analytics Engineer can do analyst work. A strong Data Engineer understands data modeling. The boundaries are blurry by design.
When hiring, focus less on title and more on the actual work. Write job descriptions around problems to solve, not taxonomies to fit.
What to Evaluate
Stop asking SQL questions in interviews.
Okay, that's a slight exaggeration. But here's what we mean: most SQL assessments test syntax recall, not SQL thinking. Can they remember the exact syntax for a window function? Who cares—they'll look it up. Can they explain when a window function is the right tool and walk you through their reasoning? That matters.
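To make that distinction concrete, here's a minimal sketch of the reasoning we mean, runnable with Python's standard-library sqlite3 (toy data, invented for illustration). The task is "latest order per customer": a plain GROUP BY can find each customer's max order date, but dragging the rest of that row along is awkward, which is exactly why a window function is the right tool here. A candidate who can articulate that tradeoff is showing SQL thinking, whatever syntax they have to look up.

```python
import sqlite3

# Toy data: a few orders per customer. (Illustrative only.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-05', 50.0),
        (1, '2024-03-10', 75.0),
        (2, '2024-02-01', 20.0);
""")

# ROW_NUMBER() keeps whole rows, so the latest order's amount comes along
# for free -- something a bare GROUP BY + MAX(order_date) can't do cleanly.
rows = conn.execute("""
    SELECT customer_id, order_date, amount
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_id
                   ORDER BY order_date DESC
               ) AS rn
        FROM orders
    ) AS ranked
    WHERE rn = 1
    ORDER BY customer_id
""").fetchall()

print(rows)  # each customer's most recent order, full row intact
```

The interesting interview answer isn't the query itself — it's the sentence before it: "GROUP BY collapses rows, and I need the whole row, so I'll rank within each customer instead."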
With AI tools accelerating, the half-life of specific tool knowledge is shrinking. Someone who deeply understands dimensional modeling can learn dbt in a week. Three years of dbt on a resume means nothing if they can't explain what a surrogate key is or when you'd use one.
We recommend evaluating in two layers:
Layer 1: Fundamentals (Non-Negotiable)
These skills transfer across tools, stacks, and decades. If a candidate is weak here, no amount of tool proficiency will compensate.
SQL fluency. Not syntax—can they think in sets? Do they understand how joins actually work, when to use window functions, how to debug a query returning unexpected results? SQL is the lingua franca of data work. Weak SQL thinking is a disqualifier for all three roles.
Data modeling intuition. Do they understand why you'd normalize data? When you wouldn't? Can they look at a business process and sketch out entities and relationships? This doesn't require star schema terminology—it requires thinking clearly about how data represents reality.
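Here's what "sketching entities and relationships" might look like in miniature, again as a runnable sqlite3 sketch. The business process is hypothetical ("customers place orders for products"); the point is that each table maps cleanly to one real-world thing, and the relationships live in foreign keys, with no star-schema vocabulary required.

```python
import sqlite3

# Hypothetical exercise: model "customers place orders for products."
# Each entity gets a table; the order_lines table captures the relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        unit_price  REAL NOT NULL
    );
    -- One row per line item: relates a customer to a product with a quantity.
    CREATE TABLE order_lines (
        order_id    INTEGER,
        customer_id INTEGER REFERENCES customers(customer_id),
        product_id  INTEGER REFERENCES products(product_id),
        quantity    INTEGER NOT NULL
    );
""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['customers', 'order_lines', 'products']
```

A candidate who can get this far from a one-sentence business description — and explain why quantity lives on the line item rather than on the product — is demonstrating the intuition we're screening for.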
Systems thinking. Can they reason about how components interact? For an Analyst, understanding how upstream issues affect their dashboard. For an Engineer, how a pipeline change affects downstream consumers. The scale differs, but the thinking pattern is the same.
Problem decomposition. Given an ambiguous question, can they break it into tractable pieces? This separates someone who gets stuck on "analyze our sales data" from someone who asks clarifying questions, identifies what data they'd need, and proposes a first step.
Layer 2: Stack Proficiency (Context-Dependent)
These skills matter, but they're learnable if Layer 1 is solid. Weight them based on your team's capacity to ramp someone up.
Tool experience. Have they used your stack (Snowflake, dbt, Fivetran)? Direct experience reduces ramp time, but it's not predictive of long-term performance. A strong candidate with no dbt experience will outperform a weak candidate with three years of dbt.
Domain familiarity. Have they worked in your industry? Helpful for context, but often overweighted. A sharp generalist will learn your domain faster than a domain expert will learn to think clearly.
The "Opinions Backed by Reasoning" Test
The best signal isn't whether someone knows the "right" answer—it's whether they can articulate tradeoffs. "It depends" is fine if followed by "here's what it depends on."
Ask "why" more than "how." Ask "what would change your mind?" Ask "what are the downsides?" You're probing for depth, not memorized best practices.
A candidate who says "I'd use a star schema because that's the standard" is less impressive than one who says "I'd probably start with a star schema because our queries are mostly analytical aggregations, but if we had a lot of ad-hoc exploratory queries I might keep things more normalized to stay flexible."
Designing the Process
Most interview processes are designed to make hiring managers feel thorough. They're not designed to surface good candidates.
Running five rounds of interviews doesn't mean you're being rigorous—it means you're being slow. The best candidates have options. If your process takes six weeks, you'll lose them to a company that moved in two.
Recommended Stages
1. Resume Screen (5-10 minutes). Relevant experience, progression, clarity of communication. Don't over-index on pedigree.
2. Hiring Manager Screen (30 minutes). Mutual fit check. Explain the role, understand their goals. Light technical probing—filter for obvious mismatches.
3. Live SQL Exercise (45-60 minutes). This is where you see how candidates actually think. Not a syntax test—a collaborative problem-solving session. More on this below.
4. Take-Home Exercise (2-3 hours max). A realistic work sample. Cap it strictly—anything longer selects for free time, not talent.
5. Final Round (2-3 hours total). Deeper technical discussion. Culture conversations. This is also the candidate's chance to evaluate you.
The Live SQL Exercise: How We Do It
We're big believers in live SQL exercises—not as a gotcha, but as a window into how someone thinks and communicates. Done right, it's the most signal-dense hour in your process.
The setup:
- Use a realistic dataset—ideally anonymized data from your actual business, or something similar
- Start with a deliberately ambiguous question: "Help me understand customer retention" not "Write a query that calculates 30-day retention rate"
- Give them access to a SQL editor and let them query live. Screen share so you can see their process.
- Make it collaborative. You're a teammate, not an examiner.
What you're watching for:
- Do they ask clarifying questions before writing code? (They should.)
- How do they explore unfamiliar data? Do they check row counts, look at distributions, examine edge cases?
- Can they explain their reasoning as they go? "I'm joining these tables because..." is more important than perfect syntax.
- When they get stuck or make a mistake, how do they debug? Do they stay calm and systematic?
- Can they take feedback and adjust? If you suggest a different approach, do they engage with it or get defensive?
What you're NOT evaluating:
- Syntax memorization. Let them Google. Let them use autocomplete. Who cares.
- Speed. Thoughtful and slow beats fast and wrong.
- Getting the "right answer." The process matters more than the output.
The best live SQL sessions feel like pair programming, not an exam. If the candidate is nervous and silent, you're doing it wrong. If you're learning something about your own data from their questions, you're doing it right.
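The exploration habits worth watching for — row counts, null checks, a quick distribution — fit in a few lines. Here's a miniature version against a made-up orders table (all names and values are invented; the habit is the point, not the schema):

```python
import sqlite3

# Made-up orders table with the kinds of surprises real data contains:
# a refund as a negative amount, and a row with a NULL status.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INT, status TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'paid', 100.0),
        (2, 'paid', 40.0),
        (3, 'refunded', -40.0),
        (4, NULL, 10.0);
""")

# The pre-analysis sanity checks: how many rows, any nulls, what's in status?
total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
null_status = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status IS NULL").fetchone()[0]
by_status = conn.execute(
    "SELECT status, COUNT(*) FROM orders GROUP BY status ORDER BY status"
).fetchall()

print(total, null_status, by_status)
```

A candidate who runs checks like these before attempting "total revenue" will catch the NULL status and the negative amount — and ask you what they mean — instead of shipping a wrong number confidently.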
We've included a sample exercise and evaluation rubric in the resources below.
What to Assess at Each Stage
| Stage | Primary Focus |
|---|---|
| Resume Screen | Relevance, trajectory |
| Initial Screen | Mutual fit, communication |
| Live SQL Exercise | Thinking process, communication, debugging approach |
| Take-Home | Execution quality, code/analysis craft |
| Final Round | Depth, collaboration style, culture fit |
Time-Respectful Practices
Provide context. Share evaluation criteria upfront. Let candidates know what success looks like. Don't make them guess.
Move quickly. Aim for two weeks from first screen to offer. Every day you delay, someone else is closing.
Give feedback. Even rejections deserve a reason. It's decent, and it builds goodwill in a small industry.
The Seniority Gradient
Junior, mid, and senior aren't about years—they're about scope and autonomy.
Junior: Executes well-defined tasks with guidance. Learning velocity matters more than current knowledge.
Mid: Breaks down ambiguous problems independently. Knows when to ask for help versus push through.
Senior: Identifies the right problems to solve. Makes others more effective. Operates with minimal oversight.
Evaluating Differently by Level
| Dimension | Junior | Mid | Senior |
|---|---|---|---|
| Technical execution | High | High | Medium |
| Problem decomposition | Medium | High | High |
| Communication | Medium | Medium | High |
| Independence | Low | High | High |
| Influence/mentorship | Low | Low | High |
Evaluating Candidates
Red Flags (Any Role)
Can't explain their own work. If they can't walk you through a resume project, either they didn't do it or they can't communicate about it. Both are problems.
Blames others for failures. "The data team gave us bad requirements" is a red flag. "We didn't align on requirements early enough—here's how I'd do it differently" is a green flag.
No questions for you. Good candidates are evaluating you too. No questions means they're not thinking critically about the role.
Tool-focused without fundamentals. "I'm a dbt expert" means nothing if they can't explain what a surrogate key is.
Overconfident on everything. Nobody is an expert at everything. Candidates who can't say "I don't know" are either lacking self-awareness or not being honest.
Green Flags (Any Role)
Asks clarifying questions. They make sure they understand what they're solving before they start solving.
Explains tradeoffs. Doesn't just give an answer—explains alternatives considered and why this approach was chosen.
Owns their gaps. "I haven't used Snowflake specifically, but I've used BigQuery and expect the concepts to transfer. Here's what I'd need to learn."
Clear communication. Can explain technical concepts without jargon. Adjusts depth based on audience.
What This Looks Like in Practice
A healthcare analytics company came to us after two failed Analytics Engineer hires. Both had strong resumes—one from a FAANG, one from a well-known data consultancy. Both washed out within six months.
When we reviewed their interview process, the problem was obvious: they were screening for impressive backgrounds and dbt syntax, not for the actual job. Their data was messy, their stakeholders were non-technical, and they needed someone who could operate in ambiguity.
We helped them redesign the process:
- Replaced the SQL syntax test with a live SQL exercise using their actual (anonymized) data
- Added a stakeholder simulation—explain a data discrepancy to a non-technical PM
- Cut total interview time from 12 hours to 6 hours across fewer rounds
Their next hire stayed 2+ years and built the modeling layer that their analysts still use today. The difference wasn't finding a "better" candidate—it was evaluating for what actually mattered.
When to Hire vs. Bring in Outside Help
We're a consulting firm, so take this with appropriate skepticism. But we've seen enough hiring decisions to know that sometimes the answer isn't "hire someone."
Consider hiring when:
- You have ongoing, predictable work that justifies a full-time role
- The work requires deep institutional knowledge that takes years to build
- You're building a team and need someone to grow with you
- You have the management capacity to onboard and develop someone
Consider outside help when:
- You need to move fast and can't wait 3-6 months for a hire to ramp
- The work is project-based with a clear endpoint
- You're not sure what you need yet and want to figure it out before committing to a hire
- You need senior expertise but don't have ongoing senior-level work
- Your current team needs to be unblocked, not replaced
The honest answer is often "both"—bring in outside help to unblock the immediate problem while you hire for the long term. We've done this with several clients: embedded with their team for 3-6 months while they hired, then transitioned knowledge to the new person.
Resources
Practical tools to implement what's covered here:
Role-Specific Evaluation Guides
- Data Analyst Cheatsheet — Interview questions and evaluation criteria
- Analytics Engineer Cheatsheet — Technical assessment guide
- Data Engineer Cheatsheet — Infrastructure and systems evaluation
Interview Tools
- Interview Panel Guide — What to probe, how to score, how to take notes
- Candidate Scoring Framework — Independent scoring template with weighted rollups
- Live SQL Exercise Kit — Sample datasets, prompts, and evaluation rubric for running effective live SQL sessions
- Take-Home Exercises — Exercises for all three roles with rubrics
———
Struggling With a Data Hire?
We offer a focused hiring strategy session: 60 minutes to review your current role, process, and pipeline. You'll leave with specific recommendations—whether that's refining your job description, restructuring your interview process, or reconsidering whether hiring is the right move at all.
Book a hiring strategy call →