Every Company Builds the Same Broken Customer Data Architecture
I can tell how old a company is by looking at how they move customer data between tools. Not because the tools change, but because the approach follows the same arc almost every time.
A two-person startup has one person copy-pasting between Stripe and a spreadsheet. A ten-person team has a cron job somebody wrote in a weekend. A fifty-person company has three cron jobs, two Zapier workflows, a "sync script" that nobody wants to touch, and a Slack channel called #data-issues where people report when things look wrong. I've watched this happen at probably a dozen companies over the past eight years, and the progression is weirdly consistent.
The interesting part isn't that it happens. It's that each stage feels like a reasonable decision at the time, and the accumulated result is a system that nobody designed and nobody can explain.
The Spreadsheet Export Phase Is Longer Than Anyone Admits
Most companies spend six to eighteen months in a state where someone regularly exports a CSV from one tool and imports it into another. Marketing exports contacts from HubSpot, enriches them in a spreadsheet, and uploads them to Mailchimp. Finance downloads Stripe charges and reconciles them against the CRM by hand.
This works. Genuinely. For small teams, manual data movement has some real advantages: you see every record, you can fix issues on the fly, and you understand the shape of your data because you're literally looking at it in a spreadsheet. The problem isn't that it's manual. The problem is that it trains the organization to think of data movement as someone's side task rather than infrastructure.
When the volume gets uncomfortable, the person doing the exports doesn't usually raise a flag. They automate their own workflow quietly, which brings us to the next phase.
Scripts and Cron Jobs Feel Like Progress
The first integration script is almost always written by whoever got tired of the CSV exports. It pulls from one API, transforms a few fields, and pushes the result to another. Maybe it runs on a cron schedule, maybe it's a Lambda function, maybe it's a GitHub Action that someone triggers manually.
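To make that concrete, here is roughly what a first sync script tends to look like, sketched in Python. The source and destination are in-memory stand-ins, and `fetch_source_records`, `transform`, and `push_to_destination` are hypothetical names rather than any real SDK; a real script would call each tool's API over HTTP.

```python
from typing import Any

# Hypothetical stand-ins for two SaaS APIs. A real script would make HTTP
# calls to the source and destination tools here.
SOURCE_RECORDS: list[dict[str, Any]] = [
    {"id": "cus_1", "email": "Ada@Example.com", "name": "Ada Lovelace"},
    {"id": "cus_2", "email": "grace@example.com", "name": "Grace Hopper"},
]
DESTINATION: dict[str, dict[str, Any]] = {}


def fetch_source_records() -> list[dict[str, Any]]:
    """Pull every record from the source tool (hypothetical)."""
    return SOURCE_RECORDS


def transform(record: dict[str, Any]) -> dict[str, Any]:
    """Rename and reshape a few fields for the destination (hypothetical mapping)."""
    first, _, last = record["name"].partition(" ")
    return {
        "external_id": record["id"],
        "email": record["email"].lower(),
        "firstname": first,
        "lastname": last,
    }


def push_to_destination(record: dict[str, Any]) -> None:
    """Write one record to the destination tool (hypothetical)."""
    DESTINATION[record["external_id"]] = record


def run_sync() -> int:
    # One direction of one integration. There is no retry, no logging, and no
    # alerting: an exception stops the run partway and nothing reports it.
    count = 0
    for record in fetch_source_records():
        push_to_destination(transform(record))
        count += 1
    return count


if __name__ == "__main__":
    print(f"synced {run_sync()} records")
```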
These scripts solve the immediate problem. But they create three new ones that don't show up for months.
First, nobody monitors them. The script runs on a schedule, and when it fails silently (rate limit, expired token, schema change), data just stops flowing. Nobody notices until someone in sales says "why hasn't this contact been updated in two weeks?"
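One cheap guard against silent failure is to alert on the absence of success rather than the presence of an error, since a job that dies quietly never emits an error in the first place. A minimal sketch, assuming each job records a timestamp when it finishes cleanly; `record_success`, `stale_integrations`, and the integration names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical store of when each sync last finished successfully. In practice
# this might live in a database table or a monitoring tool's heartbeat check.
last_success: dict[str, datetime] = {}


def record_success(integration: str, at: datetime) -> None:
    """Called by a sync job only after it completes without errors."""
    last_success[integration] = at


def stale_integrations(expected: dict[str, timedelta], now: datetime) -> list[str]:
    """Return integrations whose last success is older than their allowed gap.

    An integration that has never reported success counts as stale too; that
    is what a job that dies before its first clean run looks like from outside.
    """
    stale = []
    for name, max_gap in expected.items():
        seen = last_success.get(name)
        if seen is None or now - seen > max_gap:
            stale.append(name)
    return sorted(stale)


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    record_success("stripe_to_hubspot", now - timedelta(minutes=20))
    record_success("hubspot_to_intercom", now - timedelta(days=14))
    expected = {
        "stripe_to_hubspot": timedelta(hours=1),
        "hubspot_to_intercom": timedelta(hours=1),
        "intercom_to_postgres": timedelta(hours=1),
    }
    print(stale_integrations(expected, now))
    # ['hubspot_to_intercom', 'intercom_to_postgres']
```

The two-week-old Stripe-style gap from the paragraph above shows up as soon as the check runs, instead of when someone in sales asks about it.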
Second, each script handles exactly one direction of one integration. Stripe to HubSpot is one script. HubSpot to Intercom is another. Intercom to Postgres is a third. Each one was written by a different person, at a different time, with different assumptions about error handling and field mapping. At twenty integrations, you have twenty scripts with no shared logic.
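The alternative to twenty independent scripts is to describe each direction of each integration as data and run them all through one piece of code, so field mapping and error handling are written once. The sketch below assumes a hypothetical `Integration` shape with a fetch function, a write function, and a field map; it illustrates the idea and is not how any particular tool structures it.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]


@dataclass
class Integration:
    """One direction of one integration, described as data (hypothetical shape)."""

    name: str
    fetch: Callable[[], list[Record]]
    write: Callable[[Record], None]
    field_map: dict[str, str]  # source field name -> destination field name


def apply_mapping(record: Record, field_map: dict[str, str]) -> Record:
    """Rename fields; source fields not in the map are dropped."""
    return {dest: record[src] for src, dest in field_map.items() if src in record}


def run_all(integrations: list[Integration]) -> dict[str, dict[str, int]]:
    """Run every integration through the same mapping and error-handling code."""
    results: dict[str, dict[str, int]] = {}
    for integ in integrations:
        counts = {"ok": 0, "failed": 0, "fetch_error": 0}
        try:
            records = integ.fetch()
        except Exception:
            counts["fetch_error"] = 1  # e.g. expired token; recorded, not swallowed
            records = []
        for record in records:
            try:
                integ.write(apply_mapping(record, integ.field_map))
                counts["ok"] += 1
            except Exception:
                counts["failed"] += 1
        results[integ.name] = counts
    return results


if __name__ == "__main__":
    hubspot: list[Record] = []
    intercom: list[Record] = []

    def write_intercom(record: Record) -> None:
        if "email" not in record:
            raise ValueError("contact needs an email")
        intercom.append(record)

    integrations = [
        Integration("stripe_to_hubspot",
                    fetch=lambda: [{"id": "cus_1", "email": "a@example.com"}],
                    write=hubspot.append,
                    field_map={"id": "stripe_id", "email": "email"}),
        Integration("hubspot_to_intercom",
                    fetch=lambda: [{"email": "a@example.com"}, {"phone": "555-0100"}],
                    write=write_intercom,
                    field_map={"email": "email"}),
    ]
    print(run_all(integrations))
```

Adding a twenty-first integration becomes a new `Integration` entry rather than a new script with its own assumptions.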
Third, and this is the one that really gets you: these scripts don't handle conflicts. If a record gets updated in both systems between sync runs, the script just overwrites one with the other. Whichever ran last wins. Nobody tracks what the old value was or why it changed.
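The difference between last-write-wins and field-level tracking is easiest to see side by side. This sketch assumes you keep a snapshot of each record as of the last successful sync (the `base` argument), which is what makes a three-way comparison possible; the function names are hypothetical.

```python
from typing import Any

Record = dict[str, Any]
Change = tuple[str, Any, Any, str]  # (field, old value, new value, which side changed it)


def last_write_wins(a: Record, b: Record) -> Record:
    """What a naive sync does when B is written last: B's copy replaces A's."""
    return dict(b)


def field_level_merge(
    base: Record, a: Record, b: Record
) -> tuple[Record, list[Change], list[str]]:
    """Three-way merge of two copies against the last synced snapshot (base).

    Returns the merged record, a log of which field changed on which side,
    and the fields both sides changed to different values (real conflicts).
    """
    merged: Record = {}
    changes: list[Change] = []
    conflicts: list[str] = []
    for field in sorted(base.keys() | a.keys() | b.keys()):
        old, va, vb = base.get(field), a.get(field), b.get(field)
        if va == vb:                 # both sides agree
            merged[field] = va
            if va != old:
                changes.append((field, old, va, "both"))
        elif va == old:              # only B changed this field
            merged[field] = vb
            changes.append((field, old, vb, "b"))
        elif vb == old:              # only A changed this field
            merged[field] = va
            changes.append((field, old, va, "a"))
        else:                        # both changed it, differently
            merged[field] = old      # keep the old value until someone decides
            conflicts.append(field)
    return merged, changes, conflicts


if __name__ == "__main__":
    base = {"email": "ada@example.com", "phone": "555-0100", "title": "Engineer"}
    crm = {**base, "phone": "555-0199"}      # sales edited the phone in the CRM
    marketing = {**base, "title": "CTO"}     # marketing edited the title
    print(last_write_wins(crm, marketing))   # the phone edit is silently lost
    print(field_level_merge(base, crm, marketing))
```

With last-write-wins, the sales team's phone edit disappears. The three-way merge keeps both edits, records which side made each one, and only asks a human when both sides changed the same field. One simplification worth flagging: a missing field and a field explicitly set to `None` look the same here.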
I've seen a company with 40 employees running 14 separate sync scripts maintained by 6 different engineers, none of whom had documentation for each other's work. When one engineer left, three integrations broke within a month because nobody knew the deployment credentials.
The Zapier Detour
At some point, someone on the ops team discovers Zapier or Make. Finally, non-engineers can build integrations without filing a Jira ticket. The initial reaction is relief. The medium-term result is a different kind of mess.
Zapier is great for event-triggered workflows. Something happens, Zapier does a thing. Where it struggles is ongoing synchronization. There's no concept of "keep these two datasets in sync." You can trigger on new records and updates, but there's no backfill, no diff tracking, and no way to handle a situation where 500 records in system A need to be reconciled against 480 records in system B.
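Reconciliation is mostly a keyed diff. This sketch assumes both systems share a stable ID for each customer, which is often not true in practice; `reconcile` and the sample data are illustrative.

```python
from typing import Any

Record = dict[str, Any]


def reconcile(a: dict[str, Record], b: dict[str, Record]) -> dict[str, list[str]]:
    """Diff two systems' records, keyed by an ID both systems share.

    Assumes such a shared key exists; teams often have to match on something
    like email instead, which adds its own ambiguity.
    """
    return {
        "only_in_a": sorted(a.keys() - b.keys()),  # missing from B
        "only_in_b": sorted(b.keys() - a.keys()),  # missing from A
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


if __name__ == "__main__":
    system_a = {f"c{i}": {"email": f"user{i}@example.com"} for i in range(500)}
    system_b = {f"c{i}": {"email": f"user{i}@example.com"} for i in range(25, 505)}
    system_b["c100"] = {"email": "new-address@example.com"}
    diff = reconcile(system_a, system_b)
    print(len(system_a), len(system_b))   # 500 480
    print({kind: len(keys) for kind, keys in diff.items()})
    # {'only_in_a': 25, 'only_in_b': 5, 'changed': 1}
```

In this made-up data, a gap of 20 records turns out to be 25 records missing from one side, 5 missing from the other, and one that changed. A record count alone would never tell you that.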
Teams end up building increasingly complex multi-step Zaps to approximate sync behavior. A five-step Zap that checks if a record exists, updates it if so, creates it if not, logs the result to a spreadsheet, and sends a Slack notification on failure. Multiply that by fifteen integrations and you have an automation platform doing a job it wasn't built for.
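Written as code, that five-step Zap is a small upsert function. This is a sketch with hypothetical names; `notify` stands in for whatever sends the Slack message.

```python
from typing import Any, Callable

Record = dict[str, Any]


def upsert(
    store: dict[str, Record],
    record: Record,
    key: str,
    log: list[str],
    notify: Callable[[str], None],
) -> str:
    """The five-step Zap as one function (all names here are hypothetical).

    Look up the record by `key`, update it if it exists, create it if not,
    log the outcome, and call `notify` on failure.
    """
    record_id = record.get(key)
    if record_id is None:
        log.append(f"failed: no {key}")
        notify(f"upsert failed: record is missing {key!r}")
        return "failed"
    if record_id in store:
        store[record_id].update(record)   # update: merge new fields over old
        outcome = "updated"
    else:
        store[record_id] = dict(record)   # create: store a copy
        outcome = "created"
    log.append(f"{outcome} {record_id}")
    return outcome


if __name__ == "__main__":
    contacts = {"a@example.com": {"email": "a@example.com", "plan": "free"}}
    log: list[str] = []
    alerts: list[str] = []
    upsert(contacts, {"email": "a@example.com", "plan": "pro"}, "email", log, alerts.append)
    upsert(contacts, {"email": "b@example.com", "plan": "free"}, "email", log, alerts.append)
    upsert(contacts, {"plan": "pro"}, "email", log, alerts.append)
    print(log)     # ['updated a@example.com', 'created b@example.com', 'failed: no email']
    print(alerts)  # ["upsert failed: record is missing 'email'"]
```

Each call only handles a record it is handed. Nothing here replays history for a backfill or notices a record that was deleted on the other side, which is the gap between event-triggered automation and actual sync.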
The per-task pricing also starts to bite at scale. I know a 30-person company paying $400/month for Zapier, most of it going to high-volume sync Zaps that run every few minutes. That's not unreasonable as a line item, but the underlying problem is architectural: you're paying ongoing costs for a workaround because the real issue (your tools don't share data natively) was never addressed.
What Actually Works and Why Teams Resist It
This progression, from manual processes to point-to-point integrations to workflow automation, is a well-worn path in the industry, and the usual exit ramp is a customer data platform that treats sync as infrastructure rather than a series of individual connections.
The resistance is understandable, though. Most teams associate that category with enterprise software that costs six figures and takes months to implement. And honestly, until recently, that association was mostly accurate. The tooling was built for companies with dedicated data engineering teams and existing warehouse infrastructure.
That's changed. There are options now that let a single engineer or ops person set up bidirectional sync between tools in an afternoon, handle the backfill and ongoing updates in one configuration, and actually track what changed and why at the field level. The "build vs. buy" calculation has shifted because the "buy" side no longer requires a procurement process and a six-month implementation.
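Handling backfill and ongoing updates in one configuration usually comes down to a cursor: copy everything on the first run, then on later runs copy only records changed since the newest timestamp seen so far. A generic sketch, assuming every source record carries an `id` and an integer `updated_at`; this is not any specific product's configuration, and `sync` and the `state` dict are hypothetical.

```python
from typing import Any

Record = dict[str, Any]


def sync(source: list[Record], dest: dict[str, Record], state: dict[str, Any]) -> int:
    """Backfill on the first run, then copy only records changed since the cursor.

    Assumes every source record has an 'id' and an integer 'updated_at'
    (e.g. a Unix timestamp). Returns how many records were written.
    """
    cursor = state.get("cursor")
    if cursor is None:
        batch = list(source)                                      # first run: full backfill
    else:
        batch = [r for r in source if r["updated_at"] >= cursor]  # incremental run
    for record in batch:
        dest[record["id"]] = dict(record)                         # idempotent write by id
    if batch:
        state["cursor"] = max(r["updated_at"] for r in batch)
    return len(batch)


if __name__ == "__main__":
    source = [
        {"id": "a", "updated_at": 100, "plan": "free"},
        {"id": "b", "updated_at": 105, "plan": "pro"},
    ]
    dest: dict[str, Record] = {}
    state: dict[str, Any] = {}
    print(sync(source, dest, state), state)   # 2 {'cursor': 105}
    source[0] = {"id": "a", "updated_at": 110, "plan": "pro"}
    print(sync(source, dest, state), state)   # 2 {'cursor': 110}
    print(dest["a"]["plan"])                  # pro
```

Comparing with `>=` instead of `>` means the boundary record gets re-sent on each run. That costs a little extra work but avoids skipping a record updated in the same second as the cursor, and it is harmless because each write overwrites by `id`.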
But here's what I think people miss about this decision. It's not really about saving engineering time, even though it does. It's about making customer data someone's actual responsibility instead of everyone's side project.
When data flows through a central system with visibility into what synced, what failed, and what changed, someone can actually own it. When data flows through 14 scripts and 8 Zaps, nobody owns it, and the default behavior when something breaks is to route around the problem rather than fix it.
The Ownership Problem Is the Real Problem
I've spent most of this article talking about technical architecture, but the actual failure mode is organizational. The fragmented architecture persists because no single person or team is responsible for customer data flowing correctly across the stack.
Engineering owns the scripts but doesn't use the tools they sync to. Marketing owns HubSpot but can't debug why contacts aren't syncing. Sales owns the CRM but doesn't know that a Zapier workflow is overwriting their manual updates every fifteen minutes.
Picking the right tooling matters, but it matters less than picking an owner. Someone who can see every integration, knows what data flows where, and has the authority to say "we're consolidating these six sync mechanisms into one system." Without that person, you'll replace your current mess with a slightly more modern mess and be back in the same spot in eighteen months.
The technical decision is straightforward once the organizational one is made. But most companies keep trying to solve an ownership problem with engineering solutions, and that's why the cycle repeats.