Tuesday, January 11, 2022

SRE is not scalable.

SRE is not scalable. Hiring dedicated engineers to fix scalability concerns is not scalable.

The constant tug-of-war between SWE and SRE is tiresome.  It drains energy.  It demotivates masses of people on both ends of the rope.  It kills "the organization".

It does not have to be this way.  SWE and SRE are ideally partners in crime.  They help each other in dire situations, in long-term visions, and in the missions that accomplish them.  They learn and grow from each other; their strengths combine into something much greater.

So why is there this perpetual game?

  • SWE meets the needs of X with software.
  • SRE meets the needs of X with reliable software.
  • X can be users, customers, the market, whatever.

This differentiation is the root of the conflict.  Once you, yes you the reader (or perhaps your organization), decide to hire a dedicated engineer to fix someone else's problem, you have lost sight of the forest for the trees.

You are losing accountability.  You are robbing people of growth opportunities.  You are making it amply clear that unreliable software can be fixed by hiring more people.  A management solution to an engineering problem.  The core responsibility of SWEs is to write the software in the first place (and perhaps they already wrote it).  The software to help your customers.  The same software to reliably help your customers.

How do you fix this?

Leadership, across all levels, needs to agree that short-sighted feature delivery is not the priority.  Long-term customer value is the priority.  What produces long-term customer value?  Beneficial features, performance, security, privacy, and many more, consistently ... reliably.  Furthermore, agreement is not enough.  Work toward that priority needs to be encouraged and rewarded through incentive structures like more ownership, more compensation, and more career growth (e.g. promotions).

The opposite, however, such as writing unreliable software, should not be punished.  It is neutral at worst, and slightly positive if taken as a growth opportunity to write reliable software in the future.  The last environment you want to accidentally foster is one of fear, which kills all creativity and trust in "the organization".

Hiring an SRE is not scalable.  Teach your engineers to write reliable software instead, and teach your leadership to inspire the same.  SREs are supposed to evangelize such practices and help develop people's expertise in reliability.  SREs are supposed to work on core reliability fundamentals.  They are not supposed to solely fix other people's problems.

Cascading Data Quality

Sometimes data flows publish datasets with enough discrepancies that they cause downstream impact, often silent and non-obvious. Imagine a ...