How to Make Successful Teams Last (or at least try to)


Key Takeaways

  1. Take care of your team, and your team will take care of you.
    My role has been closer to that of a coach than a commander. The more I invested in the team’s well-being and clarity, the more they owned the product and exceeded expectations.

  2. Challenge the status quo relentlessly.
    Efficiency is not about hitting KPIs blindly. It is about constantly asking: could we do this better? That curiosity must be part of the team DNA.

  3. Do not apply frameworks by the book.
    Team Topologies gave us a language and patterns, not a script. We adapted, experimented, failed, and iterated. Context always wins.

  4. If you build platforms, invest in your top users.
    Platform engineering is about relationships as much as it is about capabilities. Know your top three users, talk to them regularly, and evolve together.


Introduction: why lasting teams matter

Delivering a big project feels glorious. Making that success durable is harder and more interesting. When the migration of our supply chain orchestrator went live in March 2023 it was a proud moment for everyone involved. The truth is that a delivered project is only the beginning. The real test is to keep delivering, to keep learning, and to keep people excited about the work for years after the spotlight fades.

This post is a detailed account of how we tried to do that. It is not a recipe. It is a lived story that mixes structure, rituals, people practices, platform relationships, SRE habits, and long term thinking. It includes the good decisions and the rough patches. It pulls together everything I have shared with you so far.


Context and the starting point

When I arrived at the product in November 2022, the core team numbered about 25 people split between France and India. Business stakeholders and IT engineers worked closely, but our support organisation was separate from the product team. The product itself was not a user facing UI. It was a system to system orchestrator built on an event driven architecture with Kafka and Kafka Streams and deployed as microservices on Kubernetes. Over time the estate grew to 300+ microservices, just to give you a sense of the scale.
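
To make the shape of the system a bit more concrete, here is a minimal sketch of the kind of microservice the orchestrator is made of, assuming plain Kafka Streams. The topic names, the routing rule and the JSON payloads are illustrative, not taken from the actual product: an event comes in on one topic, is filtered or enriched, and goes out on another.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class OrderRoutingService {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-routing-service");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        KafkaStreams streams = new KafkaStreams(buildTopology(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // Consume incoming order events, keep only the ones this service owns,
        // and publish them to the downstream topic for the next step of the flow.
        KStream<String, String> orders = builder.stream("customer-order-events");
        orders.filter((orderId, payload) ->
                        payload != null && payload.contains("\"zone\":\"INTERCONTINENTAL\""))
              .to("intercontinental-order-events");

        return builder.build();
    }
}
```

Multiply that pattern by a few hundred services and you get both the power and the cognitive load this post is about.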

Operationally the team had been organised into two main squads:

  • Migration: dedicated to replatforming the old orchestrator into a tailor-made microservice architecture.
  • Contributions: responsible for all new features so the larger transformation program would not stall.

This configuration helped us ship the migration and cutover. By March 2023 we had completed the intercontinental migration, a huge accomplishment. After that Go Live and some self-reflection from team members, the success revealed the limits of the setup. Daily stand ups were noisy and unfocused. People in the same meeting often had no connection to the item being discussed. Cognitive load was high. Two scrum masters were present but their missions were unclear. One of them was about to leave. The team needed an organisation fit for the long run, not only for a migration project.

At the same time we had a remarkable transverse Solution Consultant. He worked across the product like a quasi product owner. He had an exceptional end to end view and a strong appetite for the functional side. He could disassemble the product and reconstruct the business flows into meaningful solutions. His presence gave coherence. After several months he moved on to a different gig. The team then faced a choice: replace him with one person or adopt his functions collectively. The team chose to own that vision collectively. That decision was a turning point in how ownership and knowledge would be distributed.

On the technical side we also benefited from internal moves that allowed us to appoint clear Tech Leads per vertical. In early 2025 we introduced a transverse Tech Lead role which the team is still experimenting with. These changes show how organic an organisation can be, and how ready it must be to evolve.


The Cantal bootcamp: celebration and co-creation

Right after the Go Live we took the team, business partners and support to the Cantal for several days. The first objective was to celebrate. The second objective was to work on the future. We did not come with a blueprint. Instead we asked a single, uncomfortable question:

What should our organisation look like tomorrow?

Mixed groups of IT, product and support sketched processes and mapped pain. They highlighted problems we already suspected: lack of listening across domains, poor end to end visibility, impermeable topic boundaries and excessive cognitive load. Those conversations created ownership. The point of the exercise was not only to design but to include people in the design so adoption would not be a management imposition. The outcome was a strong preference for organising around business value streams.


The redesign: verticals, triptych roles and evolving tech leadership

From the bootcamp emerged a structure aligned to our business flows. We organised into three business verticals:

  • Customer Order
  • Intrazone
  • Intercontinental

We later added a Technical Vertical to focus on integration, developer tooling, resilience and platform facing work. Each business vertical was staffed around a triptych:

  • Process Engineer - the business anchor, focused on process and domain.
  • Tech Lead - the technical anchor for the vertical. Over time each vertical got its own Tech Lead.
  • Solution Consultant - the bridge between business and tech. This role, when it existed as a transverse profile, had great impact. When our transverse Solution Consultant left, the team decided collectively to take on the end to end vision rather than simply replace him.

This arrangement mirrored Team Topologies patterns while remaining pragmatic. We also embraced role fluidity. In early 2025 we introduced a transverse Tech Lead role to try to provide technical continuity across verticals. The team continues to evaluate that role and adapt its scope.

We kept at least one Scrum Master in a transverse fashion. That Scrum Master did not drown in delivery tasks. Instead the role was deliberately slightly external to day to day execution so they could help surface impediments, push for improvements and coach teams to lift their heads from the weeds. This floating Scrum Master has been important to preserve space for reflection and continuous improvement, alongside a Project Manager who was previously a Solution Consultant. Closing the loop!

One practical lesson was to avoid freezing roles. When people move, the team must have the capacity to reassign responsibilities, to grow new leaders and to rewire itself.


Rituals and cadence: the long path to three daily checkpoints

Rituals are the plumbing of team life. We iterated on them heavily.

Initially we had vertical dailies from 9:00 to 9:30 and a transverse ambassadors meeting from 9:30 to 10:00. The Technical Vertical did not fit well in this pattern. For nearly two years we were unsure where to place the technical coordination. That uncertainty created friction and occasional delays.

We evolved to a three slot morning rhythm:

  • 9:00 to 9:20 - vertical stand ups. Each vertical meets and discusses immediate work and blockers.
  • 9:20 to 9:40 - technical Ops daily. Ops referents and technical relays gather to align on deployments, incidents and cross vertical technical needs.
  • 9:40 to 10:00 - transverse ambassadors sync. One ambassador per vertical reports progress, risks and coordination points.

This layered “rhythm” achieved several things. It preserved vertical autonomy and focus. It provided a dedicated forum for cross cutting technical work and it kept a short, predictable window for product level coordination. An important cultural rule supported this cadence: if you do not bring value to a meeting you are allowed not to attend. There is no shame in skipping sessions that are not relevant. That rule, inspired by the Two Pizza Team principle (two pizzas should be enough to feed everyone in the meeting), has helped keep engagement high.

Shared on-call rotations and the rule that every developer must be able to deploy to production reinforced ownership and operational awareness. To support the technical backlog we named two Ops referents and created relays inside each vertical: developers who are primarily assigned to a vertical but who also act as technical liaison into central Ops work.


Testing: from heavy manual to strategic automation

At the start almost all testing was manual. Process engineers, solution consultants and developers ran long manual scenarios. That approach helped validate the migration but it did not scale. Repeated manual tests are expensive, slow and fragile.

After the migration we invested in a testing strategy. The team explored ways to automate repetitive scenarios, to add integration and contract tests in CI, and to make testing a lever for safe velocity. This work is ongoing, technical and sometimes tedious but it is also essential for long term maintainability. The team made early experiments around test harnesses and scenario automation and started to convert high value manual sequences into automated pipelines.
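
As an illustration of what converting a manual scenario can look like, here is a minimal sketch of a topology test built on Kafka Streams’ TopologyTestDriver. It reuses the illustrative routing service sketched earlier; the topic names, payloads and assertion are examples, not our actual scenarios.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.TopologyTestDriver;
import org.junit.jupiter.api.Test;

import java.util.Properties;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

class OrderRoutingServiceTest {

    @Test
    void intercontinentalOrdersAreRoutedDownstream() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-routing-test");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        try (TopologyTestDriver driver =
                     new TopologyTestDriver(OrderRoutingService.buildTopology(), props)) {

            TestInputTopic<String, String> input = driver.createInputTopic(
                    "customer-order-events", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> output = driver.createOutputTopic(
                    "intercontinental-order-events", new StringDeserializer(), new StringDeserializer());

            // The scenario that used to be replayed by hand: an intercontinental
            // order must reach the downstream topic, an intrazone order must not.
            input.pipeInput("order-1", "{\"zone\":\"INTERCONTINENTAL\"}");
            input.pipeInput("order-2", "{\"zone\":\"INTRAZONE\"}");

            assertEquals("{\"zone\":\"INTERCONTINENTAL\"}", output.readValue());
            assertTrue(output.isEmpty());
        }
    }
}
```

Tests like this run in CI without a broker, which is what makes the high value manual sequences cheap enough to replay on every change.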

Part of the ambition is to make parts of this work open source so others can reuse our patterns. The testing shift is not just about speed. It is about being able to change the system with confidence and to reduce the cognitive cost of each change.


From project ops to shared operations: onboarding level 2 and operational parity

When I took the role the project team operated everything. Level 2 had little visibility and limited ownership. That situation was unsustainable. The product touched new technologies and had a high operational surface. If the project team had to keep operating it alone, burnout and fragility were certain.

We decided to make operations a shared responsibility. The steps were deliberate:

  1. Automate monitoring and alerting so the system reports itself (a minimal sketch follows this list).
  2. Convert incidents and alerts into clear, actionable knowledge base articles. Each incident created or improved documentation.
  3. Use those KBs and runbooks to train and onboard level 2 support.
  4. Reach operational parity. Today level 2 colleagues participate in on-call rotations and handle incidents directly, lowering stress and making the system more resilient.
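
For step 1 above, here is a minimal sketch of what “the system reports itself” can mean in a Kafka Streams service, assuming a plain Kafka Streams setup rather than our actual alerting stack: the application exposes its lifecycle transitions so monitoring can alert on them, and replaces a crashed stream thread instead of dying silently.

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SelfReportingStreams {

    private static final Logger LOG = LoggerFactory.getLogger(SelfReportingStreams.class);

    public static void register(KafkaStreams streams) {
        // Log every lifecycle transition in a structured, greppable form so the
        // alerting pipeline, whatever it is, can fire when the app enters ERROR.
        streams.setStateListener((newState, oldState) ->
                LOG.info("kafka-streams-state transition old={} new={}", oldState, newState));

        // Replace a crashed stream thread instead of letting the whole instance die;
        // the incident still shows up in the logs and in the state transitions above.
        streams.setUncaughtExceptionHandler(exception -> {
            LOG.error("Uncaught exception in stream thread, replacing it", exception);
            return StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD;
        });
    }
}
```

Both hooks have to be registered before the application is started; Kubernetes probes, metrics and the knowledge base articles of step 2 then build on the same signals.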

Two years after the migration this handover has made a difference. The support organisation now acts as a first responder with real knowledge and escalation rules. The product is supported by a layered operation rather than by a single project team.


SRE practices and post-mortems: creating an operational memory

From July 2023 we introduced site reliability engineering practices more formally. The habit that had the biggest impact was writing post-mortems for everything abnormal. That includes production outages and also pre-production errors and dev environment surprises, internally called “oopsies”.

All post-mortems live in GitLab and are versioned like code. New joiners can clone the repo, search incidents and learn from real events. The purpose of post-mortems was never blame. It was collective learning and improved system design. We reviewed post-mortems together with business, level 2, level 3 and platform teams to make sure we addressed root causes and improved fault tolerance.

In 2025 this corpus of operational knowledge is also the raw material to build tooling and to teach AI helpers that accelerate onboarding and troubleshooting. That capability is a long term asset.


Master topologies: teaching operations without code

To help support and newcomers we created master topologies: standardised visual maps that describe how each business process flows through the system and which components are involved.

Master topologies let level 2 support reason about incidents without reading code. They can trace an order end to end, identify where a failure might surface and understand the expected behaviour. In NPS surveys support consistently said that master topologies were a game changer for their autonomy.

Documentation became a strategic asset rather than an afterthought. If you want to know more, have a look here: https://blogit.michelin.io/dkafka-streams/


Platform engineering: from push to partnership

Our product depends on platform teams for many critical capabilities:

  • GitLab for CI/CD and pipelines.
  • Kafka and Confluent for streaming.
  • Kubernetes clusters for orchestration.

Initially the relationship with platform teams was transactional: they built, we consumed. Over time we realised that real sustainability required partnership. We established monthly one to one checkpoints with each platform team. These sessions have no heavy agenda. They are a place to share roadmaps, announce scaling needs and align on priorities.

The cultural shift mattered. Platform teams stopped being a black box and became co-designers. They participated in DR exercises with us and helped tune defaults to our workload. That partnership also led to contributions back and forth. For example part of our Kafka work matured into KStreamplify, an inner source project that we open sourced and that subsequently drew external attention. That kind of collaboration benefits both sides.

My golden rule for platform engineering is simple: know your top three users and keep a regular touchpoint with each of them.


Feedback, vulnerability and team health

We learned that feedback is not something you do only when things are rosy. It is more important when tensions are rising and when people feel hurt. Truthful feedback requires vulnerability and courage. In a team of 25 people there are different temperaments and moments where some voices can overwhelm others.

We worked to normalise candid but constructive feedback. We used short NPS style pulses to detect frustrations early. We created safe spaces in retros and in 1 on 1s for people to surface small issues before they turned into big problems. Removing the small pebbles one by one prevented the pressure cooker effect.

One small practical rule helped: permission to skip meetings that do not add value. That rule, combined with the Two Pizza Team mindset, reduced meeting fatigue and signalled trust in people’s judgement.


Results and recognition

The combined effect of structure, rituals, SRE practice, platform partnership and people centric work shows up in both metrics and human feedback:

  • DORA metrics are strong. Lead time, MTTR, deployment frequency and change failure rate all improved.
  • Deployment autonomy is high. Multiple verticals can move to production independently, typically every three days or less.
  • Team engagement remains high. Two Pizza Team rules, permissive attendance and meaningful retros keep energy up.
  • Support maturity increased. Level 2 now runs on-call and handles incidents thanks to documentation and training.
  • Testing strategy advanced. The team moved from manual heavy processes to targeted automation and integration testing.
  • Roles adapted successfully. The team created vertical Tech Leads, experimented with a transverse Tech Lead role in early 2025 and kept the Scrum Master as a transverse facilitator, alongside our Project Manager who handles all the contributions.
  • Feedback culture matured. People learned to be vulnerable and give constructive feedback.
  • External recognition came through internal requests for guidance and through open source contributions like KStreamplify.

All of this meant the product was not only stable but also evolving, with the team staying engaged and proud of its work.


Scalability, knowledge and the future

We reached a practical sweet spot at around 30 people. That size lets us cover complexity while staying agile. Beyond that number we expect more latency and a higher risk of silos. If new business processes require more capacity the right answer will often be to create new verticals rather than to inflate existing ones.

Knowledge management is the next battleground. Onboarding used to take months and sometimes a year. Today documentation in GitLab, master topologies and a growing set of automated checks shorten that ramp. We also experimented with an AI assistant that can answer onboarding questions by sourcing the post-mortem corpus and the KB. That work is early but promising.

Keep in mind that the organisation is a living thing. The transverse Solution Consultant and the transverse Tech Lead roles were both experiments in distributing vision and continuity. Some experiments last, others morph. The point is to keep evaluating and to keep the team involved in redesign.


Lessons learned

If you want the short list of things that helped us make success last:

  • Care for the team first. Leadership is service.
  • Keep testing and automation strategic. Manual testing is debt if it becomes the long term norm.
  • Co-create the organisation. People adopt what they helped design.
  • Treat documentation as a product. Post-mortems, master topologies and runbooks are compounding assets.
  • Invest in platform relationships. Platforms succeed when they know and listen to their users.
  • Give permission to skip meetings that do not add value. Two Pizza Team is also a rule for attendance.
  • Encourage vulnerability and feedback. Prevent small problems from growing into crises.
  • Let roles evolve. A team is an organism. Roles must be able to change with context.

Conclusion

Making successful teams last is less about finding the perfect org chart and more about maintaining intentional practices: co-creation, clear rituals, shared operations, partnership with platforms and relentless care for people. It takes humility to accept that structures must change, discipline to invest in tests and documentation and courage to have the difficult conversations.

If there is one sentence I keep returning to it is this: care for the team and the team will carry you further than you imagined. That is how we try to make success not a one time event but a durable capability.

To that team, if you folks read this far: thank you, and I wish you the best.