During my 25-year career at Microsoft, I had the pleasure of working in Test Management roles for a variety of products and services. Each time I joined a team, I evaluated the quality assurance approaches to understand where we could improve. In this blog post, I’d like to walk you through my approach and offer some insights and guidance. It all started with the three P’s approach (Product, Processes, and People).
First, I needed to learn about my new product. I wanted to understand the product architecture, our customer base, and our product's reputation for quality, including any memorable issues. I made a list of which versions of our product we were supporting and which devices and configurations we ran on. I learned about our product roadmap as well as the key features we were pushing to develop.
Next, I wanted to understand the process for feature development. I reviewed the schedule of major/minor release cadence and the code branching structure to facilitate it. I worked to understand the engineering processes and tools used to ensure quality including the current level of commitment to test automation. I wanted to understand how we rolled out new features and what approach we took to ensure overall product quality.
Finally, I focused on people. I wanted to understand the current roles and responsibilities. I wanted to know the ratio of developers to testers and how collaborative the relationship between the teams was. I wanted to understand how Dev-Ops was handled and how many resources it consumed. I asked about specific concerns from team members. I wanted to know what worked well and what problems I needed to understand better.
I then asked myself the following three questions.
- What needs to improve so we can increase our QA agility?
- How can we increase our QA capacity?
- How can we increase our QA consistency?
Each product had unique challenges, but there were often similarities. Upper management always pressured the team to produce more features, faster, while maintaining quality. Most development teams struggled to complete a laundry list of features. When development ran late, as it often did, the testing organization absorbed the schedule compression in an attempt to still make the ship date. My approach to improvements was often predicated on three areas.
- Become more efficient by making smart investments in test automation.
- Work better together by tightening the Dev-Test collaborations.
- Control Dev-Ops costs by being smart when rolling out new features.
With Dev and Test both overloaded, I needed to provide some relief. I could have asked for more headcount, but I usually didn't, for several reasons. First, headcount is usually allocated on a yearly basis and built into staffing models, so a one-off request for additional headcount has a low probability of success. I also learned that adding people to an inefficient engineering process does not give you full value; you are better off adding engineers to a well-functioning team that can utilize them more efficiently. My goal was to develop an efficient team first and then, the next year, assess the need for more headcount.
Second, headcount comes with a long lead time. It may take months to hire, and even then, it may take months more to train and develop a new engineer. Along the way, you tax the bandwidth of the existing team, since they must now also participate in interview loops and training programs. So adding headcount is not the first answer.
Getting product spend dollars is much easier. There is usually some product budget that has gone unused and can be repurposed, so my goal was to make a return on investment (ROI) proposition to get product spend dollars approved, which I could then use to provide engineering relief. There are three main ROI arguments.
1. The "I can save money" argument
The premise of this argument is that by giving me some money now, I will become more efficient and be able to cut enough budget next year to more than make up for it. Usually this means you promise in advance to dramatically reduce some future vendor spend, future lab hardware spend, or even overall future headcount. Upper management will ask for a number, so you may need to state something like "I will be able to reduce our lab spend or vendor spend by a certain dollar amount." They will remember the number and your promise come next year's budget time.
2. The "we can drive an increase in revenue" argument
This argument proposes that a small product spend now will enable us to add customers, upsell current customers, or drive market share. Again, upper management will want a number, so you should have some guesstimate metrics, such as "we will grow our monthly active users by 20%," "we will grow our per-transaction dollar amount by 20%," or "we will increase our mobile user base by 20%." They will understand these numbers and hold you accountable to meet at least most of them.
3. The FUD argument (Fear, Uncertainty, Doubt)
This argument takes the position that there is impending doom which can only be prevented by a small investment right now. It could be fear of a major quality incident, a lawsuit, or government fines. If you have just started in a new role, going to management with a doomsday argument can lead upper management to question your credibility. This type of argument is highly risky, but at times it is the necessary one to make.
Depending on your current situation, one of these three arguments can be made successfully. Once I had funding approved, I could move on to setting test automation goals.
I can use the funding to provide engineering relief in several ways. I can hire short-term contractors to continue executing manual tests while my team commits to developing test automation. I can outsource the creation of test automation, but I prefer not to, since I want to develop my in-house capabilities. I can purchase test automation tools that enable my engineers to write automation that is more robust and easier to create and maintain. I may invest in a remote lab that gives me access to a variety of test hardware and the ability to run my test automation quickly in parallel. I have many options to consider, and my goal is to free a large portion of my team to make us more efficient through test automation.
Successful test automation efforts need to progress through crawl, walk, and run stages. Overzealous teams will try to do too much too soon (run) and end up falling, potentially causing more harm than good. You should start small, build success, and expand your efforts while learning and improving along the way.
Many teams have no test automation. Maybe they have some unit tests, but these mostly exercise APIs and interfaces. Without automated user interface tests, it is hard to revalidate the product, especially in short ship cycles across a wide configuration matrix of hardware devices. Crawl means focusing on a small set of core legacy features and writing the automation robustly enough that it can run across a hardware configuration matrix and quickly surface major issues. Maybe you start with a simple product login or sign-up sequence. Later, you automate core functionality like searching for a product and making a shopping cart purchase. Legacy features are the easiest to automate because their UI is stable, so there is less of the moving-target problem that automating new features presents. These core tests form building blocks and later enable you to develop the skills to create more complex scenarios.
Walk includes adding more complex feature automation and measuring coverage of test automation to start to use coverage as part of a quality assessment. During the walk stage, engineers start to collaborate on helper libraries and use abstraction logic to make their automation more resilient and make failure investigations simpler. During walk, new features are also targeted for automation while using techniques like page objects and self-healing approaches to make the automation more resilient to UI changes.
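To make the page-object idea concrete, here is a minimal sketch in Python. The `LoginPage` class, its locator names, and the `FakeDriver` stand-in (used so the example runs without a real browser driver) are all hypothetical, not any specific framework's API:

```python
# A minimal page-object sketch. FakeDriver stands in for a real
# WebDriver so the example is self-contained; in practice you would
# pass in a Selenium/Appium driver with equivalent methods.

class FakeDriver:
    """Simulates a browser: records typed text, 'submits' a form."""
    def __init__(self):
        self.fields = {}

    def type(self, locator, text):
        self.fields[locator] = text

    def click(self, locator):
        # Pretend login succeeds when both fields were filled in.
        return "user" in self.fields and "password" in self.fields


class LoginPage:
    """Encapsulates locators so a UI change is fixed in one place,
    not in every test that touches the login screen."""
    USER_FIELD = "user"
    PASSWORD_FIELD = "password"
    SUBMIT_BUTTON = "submit"

    def __init__(self, driver):
        self.driver = driver

    def login(self, username, password):
        self.driver.type(self.USER_FIELD, username)
        self.driver.type(self.PASSWORD_FIELD, password)
        return self.driver.click(self.SUBMIT_BUTTON)


# Tests talk to LoginPage methods, never to raw locators.
result = LoginPage(FakeDriver()).login("alice", "s3cret")
```

The payoff is exactly the resilience described above: when the UI changes, you update the locator constants in one class instead of hunting through dozens of tests.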
The run phase opens a world of possibilities for test automation. It may involve bots, complex test sequences, and negative test conditions. It may automate whole classes of issues such as accessibility testing, security testing, or reliability tests that force failure code paths. It may perform smart test selection, running the sets of tests most applicable to a given set of code changes.
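As a toy illustration of smart test selection, the sketch below maps tests to the source files they exercise and runs only the tests whose files appear in a change set. The mapping here is hand-written for illustration; real systems derive it from code coverage or build dependency data:

```python
# Toy smart test selection: given a set of changed files, pick only
# the tests that exercise those files. The test names and file names
# below are made up; a real mapping comes from coverage tooling.

TEST_TO_FILES = {
    "test_login": {"auth.py", "session.py"},
    "test_search": {"search.py", "index.py"},
    "test_checkout": {"cart.py", "payment.py"},
}

def select_tests(changed_files):
    """Return, sorted, the tests whose tracked files intersect the change set."""
    changed = set(changed_files)
    return sorted(
        test for test, files in TEST_TO_FILES.items() if files & changed
    )

# A change touching payment.py selects only the checkout test,
# so a small fix no longer pays for a full-matrix test pass.
selected = select_tests(["payment.py"])
```

The design trade-off is that the mapping must be refreshed as code moves; stale mappings cause missed tests, which is why teams usually still run the full suite on a slower cadence.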
I advise that the test automation tools you commit to use meet the needs of both the development team and test team. Collaborating on test automation makes the teams better together. It becomes easy to share test automation or helper libraries. It becomes easier to debug failures and create a singular quality view dashboard.
With an agreed upon toolset, I would make it a rule that new features are complete when they have test automation that validates them. This adds a bit of work to the developer writing the feature but will pay for itself with higher quality and a test suite that warns if the code is broken by other product code changes.
I would encourage the use of "dark code": shipping new features behind feature gates so that you can control at runtime which users experience the new code paths and which do not. This allows you to slowly roll out new features to an opt-in group while monitoring their experiences before deciding to make the new code path mainstream for all users, which can dramatically reduce Dev-Ops costs. Also, the customers who opt in are more likely to be adventurous souls who deal with glitches more reasonably, since they opted in for the new stuff. Dark code can implement an entirely new user interface or enable a single new feature; it is good for both.
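A minimal feature-gate sketch might look like the following. The flag name, rollout percentage, and hash-bucketing scheme are illustrative assumptions; production systems typically read flags from a configuration service so they can be flipped without a redeploy:

```python
# Sketch of a percentage-based feature gate ("dark code"). The flag
# name "new_checkout_ui" and the 10% rollout are made-up examples.
import hashlib

ROLLOUT = {"new_checkout_ui": 10}  # percent of users who see the dark code

def is_enabled(feature, user_id):
    """Deterministically bucket a user into [0, 100) and compare
    against the rollout percentage. Hashing feature+user means the
    same user always gets the same experience for a given flag."""
    pct = ROLLOUT.get(feature, 0)
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < pct

def render_checkout(user_id):
    if is_enabled("new_checkout_ui", user_id):
        return "new checkout"    # dark code path, opt-in cohort only
    return "legacy checkout"     # mainstream path for everyone else
```

Because bucketing is deterministic, ramping from 10% to 50% to 100% only grows the cohort; no user flips back and forth between experiences as you expand the rollout.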
Telemetry is critical for failure monitoring and for rapid diagnosis of and response to customer issues. It may be that an operation is succeeding only because of a retry code path, so users never perceive a failure. Telemetry can log that the retry code path has executed, and backend monitoring can alert that this signals a potential issue to investigate. Investigations are simplified by the distributed tracing and logging that telemetry brings. Coupled with dark code, this gives you a great approach for onboarding new features going forward. It may even be worth adding telemetry to legacy features, but I would prioritize the new code. Azure Application Insights, Amazon CloudWatch, and Zipkin are examples of systems you can use.
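The retry example above can be sketched as follows. The in-memory `TELEMETRY` list and `emit` function are stand-ins for a real telemetry client (such as the Application Insights or CloudWatch SDKs); the point is that the retry path emits an event even though the caller sees success:

```python
# Sketch: log when a retry code path fires, so monitoring can catch
# silent degradation. TELEMETRY/emit stand in for a real telemetry SDK.

TELEMETRY = []  # in-memory sink; real code ships events to a backend

def emit(event, **props):
    TELEMETRY.append({"event": event, **props})

def call_with_retry(operation, attempts=3):
    """Run operation; on failure, record a telemetry event and retry."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            emit("retry_executed", attempt=attempt, error=str(exc))
            if attempt == attempts:
                raise

# A flaky operation that fails once, then succeeds: the user sees
# success, but telemetry records that the retry path executed.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient")
    return "ok"
```

A backend alert on the `retry_executed` event rate is what turns this from a log line into the early-warning signal described above.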
I have one final suggestion: pay special attention to bugs that have made it to customers and ensure they are automated to form a bug regression suite. After a customer experiences a problem, they may forgive you, but should they experience that same problem again in the future, they may just give up on your product. Bugs also have a habit of occurring in bunches, so paying special attention to the ones that reached customers will help you improve your QA system. If you cannot do this for all customer-reported bugs, do it at least for a critical subset.
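One lightweight way to keep such a suite honest is to tag each regression test with the bug it guards. The sketch below uses a small decorator registry; the bug ID, the `checkout` function, and its behavior are invented purely for illustration:

```python
# A tiny bug-regression registry: each customer-reported bug gets a
# test registered under its bug ID, so the suite documents exactly
# which escaped bugs it protects against. BUG-1042 is a made-up ID.

REGRESSION_SUITE = {}

def regression(bug_id):
    """Decorator that registers a test function under the bug it guards."""
    def wrap(fn):
        REGRESSION_SUITE[bug_id] = fn
        return fn
    return wrap

def checkout(items):
    """Hypothetical product code; empty-cart checkout once crashed."""
    return "error: empty cart" if not items else "ok"

@regression("BUG-1042")
def test_empty_cart_checkout():
    # Customer-reported bug: checking out an empty cart crashed.
    # The fixed behavior must return a friendly error instead.
    assert checkout([]) == "error: empty cart"

def run_suite():
    """Return the IDs of bugs whose regression tests now fail."""
    failures = []
    for bug_id, test in REGRESSION_SUITE.items():
        try:
            test()
        except AssertionError:
            failures.append(bug_id)
    return failures
```

Because the registry keys are bug IDs, a failing run reports which customer-visible bug has resurfaced, not just which test broke.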
In summary, this seven-step approach will allow you to create a great software quality improvement plan, one that makes your team more agile and efficient while improving job satisfaction.
- Get Funding through a Solid Value Proposition Argument
- Establish Test Automation Goals
- Jointly Collaborate with the Development Team on Shared Test Automation
- Require Feature Complete to Include Test Automation
- Check in New features as Dark Code
- Enable Telemetry and Monitoring
- Continuously Build an Automated Bug Regression Suite
Finally, I would recommend looking into Sofy.ai. It can help you automate your software testing (app and website) by using machine learning to create tests and to suggest test scenarios based on product changes and customer usage data. Check them out.