by Doug Wood
When we decide to begin data collection, we usually ask one of two things: “What is everyone else doing?” or “What data do we need to collect and how hard will it be to collect?” Good questions, but it’s often what we do not ask that costs us money. Starting with these two questions pushes us straight into software choices (What is everyone else using?) and tools that take the hassle out of a complex task. This can lead to the purchase of the wrong (or overly expensive) software, or of training and reference books (good, but we should keep the cart behind the horse to avoid backtracking).
Asking questions out of order can cost thousands of dollars. Implementing a sub-optimal solution can cost tens of thousands. If poor data lead to poor decisions, it can cost hundreds of thousands of dollars. There are ways to gain the data we need without making these missteps and they begin with understanding the goal. After this, work backward to build a system of data collection that is effective and efficient.
Data collection–From the end to the beginning
For example, let’s take a common situation: collecting quality data for a new product just starting production. There are central and foundational questions to ask. If you know the answers, you can move forward. Just don’t assume these answers are self-evident. The road to bankruptcy is paved with “obvious” truths. And remember, if everyone has the same answers, then you know your organization is on the same page.
Here are the questions to ask:
- What are your products?
- Who buys them?
- Why do they buy them?
- What kind of defects matter to your customers?
- What aspects of your products are special to your customers and why?
- What do these answers mean inside your organization?
Skipping these questions runs a grave risk of majoring in the minors: gaining precise and accurate data on things that don’t really matter. This kind of measurement is one of the major wastes in larger organizations. It’s often enshrined in a kind of bubble, as in “we’ve always done it that way.”
You always have a product. Even if what you provide is a service, your customers receive something that they pay you for. Information is a product, clean clothes are a product, on-time delivery of a package is a product. Defining your product(s) is the first step.
Some organizations, such as governments or hospitals, think that they don’t have customers. Many others see themselves as first having customers, then end-users to whom those customers deliver. The term “customer” is too vague, like “quality.” Consultant Robin Lawton has suggested abandoning it in favor of three terms: “end-user,” “broker,” and “fixer.” These three terms allow us to see the differing needs of customers and know what we should measure to meet those needs.
- End-users: In his book Creating a Customer-Centered Culture (American Society for Quality, 1993), Lawton defines end-user as an “individual or group who actually use the product to achieve a desired outcome.” In the case of food products, this would be the people who eat the food. The desired outcome is usually not being hungry, although there are many other desired outcomes for eating in a first-world society. End-users are more numerous than other users and they are the most important type of customer.
- Brokers: According to Lawton, “Brokers transfer the product to someone else who will use it.” Brokers can work for either the producer or the end-user, as in real estate transactions. Following the example of food products, the truck delivery services, food warehouses, grocery stores, and homemakers who choose, buy, and bring food to the home and load the pantry are all primarily brokers. The product must meet their requirements or it will not make it to the end-user.
- Fixers: Finally, according to Lawton, “Fixers transform, repair, correct, modify, or adjust the product for the benefit of the end-users.” Some of the brokers mentioned above in the food product cycle may at times be fixers, even if their major role is broker. Clearly, it’s the cook who acts as the primary fixer in the food product flow.
This diversion from data collection shows that the upfront definition is critical. The data you collect will change based on these definitions. A data collection process that’s precise and accurate won’t help you if you’re collecting the wrong data. In the worst case, the wrong data will mislead you and drive decisions that cause waste.
Once you know your products and your customers, you can list why they buy. This is the purpose of your product. There is a lot more to this subject, and in this short article we must skip the detailed steps. In the end, you arrive at a list of measurable attributes of your product(s). These measurable attributes will lead you to the correct data for your process. Now we can talk about tools. To get started, let’s focus on two tools: sampling and check sheets.
Sampling is a technique where we select items from a large population so that they are both random and representative. This allows us to gain a sufficient picture of the population to meet our needs, and it costs much less than looking at every item in the population. It doesn’t really matter if we are monitoring items in a flow, checking documents, gathering feedback from people downstream from our work, or testing something new. The key is to be both random and representative. But how can this be? Those words seem to be mutually exclusive. We define them this way: “Representative” means that the samples are proportional to the actual population for the purpose for which we are collecting data. (This is why knowing the end use of the data is so important.) “Random” means that each item has an equal chance of being selected.
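One way to satisfy both definitions at once is to split the population into groups that matter for your purpose, then draw randomly within each group in proportion to its size. The sketch below is only an illustration under assumed conditions: the function name, the "day"/"night" shift labels, and the 700/300 split are hypothetical and not from any particular process.

```python
import random
from collections import defaultdict

def proportional_random_sample(items, group_of, sample_size, seed=None):
    """Draw a sample whose group proportions mirror the population's.

    items: list of population items
    group_of: function returning the group an item belongs to
    Note: quotas are rounded, so the total can differ slightly
    from sample_size when the proportions do not divide evenly.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for item in items:
        groups[group_of(item)].append(item)

    sample = []
    for members in groups.values():
        # Representative: the group's share of the sample matches
        # its share of the population.
        quota = round(sample_size * len(members) / len(items))
        # Random: within the group, every item has an equal chance.
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample

# Hypothetical population: 700 parts from the day shift, 300 from the night shift.
population = [("day", i) for i in range(700)] + [("night", i) for i in range(300)]
sample = proportional_random_sample(population, lambda part: part[0], 50, seed=1)
```

With these numbers the sample holds 35 day-shift parts and 15 night-shift parts, so every part has the same 5 percent chance of selection while the shift mix matches the population.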
At this point, most people ask, “How many samples do I need to look at to have a statistically significant answer?” This is the wrong question, and it can only be answered with “it depends.” The number of samples you need to look at depends on four things:
- How big is the population? (Define the population of events/parts/areas you will draw from.)
- How variable is the value I will be measuring? (More variation in what you measure means more samples.)
- How sure do I want to be that my sample reflects the population?
- How far off can I be and still have a useful answer?
There are formulas to help with these questions. The key to getting good data is to avoid what statisticians call “bias.” We can simply call this “error,” although it usually refers to a systematic error in our choice of samples that we may not be aware of. Because it is so easy to be unaware of bias in something you are very close to, you should ask someone else to help you think it through.
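As one example of such a formula, the sketch below uses a standard textbook calculation for estimating an average: a normal-approximation sample size with a finite population correction. It is not necessarily the formula the author has in mind, and it assumes you already have a rough estimate of the standard deviation. The function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(population_size, std_dev, confidence, margin_of_error):
    """Estimate how many samples are needed to estimate an average.

    population_size: how big is the population?
    std_dev:         how variable is the value being measured?
    confidence:      how sure do I want to be (e.g. 0.95)?
    margin_of_error: how far off can I be and still have a useful answer?
    """
    # z-score for a two-sided interval at the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for a very large population
    n0 = (z * std_dev / margin_of_error) ** 2
    # Finite population correction: small populations need fewer samples
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)

# Hypothetical case: 5,000 parts, 95 percent confidence, answer within 0.5 units.
low_variation = sample_size_for_mean(5000, std_dev=2.0, confidence=0.95, margin_of_error=0.5)
high_variation = sample_size_for_mean(5000, std_dev=4.0, confidence=0.95, margin_of_error=0.5)
```

Doubling the assumed standard deviation roughly quadruples the required sample here, which is why the variability question cannot be skipped.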
Don’t try to pick random numbers out of your head to choose your samples. This never works because all humans are pattern-seeking. We see and create patterns where none exist. This is so deeply wired in our brains that it’s often subconscious. To avoid this bias, use a random number generator and take your samples according to the numbers it gives you.
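A minimal sketch of that advice, assuming the items can be numbered 1 through N (for example, by position in a production run). The function name and the figures of 480 items and 12 samples are hypothetical.

```python
import random

def pick_sample_positions(population_size, sample_size, seed=None):
    """Return sorted, non-repeating item numbers to inspect.

    Passing a seed makes the selection reproducible, which helps
    if someone later asks how the samples were chosen.
    """
    rng = random.Random(seed)
    # random.sample draws without replacement, so no item is picked twice
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

# Hypothetical run of 480 items, from which 12 are inspected.
positions = pick_sample_positions(480, 12, seed=2024)
print(positions)
```

The list is sorted only so that an inspector can pull items in the order they come off the line. The selection itself is still random.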
One final note about sampling: Sample results may be more accurate than an attempted survey of the entire group. This may seem incredible, but the act of measuring everyone or everything in a group often disturbs the very data you are trying to collect. A sample is far less intrusive, so it is less likely to disturb the characteristic you are measuring.
Check sheets are simple forms used to mark whether something has happened or been observed. They may collect numeric values, but most often they are paper forms with rows and columns in which people make a mark when an event or situation occurs and leave the box blank when it does not.
The events or situations are usually predetermined and listed in the first column or first row. The opposite axis (row or column) is often a time increment (such as minute, hour, day, week, or month). The form has empty boxes in the center, and the data recorder places marks in the boxes. The events or situations may be machines, people, types of problems, conditions of something observed, and so on. We have all seen a check sheet on the back of a gas station’s bathroom door recording how often the room is checked or cleaned.
Key to making this data collection work is making sure everyone doing the recording knows what the events or situations are. Does a check mark on the bathroom form mean someone has peeked inside for a second or that they have performed all the cleaning a public bathroom requires? The term for this is “operational definition.”
Your form needs to be carefully thought out and tested. It should be set up so that only simple marks are needed during data collection and data are organized on the form for later use or computer data entry.
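If the check sheet is kept electronically, or keyed in later, the same grid can be sketched as a small tally structure. This is an illustration only: the class name, the defect types, and the weekday columns are hypothetical. Rejecting anything not listed up front is one simple way to enforce the agreed operational definitions.

```python
from collections import Counter

class CheckSheet:
    """A tally grid: predefined events on one axis, time periods on the other."""

    def __init__(self, events, periods):
        self.events = list(events)
        self.periods = list(periods)
        self.marks = Counter()  # (event, period) -> number of marks

    def mark(self, event, period):
        # Only events and periods defined in advance may be recorded
        if event not in self.events:
            raise ValueError(f"undefined event: {event!r}")
        if period not in self.periods:
            raise ValueError(f"undefined period: {period!r}")
        self.marks[(event, period)] += 1

    def rows(self):
        """Return the grid as rows, organized for later data entry."""
        header = ["event"] + self.periods + ["total"]
        body = []
        for event in self.events:
            counts = [self.marks[(event, p)] for p in self.periods]
            body.append([event] + counts + [sum(counts)])
        return [header] + body

# Hypothetical defect check sheet for three days of production.
sheet = CheckSheet(events=["scratch", "dent", "wrong label"],
                   periods=["Mon", "Tue", "Wed"])
sheet.mark("scratch", "Mon")
sheet.mark("scratch", "Mon")
sheet.mark("dent", "Tue")
for row in sheet.rows():
    print(row)
```

Recording is a single simple mark per observation, and the row totals fall out of the layout, in line with the guidance above.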
Data collection–In summary
The questions to ask may be slightly different for different organizations, but they are likely to be similar to these:
- What decisions do you plan to make with the data you collect? What will the data be used for?
- With a goal in mind, what answers do you seek? How long do you have to get data?
- What data do you have already and what will you need to supplement it?
- Who will be collecting the data?
- What tools will aid in data collection?
Remember to keep it simple. Having good data doesn’t have to be complex, only clearly planned and well founded. Use good practices to collect data samples. Work with someone else to be a sounding board. You will efficiently gain the answers you need with precision and accuracy.
About the author
Doug Wood is the principal consultant of DC Wood Consulting LLC. The firm focuses on process improvement, employee engagement, data analysis, and related areas. He has more than 30 years of experience in industrial engineering and quality and has saved millions of dollars for his clients’ firms and employers using his practical approach: motivation, improvement, measurement, and control.
Doug is the author of The Executive Guide to Understanding and Implementing Quality Cost Programs, published by ASQ Quality Press.