Why yield monitor data cleaning workshops are offered
Updated: Aug 6, 2020
I usually offer a couple of yield monitor data management workshops each year. These workshops focus on data quality, from collection of the data in the field to post-processing data cleaning. They have been offered in an Extension context to adult audiences composed of farmers, crop consultants, and data service providers, and also as part of undergraduate classroom curriculum.

The initial reasoning for offering workshops on yield monitor data management and cleaning was just that: for participants to learn how to clean yield monitor data from their fields (or their customers' fields) themselves. However, I soon learned that the majority of participants would never actually perform these processes themselves. I also learned that many participants felt the cost of their time and effort to clean the data was greater than the expected benefits. In addition, I learned that many farmers and crop consultants wanted to learn what the process entailed to increase their understanding of (and confidence in) services that others were providing. Looking back, I learned more than the workshop participants.

Fast forward 15 years from the first workshops on cleaning yield monitor data. The initial objectives have been negated, but we’re still offering these workshops. Today there are three reasons we offer them to the same audience, although the objectives have evolved into something entirely different:
1) Learn how combine harvester operation affects yield data quality, and how to improve overall data quality
2) Understand that manually cleaning all fields takes a substantial amount of time, and although automated routines are not always perfect, are likely the only feasible solution to data cleaning for all fields
3) Learn how to discern when yield monitor data has not been properly post-processed by data service providers claiming to have cleaned and analyzed data

Learn how combine harvester operation affects yield data quality, and how to improve overall data quality

Combine harvesters, aka headers as they’re known in Australia, were originally intended to reap and thresh grain. Today, one could argue that ‘collect data’ is a third intent. For decades, the combine operator was trained to operate the machine to harvest grain and unload it to another machine. When the operator is also concerned with collecting quality yield monitor data during harvest, additional machine operation considerations become necessary.

In general, it is advised that the harvester be operated under conditions similar to those under which the yield monitor was calibrated. This implies maintaining a steady flow of grain into the harvester. It could be argued that ground speed would need to be changed to maintain a consistent volume of grain. However, changes in ground speed must be gradual and not abrupt. An abrupt change in ground speed results in a spike in the yield monitor data, either a large increase or a large decrease. This is a common result in many fields I’ve analyzed. These outliers can be detected by examining the percent change in ground speed between consecutive yield data observations within a transect, but the affected data then need to be flagged for omission from the analysis. In other words, the data can be cleaned, but that leaves fewer available observations.

Moral of the story: the behavior of the combine operator impacts the sample size and quality of the yield monitor data.
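To make the ground speed check above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the routine used in the workshops or in any particular software: the `YieldPoint` record, the `flag_abrupt_speed_changes` function, and the 20 percent threshold are all hypothetical, and the sample values are made up. It assumes observations are sorted in logging order and that each one carries a transect (harvest pass) identifier.

```python
from dataclasses import dataclass


@dataclass
class YieldPoint:
    """One yield monitor observation (hypothetical record layout)."""
    transect_id: int    # harvest pass the observation belongs to
    speed: float        # ground speed, in any consistent unit
    yield_value: float  # logged yield, in any consistent unit


def flag_abrupt_speed_changes(points, max_pct_change=20.0):
    """Return one boolean per point; True means 'omit from analysis'.

    A point is flagged when its ground speed differs from the previous
    point in the SAME transect by more than max_pct_change percent.
    The 20 percent default is an assumed value for illustration only.
    """
    flags = [False] * len(points)
    for i in range(1, len(points)):
        prev, curr = points[i - 1], points[i]
        if prev.transect_id != curr.transect_id:
            continue  # first point of a new transect: nothing to compare to
        if prev.speed <= 0:
            continue  # percent change is undefined; left to other filters
        pct_change = abs(curr.speed - prev.speed) / prev.speed * 100.0
        if pct_change > max_pct_change:
            flags[i] = True
    return flags


# Made-up observations: the third point follows a sudden slowdown.
points = [
    YieldPoint(1, 5.0, 180.0),
    YieldPoint(1, 5.1, 182.0),
    YieldPoint(1, 3.0, 260.0),
    YieldPoint(1, 3.1, 185.0),
    YieldPoint(2, 5.0, 181.0),
]
flags = flag_abrupt_speed_changes(points)
kept = [p for p, omit in zip(points, flags) if not omit]
print(f"{len(kept)} of {len(points)} observations kept")  # 4 of 5 observations kept
```

Note that the flagged observation is dropped rather than corrected, which is exactly the trade-off described above: cleaner data, but a smaller sample.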
Understand that manually cleaning all fields takes a substantial amount of time, and although automated routines are not always perfect, are likely the only feasible solution to data cleaning for all fields

Properly cleaning yield monitor data is somewhat tedious and time consuming. It’s been estimated that it takes an experienced human 30 to 45 minutes to process each field. When there are only one or two fields, this is not a large concern, especially if there are a couple of on-farm experiments that the farmer is interested in analyzing. However, when the desire is to clean all fields, the time requirements may be prohibitive. Although many people view automated processes as less reliable than experienced humans at their best, workshop participants are provided with information suggesting that the cost of lower perceived accuracy with automation is outweighed by the time and human capital costs of manually cleaning data. In addition to the desktop software commonly used in these workshops, some farm data companies are introducing automated processes that operate behind the scenes.

Moral of the story: farmers and their advisers are becoming more comfortable with automated yield monitor data cleaning and processing.

Learn how to discern when yield monitor data has not been properly post-processed by data service providers claiming to have cleaned and analyzed data

There are many companies providing precision agriculture services. Social media has a steady stream of yield and soil maps displaying the prowess of the services offered. It is unlikely that all precision agriculture data companies are providing services of the same quality. A few tell-tale signs of whether data quality is sufficient are discussed in these workshops as part of the cleaning process. Workshop participants will be able to discern whether data from their farm (or even other farms) has been properly post-processed.
These tell-tale signs, or ‘litmus tests’ as we refer to them, include headland issues and flow delay settings. When flow delay has not been correctly set, the yield data appears ‘out of focus’, with jagged edges along regions of changing yield. Although flow delay is a relatively simple adjustment to make correctly, many services do not set it adequately.

Moral of the story: workshop participants learn how to detect whether yield data have been properly managed.

Final thoughts

Over the last 15 years, I’ve analyzed hundreds, maybe thousands, of fields. In doing so, I learned how I as a human evaluated yield monitor data for quality. I used insights into my own decision-making process to develop an automated analysis that essentially replaced me at each decision node. It took a couple hundred fields to settle on a process that I believed worked for all the fields that I had analyzed.

Researchers, including myself, have made substantial progress on automated yield data cleaning tools. It should be noted that these processes do not remove high and low yield data points just because the yields are high or low, but instead evaluate machinery dynamics to determine whether the yield monitor was able to make an accurate measurement under the conditions at the time. As trust in automated processes increases, these tools will become more commonly adopted. The quality of yield data is important for making appropriate management decisions, and automation is likely the best path forward.