Improving Data Quality for Superior Results


Have you ever been asked to create a model or derive insights from data that is inaccurate, missing, or unreliable? As data experts, we know that no matter how good our model is, its results will be unreliable if the source data is unreliable. As the saying goes: junk in, junk out.

This talk will explore the approaches that Wayfair’s Business Intelligence team has used to improve the quality of source data and, in turn, increase the accuracy of reporting, models, and insights.

Yard Arrival Date - Wayfair has increased data capture for Yard Arrival Date from less than 10% to over 99%. Yard Arrival Date is the date that an SPO (Stock Purchase Order) arrives at our warehouse yard. Knowing exactly when an order arrived in the yard is essential to building predictive models of when future orders will arrive. We found that capturing Yard Arrival Date required staff to perform extra work whose value they did not see. There was also no accountability: no one checked whether staff were entering the data, and the data did not affect their own work, so there was no incentive to do this seemingly needless task. By explaining why this data is useful, working with the warehouse staff to improve the data capture process, and creating a public dashboard that drove accountability, we raised Yard Arrival Date entry compliance from less than 10% to over 99%.
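
At its core, a public compliance dashboard like this one reports a capture rate: the share of SPOs that have a Yard Arrival Date entered, broken out so each site can see its own number. The sketch below is a minimal, hypothetical version of that metric; the `SpoRecord` fields (`spo_id`, `warehouse`, `yard_arrival_date`) are assumptions for illustration, not Wayfair’s actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SpoRecord:
    """Hypothetical SPO record; field names are illustrative, not a real schema."""
    spo_id: str
    warehouse: str
    yard_arrival_date: Optional[date]  # None when staff did not enter the date


def capture_rate_by_warehouse(records: list[SpoRecord]) -> dict[str, float]:
    """Return the share of SPOs with a Yard Arrival Date entered, per warehouse."""
    total = defaultdict(int)
    captured = defaultdict(int)
    for rec in records:
        total[rec.warehouse] += 1
        if rec.yard_arrival_date is not None:
            captured[rec.warehouse] += 1
    return {wh: captured[wh] / total[wh] for wh in total}


if __name__ == "__main__":
    sample = [
        SpoRecord("SPO-1", "WH-A", date(2020, 3, 2)),
        SpoRecord("SPO-2", "WH-A", None),
        SpoRecord("SPO-3", "WH-B", date(2020, 3, 4)),
    ]
    # List the least compliant warehouse first, as a dashboard might.
    rates = capture_rate_by_warehouse(sample)
    for wh, rate in sorted(rates.items(), key=lambda kv: kv[1]):
        print(f"{wh}: {rate:.0%}")
```

Publishing the per-site breakdown, rather than a single company-wide number, is what makes the metric useful for accountability: each team can see where it stands.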

ETA to Port - In our International Supply Chain, many data points, including ETA to port, are manually entered by Wayfair staff. As with any manual data entry process, this leaves room for overlooked data, accidentally mistyped (“fat-fingered”) values, and other manual errors. When it came time for the end-of-month report, the team would inevitably be hit with a barrage of suspicious data that they had to track down and correct. To avoid a mad rush at the end of the month, the Business Intelligence team created a reporting portal that lets the team identify shipments with questionable dates: for instance, a shipment that left Asia in late January and arrived in the US in early January (did it have a time machine?). This has improved data quality, yielding reliable insights throughout the month. Additionally, we moved some data inputs to an EDI (Electronic Data Interchange) source, which reduces some of the risks associated with manual data entry; as our business scales, manual data entry is no longer feasible.
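
The checks behind such a portal can be expressed as simple per-shipment rules. The sketch below is hypothetical: the `Shipment` fields and the 90-day transit ceiling are assumptions chosen for illustration, not values from Wayfair’s system. It flags missing dates, the “time machine” case where the ETA falls before departure, and implausibly long transits.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed ceiling on a plausible ocean transit; tune per shipping lane.
MAX_TRANSIT_DAYS = 90


@dataclass
class Shipment:
    """Hypothetical shipment record with manually entered dates."""
    shipment_id: str
    departure_date: Optional[date]  # left the origin port
    eta_to_port: Optional[date]     # manually entered ETA to destination port


def date_issues(s: Shipment) -> list[str]:
    """Return human-readable problems with a shipment's dates (empty if clean)."""
    issues = []
    if s.departure_date is None:
        issues.append("missing departure date")
    if s.eta_to_port is None:
        issues.append("missing ETA to port")
    if s.departure_date is not None and s.eta_to_port is not None:
        transit = (s.eta_to_port - s.departure_date).days
        if transit < 0:
            # The "time machine" case: it arrives before it left.
            issues.append(f"ETA is {-transit} days before departure")
        elif transit > MAX_TRANSIT_DAYS:
            issues.append(f"transit of {transit} days exceeds {MAX_TRANSIT_DAYS}")
    return issues


def questionable(shipments: list[Shipment]) -> dict[str, list[str]]:
    """Map each flagged shipment ID to its issues; clean shipments are omitted."""
    flagged = {}
    for s in shipments:
        problems = date_issues(s)
        if problems:
            flagged[s.shipment_id] = problems
    return flagged


if __name__ == "__main__":
    shipments = [
        Shipment("SHP-1", date(2020, 1, 28), date(2020, 1, 5)),   # arrives before it left
        Shipment("SHP-2", date(2020, 1, 10), date(2020, 2, 14)),  # plausible
        Shipment("SHP-3", date(2020, 1, 10), None),               # missing ETA
    ]
    for sid, problems in questionable(shipments).items():
        print(sid, "->", "; ".join(problems))
```

Running rules like these continuously, instead of once at month end, is what spreads the cleanup work across the month.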

Estimated Yard Arrival Date - Now that we had a reliable yard arrival date, we could start improving the model that estimates yard arrival date. To do this, we looked at the original model and realized that the field had two meanings: some people used it as the date of arrival at the yard, while others interpreted it as the date the shipment was unloaded into the warehouse. Often these are the same date, but in times of high inbound volume, some containers can sit in the yard for weeks. We aligned on a consistent definition of yard arrival and began to break down the model by comparing Actual and Estimated Yard Arrival. The data revealed that the appointment date, which we had assumed was correct 100% of the time, was incorrect 40% of the time. Once this came to light, we worked with the operations teams on process changes: if an appointment was made more than a month ago, the team should call the carrier to confirm that the product is still scheduled to arrive on time. We also created a machine learning model that takes current appointment times and predicts when the actual arrival will occur.
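
The appointment-date check and the reconfirmation rule could be sketched roughly as follows. This is illustrative only: the `Appointment` fields are hypothetical, an exact date match stands in for “correct”, and 30 days stands in for “more than a month”. The machine learning model is not sketched here.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# "More than a month ago", approximated as 30 days (assumption).
RECONFIRM_AFTER = timedelta(days=30)


@dataclass
class Appointment:
    """Hypothetical yard appointment record; field names are illustrative."""
    spo_id: str
    booked_on: date          # when the appointment was made
    appointment_date: date   # scheduled yard arrival
    actual_yard_arrival: Optional[date] = None  # set once the container arrives


def appointment_miss_rate(appts: list[Appointment]) -> Optional[float]:
    """Share of arrived SPOs whose actual yard arrival differs from the appointment.

    Returns None when nothing has arrived yet, so "no data" is not read as 0%.
    """
    arrived = [a for a in appts if a.actual_yard_arrival is not None]
    if not arrived:
        return None
    misses = sum(a.actual_yard_arrival != a.appointment_date for a in arrived)
    return misses / len(arrived)


def needs_reconfirmation(appts: list[Appointment], today: date) -> list[str]:
    """IDs of upcoming, not-yet-arrived SPOs whose appointment was booked too long ago."""
    return [
        a.spo_id
        for a in appts
        if a.actual_yard_arrival is None
        and a.appointment_date >= today
        and today - a.booked_on > RECONFIRM_AFTER
    ]


if __name__ == "__main__":
    today = date(2020, 3, 1)
    appts = [
        Appointment("SPO-1", date(2020, 1, 2), date(2020, 2, 3), date(2020, 2, 3)),
        Appointment("SPO-2", date(2020, 1, 5), date(2020, 2, 10), date(2020, 2, 17)),
        Appointment("SPO-3", date(2020, 1, 15), date(2020, 3, 9)),
        Appointment("SPO-4", date(2020, 2, 20), date(2020, 3, 12)),
    ]
    print(f"appointment date missed: {appointment_miss_rate(appts):.0%}")
    print("call carrier for:", needs_reconfirmation(appts, today))
```

In this toy data, one of the two arrived SPOs missed its appointment date, and only SPO-3 (booked 46 days before `today`) is flagged for a call to the carrier.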


Kaitlin Andryauskas is a Business Intelligence Manager at Wayfair, supporting Wayfair’s global supply chain. She approaches her work by identifying the problem that will unlock the largest potential and then finding the data that will provide the required insights. She is passionate about solving complex problems that require both analytical skill and business acumen, as well as about using data to address pressing social issues. Kaitlin has an undergraduate degree in Sociology from The University of Texas at Austin, a Master’s in Business Analytics from Bentley University, and 24 credits toward a Master’s in Education at Johns Hopkins University. Kaitlin is a former high school history teacher and a Teach For America Baltimore alum.

Open Data Science