Pollution, poor working conditions, and animal welfare are topics that are often pushed aside for the sake of profit. This is certainly true of the modern ‘fast-fashion’ industry, which produces most of our clothes. Fortunately, some brands and initiatives are making an effort, but how do you, as a consumer, know which ones?

It is difficult to make a well-informed choice when buying sustainable clothes …

Searching for information is very time-consuming, because sustainability information is scattered over several sources. Websites, blogs, and books provide valuable guidelines about clothing brands that try to make a difference, and the brands themselves publish more and more sustainability information on their websites. Unfortunately, gathering and maintaining this information takes a lot of manual work, so the available information is often outdated.

… but Artificial Intelligence can do the job! 

We believe that the (fashion) industry can move towards sustainability by automating the identification of sustainable brands. To that end, we use scraping, artificial intelligence, natural language processing, and explainability to provide more sustainable clothing information, faster than current approaches.

In data science, you often start with a dataset that needs a lot of cleaning. In this case, we had to build from scratch. First, we created a database containing all the clothing brands. Second, we added the homepage of each brand. From there, the scraping fun could start. However, we did not have infinite resources, so we had to be smart about scraping exactly the right amount of data.
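The post does not describe how the scraping budget was enforced. As a minimal sketch, assuming the goal is to fetch only a brand's homepage plus a few pages that look sustainability-related, one could collect the homepage's links, keep those on the brand's own site whose URL contains a sustainability keyword, and cap the total. The keyword list, the `select_pages` function, and the `max_pages` budget are all hypothetical illustrations, not the authors' implementation; fetching the HTML itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse

# Hypothetical URL keywords hinting at sustainability content (an assumption, not the authors' list).
KEYWORDS = ("sustainab", "environment", "responsib", "ethic", "organic", "recycl")


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def keyword_score(url):
    """Number of sustainability keywords that occur in the URL."""
    return sum(k in url.lower() for k in KEYWORDS)


def select_pages(homepage_url, homepage_html, max_pages=5):
    """Return at most max_pages URLs: the homepage, then the most relevant same-site pages."""
    parser = LinkCollector()
    parser.feed(homepage_html)
    site = urlparse(homepage_url).netloc
    candidates = []
    for href in parser.links:
        url = urldefrag(urljoin(homepage_url, href)).url  # absolute URL without #fragment
        # Stay on the brand's own website and skip duplicates.
        if urlparse(url).netloc != site or url in candidates or url == homepage_url:
            continue
        candidates.append(url)
    # Keep only pages that look relevant; sorted() is stable, so ties keep page order.
    relevant = [u for u in candidates if keyword_score(u) > 0]
    ranked = sorted(relevant, key=keyword_score, reverse=True)
    return ([homepage_url] + ranked)[:max_pages]
```

With a budget like this, the cost per brand stays bounded no matter how large the brand's website is, at the price of possibly missing sustainability pages with uninformative URLs.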

When we first explored the data, after some preprocessing, we had the impression that a model could provide valuable insights. These two word-clouds contain the most frequent words per class and already show a clear difference. Interestingly, the ‘not-sustainable’ word-cloud mainly contains brand names. Fortunately, explainability showed us that the model does not rely on the names of specific brands.
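A word-cloud is essentially a picture of word frequencies per class. The sketch below shows, under simple assumptions, how such per-class frequencies could be computed from labeled page texts: lowercase, keep alphabetic tokens, drop a stopword list, and count. The tiny `STOPWORDS` set, the minimum token length, and the function names are illustrative choices, not the preprocessing the authors used.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "and", "a", "an", "of", "to", "in", "for", "our", "we", "is", "are", "with", "on"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens longer than two characters, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower())
            if t not in STOPWORDS and len(t) > 2]


def top_words_per_class(docs, k=10):
    """docs: iterable of (text, label) pairs. Returns {label: [(word, count), ...]}."""
    counts = {}
    for text, label in docs:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return {label: counter.most_common(k) for label, counter in counts.items()}


docs = [
    ("We use organic cotton and recycled polyester", "sustainable"),
    ("Organic cotton, fair wages", "sustainable"),
    ("New collection: shop the sale now", "not-sustainable"),
    ("Sale! Shop now", "not-sustainable"),
]
print(top_words_per_class(docs, k=3))
```

Frequency lists like these are what a word-cloud renders; they also make it easy to spot tokens such as brand names that one may want to inspect or mask before training.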

Explainability is crucial for people to trust the outcomes. In our case, it also improved the quality of our preprocessing. As a result, we now have several models with over 80 percent accuracy. Outcomes that are still too uncertain are assigned to a separate class, ‘unknown’. In this way, people can trust the outcomes the model does give.
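The post does not name its models or explainability method. To illustrate the two ideas in the paragraph above, here is a minimal sketch assuming a linear bag-of-words classifier (such as logistic regression): each token's contribution is its weight times its count, which gives a simple per-prediction explanation, and predictions whose probability falls between the two confidence thresholds become ‘unknown’. The weights, the bias, and the 0.75 threshold are hypothetical values chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_with_explanation(tokens, weights, bias=0.0, threshold=0.75, top_n=3):
    """Classify a tokenized page with a linear model and explain the decision.

    Returns (label, probability of 'sustainable', top contributing tokens).
    The label is 'unknown' when the probability is not beyond the threshold either way.
    """
    contributions = {}
    for t in tokens:
        if t in weights:  # unseen tokens contribute nothing
            contributions[t] = contributions.get(t, 0.0) + weights[t]
    p = sigmoid(bias + sum(contributions.values()))
    if p >= threshold:
        label = "sustainable"
    elif p <= 1.0 - threshold:
        label = "not-sustainable"
    else:
        label = "unknown"
    # Largest absolute contributions first: these tokens drove the prediction.
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
    return label, p, top


# Hypothetical learned weights (positive pushes towards 'sustainable').
weights = {"organic": 1.5, "recycled": 1.2, "sale": -1.0, "fastfashion": -2.0}
print(predict_with_explanation(["organic", "recycled", "cotton"], weights))
```

Inspecting the top contributions across many pages is one way to check that brand names are not what drives the predictions; raising the threshold trades coverage for more trustworthy non-‘unknown’ answers.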

Information at your fingertips

We were quite proud and happy at this point. However, a model by itself does not accomplish anything. In our opinion, the model could make a difference by making its output publicly available. Therefore, we built goodbase.ai. This website provides sustainability information on every clothing brand we know of. You can also add brands yourself, and they will be assessed automatically.

Still, when you shop for clothes online, constantly checking another website is a hassle. You want the information at your fingertips without switching websites. Therefore, together with project Cece, we built a plugin, which we will launch in the second week of September.

Next steps

Although we are proud of what we have accomplished, we know that much more impact can be made, both on the technical side and in terms of usage.

This is how we see the future. How do you see it?

Our database will provide a piece of the sustainability puzzle. Do you want to know more about how we do this and how you can contribute?

Be part of the change!

If you want to know more about why we built the model, read https://amsterdamdatacollective.com/2019/12/04/blog-could-artificial-intelligence-help-you-buy-sustainable-clothing/ or get in touch.

About the Author/ODSC Europe 2020 speaker: Joanneke Meijer is a Manager at Amsterdam Data Collective and initiator of the sustainable benchmark. She is an experienced data science consultant, focusing on forecasting, pricing, operational research, and text mining. By coaching teams towards delivering actionable insights from data, she continuously manages to create a sustainable impact.