Four Suggestions for Using a Kaggle Competition to Test AI in Business

According to a McKinsey report, only 20% of companies consider themselves adopters of AI technology while 41% remain uncertain about the benefits that AI provides. Considering the cost of implementing AI and the organizational challenges that come with it, it’s no surprise that smart companies seek ways to test the solutions before implementing them and get a sneak peek into the AI world without making a leap of faith.

That’s why more and more organizations are turning to data science competition platforms like Kaggle, CrowdAI and DrivenData. Making a data science-related challenge public and inviting the community to tackle it comes with many benefits:

Low initial cost – the company needs only to provide data scientists with data, pay the entrance fee and fund the award. There are no further costs.
Validating results – participants provide the company with verifiable, working solutions.
Establishing contacts – many companies and professionals take part in Kaggle competitions. Those who tackle your challenge may be potential vendors for your company.
Brainstorming the solution – data science is a creative field, and there’s often more than one way to solve a problem. Sponsoring a competition means you’re sponsoring a brainstorming session with thousands of professional and passionate data scientists, including the best of the best.
No further investment or involvement – the company gets immediate feedback. If an AI solution proves effective, the company can move forward with it. Otherwise, its involvement ends with funding the award, and it avoids further costs.

While numerous organizations – big e-commerce websites and state administrations among them – sponsor competitions and leverage the power of the data science community, running a competition is not at all simple. An excellent example is the competition the US National Oceanic and Atmospheric Administration sponsored when it needed a solution that would recognize and differentiate individual right whales from the herd. Ultimately, the most effective approach was the principle of facial recognition applied to the topsides of the whales, which were obscured by weather, water and the distance between the photographer above and the whales far below. To check whether this was even possible, and how accurate a solution might be, the organization ran a Kaggle competition, which deepsense.ai won.

Having won several such competitions, we have encountered both brilliant and not-so-brilliant ones. That’s why we decided to prepare a guide for every organization interested in testing potential AI solutions in Kaggle, CrowdAI or DrivenData competitions.

Recommendation 1. Deliver participants high-quality data

The quality of your data is crucial to attaining a meaningful outcome. Without good data, even the best machine learning model is useless. This also applies to data science competitions: without quality training data, the participants will not be able to build a working model. This is a particular challenge with medical data, where obtaining enough information is problematic for both legal and practical reasons.

Scenario: A farming company wants to build a model to identify soil type from photos and probing results. Although there are six classes of farming soil, the company is able to deliver sample data for only four. In that case, running the competition would make no sense: a model trained on four classes cannot learn to recognize the two it has never seen.

Advice: Ensure your data is complete, clear and representative before launching the competition.
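
One way to act on this advice is to run a simple coverage check on the training labels before publishing the dataset. The sketch below is only an illustration under stated assumptions: the function name check_class_coverage, the six soil class names, the sample counts and the min_samples threshold are all hypothetical and do not come from a real competition. It reports classes that are missing entirely, classes with too few examples, and labels that should not be there at all.

```python
from collections import Counter


def check_class_coverage(labels, expected_classes, min_samples=1):
    """Compare the labels in a dataset against the classes the model must know.

    Hypothetical helper: returns three sorted lists describing coverage gaps.
    """
    counts = Counter(labels)
    expected = set(expected_classes)
    return {
        # Expected classes with no examples at all.
        "missing": sorted(c for c in expected if counts[c] == 0),
        # Expected classes that are present but below the chosen threshold.
        "underrepresented": sorted(
            c for c in expected if 0 < counts[c] < min_samples
        ),
        # Labels in the data that are not among the expected classes.
        "unexpected": sorted(set(counts) - expected),
    }


# Hypothetical soil classes and label counts, mirroring the scenario above:
# six classes are expected, but samples exist for only four of them.
EXPECTED_CLASSES = ["chalk", "clay", "loam", "peat", "sand", "silt"]
labels = ["clay"] * 40 + ["loam"] * 35 + ["sand"] * 3 + ["silt"] * 22

report = check_class_coverage(labels, EXPECTED_CLASSES, min_samples=10)
print(report)
```

With this made-up data, the report lists chalk and peat as missing and sand as underrepresented, which is a signal to collect more data before launching. A check like this only covers completeness; whether the data is clear and representative still needs a review by someone who knows the domain.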

Recommendation 2. Build clear and descriptive rules

Competitions are put together to achieve goals, so the model has to produce a useful outcome. And “useful” is the point here. Because those participating in the competition are not professionals in the field they’re producing a solution for, the rules need to be based strictly on the business case and the model’s intended use. Including even basic guidelines will help participants address the challenge properly. Without these foundations, the outcome may be technically correct but totally useless.

Scenario: A city wants a map of the distribution of children below the age of 7 in order to optimize its social, educational and healthcare policies. To make the mapping work, it is crucial to include additional guidelines in the rules: the areas mapped need to be bordered by streets, rivers, rail lines, district boundaries and other topographical obstacles in the city. Without such guidelines, many of the models may map the distribution by cutting the city into stripes 10 meters wide and a kilometer long. The segmentation is done, but the outcome is totally useless.

Advice: Think about how the model will be used and include the corresponding guidelines in the rules of the competition to keep it goal-oriented and grounded in common sense.

Read the source article at deepsense.ai.