Can money really nurture and keep alive a data-starved AI?

Disrupt Berlin was held last week. After watching the interviews with founders and co-founders of various tech startups on the Disrupt Berlin platform, and regularly reading about the many tech startups (fintech, insurtech, medtech, femtech,…) that pop up and fizzle out across the world every single day, it seems that raising a few tens of millions is the new normal! All this, provided you use the word “data” extensively at every twist and turn!

Yes, data indeed has the potential to revolutionise our lives through widespread applications: creative risk assessments in banking and finance that expand credit coverage beyond those traditionally banked; tailored insurance cover that serves an individual’s purpose and pocket, effectively doing away with million-mile-long clauses detailing every imaginable “if, and or but”, still printed on paper and requiring several dozen signatures; devices which automatically track activity, telling us how many steps we have taken, flights climbed, distances travelled, calories burnt, and so much more; and then there are those that attempt various health predictions for men and women (broadly “humans”), using “voluntarily” submitted data and a bunch of algorithms to do the extrapolations based on machine learning and AI.

If you have been watching “DC’s Legends of Tomorrow” (as we are forced to, as parents), it seems Gideon (the AI with superhuman powers that reconfigures the body, mind and soul, making immortality look within grasp) is already on the horizon! Let’s leave it to Elon Musk to figure out the time travel part.

But taking a step back into our current reality, let’s put some thought into the “voluntary” data input part of the data collection process that is fuelling most tech-based start-ups. Whether it is an app trying to predict ovulation/menopause, blood sugar levels/diabetes or flu outbreaks, the data input is highly dependent on the users and their interpretation of their symptoms and how severe they are. This introduces discrepancies and bias. A biased or incomplete dataset, obtained from a non-homogeneous population with limited information (highly feature-centric rather than systems-based), when used as training material to smarten up our AI, is no better than sending the AI to the University of Phoenix (for an online crash course) and expecting it to perform like someone trained at ETH Zurich or Cambridge, or any decent university for that matter!

One very striking example of diversity bias in data is face recognition, where several thousand images are used to train the algorithms. A skin tone bias in the training sample has proven to be a real-life disaster, with the infamous example of HP webcams failing to follow the movements of darker-complexioned faces in 2009, or more recently the Snapchat “dog filter” not applying at all to dark-skinned people.

Unless we can regularise data collection, either by mandating automation (through connected devices) or by fostering more data sharing through less prohibitive regulations, it is difficult (and even dangerous in certain fields) to truly exploit data, at least in a way that revolutionises our lives by living up to the pitch being sold. Access to labelled data is the most sought-after ammunition in the war to power and grow Artificial Intelligence, at least until we manage to bring unsupervised machine learning to the next level. But this also exposes us to the real danger of leaving behind a large part of the human population, currently without access to AI-based technologies, if we do not work hard to prevent biases from entering the training sets (think diversity and inclusion, random sampling and automation).
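
For the more hands-on reader, here is a minimal Python sketch of what “random sampling” can mean in practice. It is only an illustration under loud assumptions: the dataset is made up, and the `skin_tone` field and the helper names `group_shares` and `balanced_sample` are hypothetical, not taken from any product mentioned here. It first measures how lopsided a training set is, then draws an equal random sample from each group.

```python
import random
from collections import Counter, defaultdict


def group_shares(records, key):
    """Return the fraction of records that fall in each group."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def balanced_sample(records, key, per_group, seed=0):
    """Draw the same number of records at random from every group."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    by_group = defaultdict(list)
    for r in records:
        by_group[r[key]].append(r)
    sample = []
    for group, items in by_group.items():
        if len(items) < per_group:
            raise ValueError(f"group {group!r} has only {len(items)} records")
        sample.extend(rng.sample(items, per_group))
    return sample


# Made-up training set (an assumption for illustration): 90 vs 10 images.
records = [{"id": i, "skin_tone": "lighter"} for i in range(90)]
records += [{"id": i, "skin_tone": "darker"} for i in range(90, 100)]

print(group_shares(records, "skin_tone"))   # the skew in the raw data
balanced = balanced_sample(records, "skin_tone", per_group=10)
print(group_shares(balanced, "skin_tone"))  # equal shares after sampling
```

Note the price: balancing this way throws away most of the over-represented group, which is exactly why collecting more diverse data in the first place beats fixing it afterwards.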

However, looking at the positives, we have at least made great strides in establishing gender identities (at least for one, for the time being). We are probably yet to hear about “maletech”, but yes, “femtech” is here and flourishing! Even though most apps are currently clueless (some even retroactively adjusting predictions), luckily common sense has been around for a long time and is still one of the best-rated apps for survival!

And as for the billions of dollars going into all kinds of tech start-up financing, with very little going into fundamental research and development, one can only hope that a little knowledge doesn’t become a dangerous thing.
