Hi,
"Programming is the new literacy," says Gaël Varoquaux in an Amsterdam cafe. I am wearing rain-soaked jeans, he is in red outdoor pants.
I am lucky to meet Varoquaux in person. At the moment the French computer scientist is on sabbatical in Montreal, but he happens to be in Amsterdam for a conference.
Varoquaux is one of the founders of scikit-learn, a machine learning package in Python, today’s most popular programming language. The project currently draws almost one million unique visitors per month. When I ask him what you can do with the package, he says: "All machine learning – except deep learning."
That is unusual, because deep learning is the hype of the moment. Google Translate, Siri, Spotify – they all use it. Deep learning revolves around neural networks: often complex computational systems that require a lot of data and computing time.
That is precisely the problem. The huge data sets that you need often belong to large companies. Plus, training a deep learning model is very expensive.
That is why scikit-learn offers a more accessible kind of machine learning. "Data science for the many, not the mighty," says Varoquaux. It should not just be large companies that get to innovate, he believes, but everyone. Otherwise, important questions will go unanswered.
The package is therefore open source, but – more importantly – Varoquaux and his colleagues do their best to make it as easy as possible to use. For doctors, for example, says Varoquaux, who himself applies computer science to medicine. "They are very bright, they can certainly learn."
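To give a sense of how little code a basic model takes, here is a minimal sketch of a typical scikit-learn workflow. The dataset and classifier are my own illustrative choices, not anything Varoquaux mentions:

```python
# A minimal scikit-learn sketch: load a bundled dataset, train a model,
# and check how well it predicts on data it has not seen before.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)   # example medical dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=5000)    # a simple, classic classifier
model.fit(X_train, y_train)                  # learn from the training data
print(model.score(X_test, y_test))           # accuracy on held-out data
```

Swapping in a different model is usually a one-line change, which is much of what makes the package approachable.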
I am curious: have you ever worked with scikit-learn? What do you think of it? And, more generally, what do you think of the idea that programming is the new literacy? Let me know in the comments.
COMPAS
It has long been known that computer models can be biased. One of the articles that put this problem on the map is "Machine Bias" by ProPublica. Its journalists showed that COMPAS, a software package for predicting recidivism, judged black defendants differently from white ones.
The program expected African Americans to reoffend more often than white Americans. But that did not reflect reality. When ProPublica dug into the data, it turned out that among those who did not reoffend, twice as many black Americans had been labeled "high risk". In other words, the algorithm wrongly flagged people who would not reoffend more often if they were black.
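To make that statistic concrete: what ProPublica compared is the false positive rate, the share of people who did not go on to reoffend but were nonetheless labeled high risk. A small sketch with invented counts (not ProPublica's actual numbers) shows how such a rate is computed per group:

```python
# Invented counts, purely for illustration -- not ProPublica's data.
# For each group: non-reoffenders who were labeled high risk vs. low risk.
groups = {
    "group A": {"high_risk": 200, "low_risk": 800},
    "group B": {"high_risk": 400, "low_risk": 600},
}

for name, counts in groups.items():
    fpr = counts["high_risk"] / (counts["high_risk"] + counts["low_risk"])
    print(f"{name}: false positive rate = {fpr:.0%}")

# group A: 20%, group B: 40% -- the same kind of two-to-one gap ProPublica
# reported, even though nobody counted here actually reoffended.
```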
But what is the solution? Karen Hao and Jonathan Stray address this question in a new article in MIT Technology Review. Through a game, they let you choose who should be considered "high" or "low" risk.
You soon see that this is a diabolical dilemma. Either you use the same risk threshold for both white and black Americans – but because the two groups' scores are distributed differently in the data, that single cut-off produces more false positives among black defendants, putting them at a disadvantage. Or you give the two groups different thresholds – but then you are explicitly judging people by different standards, which is discrimination too.
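Here is a rough numerical sketch of that dilemma, using invented risk scores rather than real COMPAS output: when the two groups' scores are distributed differently, one shared cut-off gives them different false positive rates, and in this toy example the only way to equalize those rates is to give each group its own cut-off.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented risk scores (1-10) for people who will NOT reoffend, per group.
# Group B's scores sit higher, mimicking a difference in the underlying data.
scores_a = rng.normal(loc=4.0, scale=2.0, size=10_000).clip(1, 10)
scores_b = rng.normal(loc=5.5, scale=2.0, size=10_000).clip(1, 10)

def false_positive_rate(scores, threshold):
    """Share of these non-reoffenders labeled 'high risk' at a given cut-off."""
    return float(np.mean(scores >= threshold))

# Option 1: one shared threshold -> unequal false positive rates (~7% vs ~23%).
print(false_positive_rate(scores_a, 7.0), false_positive_rate(scores_b, 7.0))

# Option 2: per-group thresholds tuned to equalize the rates (~11% each) --
# but now the two groups are explicitly judged by different standards.
print(false_positive_rate(scores_a, 6.5), false_positive_rate(scores_b, 8.0))
```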
Judges, of course, are far from perfect either; they have their own prejudices. But, write Hao and Stray, you can at least hold them accountable for their decisions. COMPAS, on the other hand, is owned by the company Northpointe, which treats the algorithm as a trade secret.
The article is a good exploration of the thorny issues surrounding fairness. Hao and Stray conclude with a quote from Andrew Selbst, a professor of law:
“Whenever you turn philosophical notions of fairness into mathematical expressions, they lose their nuance, their flexibility, their malleability. That’s not to say that some of the efficiencies of doing so won’t eventually be worthwhile. I just have my doubts.”
Just before you go...
... my book is now also available in German: Der größte Bestseller aller Zeiten (mit diesem Titel).