Between 2012 and 2016, the Netherlands’ Pinkpop music festival booked 248 acts. Of those acts, which entertained hundreds of thousands of people, just 11.5% were female.
So is Pinkpop a sausage fest? Or are gender ratios askew across the music industry?
My colleague Rufus Kain had some interesting questions after counting up the Pinkpop artists.
I decided to help answer them by collecting more data and looking for patterns. The scraping got out of hand fast. But quite a few late nights later, I’m happy with the results.
Here’s how I did the research – and how you can use my data.
I started with 3FM
I started on Dutch rock station 3FM’s website, where I collected all the songs played from April 1, 2016 to April 1, 2017. For each song, I looked at the artist’s name and the date and time it was played.
To do this, I used Outwit Hub, a handy program that lets you build a simple scraper in minutes.
It was so easy that I went on to collect the same data from Sky Radio, FunX, Radio 2, Qmusic, and Radio 538. My expedition netted data on more than half a million songs by more than 8,000 artists.
Then came the hard part: how do you decide whether an artist is male or female? And how do you do it for that many songs and acts without going crazy?
Then I hit an open-source goldmine
Our editorial designer Leon de Korte pointed me to MusicBrainz, an open-source database full of all kinds of information on pop artists. MusicBrainz has an API, and there is a Python library for working with it.
Why is that so useful?
An API is an interface that lets a program consult a database directly. So you don’t have to go to the website by hand: your code can search for information and retrieve it automatically.
A Python library is a collection of reusable code written in the programming language Python. It lets you do things like search the MusicBrainz database with only a few lines of your own.
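To make that concrete, here is a minimal sketch of what a gender lookup could look like. It is not the library mentioned above: it calls the public MusicBrainz search endpoint directly with Python's standard library. The function names, the sample response, and the user-agent string are my own assumptions for illustration, and the network call is defined but not run.

```python
import json
import urllib.parse
import urllib.request

# Public MusicBrainz artist-search endpoint; fmt=json asks for JSON output.
MB_ARTIST_SEARCH = "https://musicbrainz.org/ws/2/artist/"


def build_search_url(artist_name, limit=1):
    """Build a MusicBrainz artist-search URL for the given name."""
    params = {"query": f'artist:"{artist_name}"', "fmt": "json", "limit": limit}
    return MB_ARTIST_SEARCH + "?" + urllib.parse.urlencode(params)


def gender_from_response(payload):
    """Return the top hit's gender in lower case, or None.

    Groups and unmatched names carry no 'gender' field, so they come
    back as None and would need a manual check.
    """
    artists = payload.get("artists", [])
    if not artists:
        return None
    gender = artists[0].get("gender")
    return gender.lower() if gender else None


def lookup_gender(artist_name, user_agent="gender-sketch/0.1 (you@example.org)"):
    """Query MusicBrainz over the network (defined here, not called)."""
    request = urllib.request.Request(
        build_search_url(artist_name),
        headers={"User-Agent": user_agent},  # placeholder; use your own contact
    )
    with urllib.request.urlopen(request) as response:
        return gender_from_response(json.load(response))


# Hand-written sample shaped like a search response, for illustration only.
sample = {"artists": [{"name": "Dido", "type": "Person", "gender": "female"}]}
print(gender_from_response(sample))
```

With thousands of artists you would also want to pause between requests, since MusicBrainz limits how fast you may query it.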
Thanks to the API and the Python library, it was pretty easy to categorize about half the artists as male or female. For mixed bands and artist collaborations, I calculated the respective percentages of women and men. So Eminem ft. Dido (which still gets plenty of airplay) would be 50% male and 50% female.
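The percentage calculation can be sketched in a few lines of Python. This assumes one gender label per credited person and counts each person equally; the article does not spell out the exact weighting, so treat the function below (a hypothetical helper, not the original script) as one plausible reading.

```python
from collections import Counter


def gender_share(member_genders):
    """Return the male and female fractions among the credited people.

    Anyone whose gender is unknown (None or another label) is left out
    of the denominator.
    """
    counts = Counter(g for g in member_genders if g in ("male", "female"))
    total = sum(counts.values())
    if total == 0:
        return None  # nothing to go on; needs a manual check
    return {g: counts[g] / total for g in ("male", "female")}


# Eminem ft. Dido: one man, one woman.
print(gender_share(["male", "female"]))  # {'male': 0.5, 'female': 0.5}
```

A four-piece band with one woman would come out as 75% male and 25% female under the same rule.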
I had to assign genders to the remaining artists by hand. The OpenRefine application enabled me to do that pretty quickly. It let me sort according to how often each artist got played, so I could look at heavy-rotation songs first.
I didn’t bother with the bottom 1,000 or so artists, many of whom were played only once or twice in total across the six stations. With a score derived from more than 99% of the songs played, I was satisfied.
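I did the sorting in OpenRefine, but the same idea is easy to sketch in Python: rank artists by how often they were played, then work down the list until the labeled artists cover the share of plays you are aiming for. The function name and the tiny play log below are made up for illustration.

```python
from collections import Counter


def artists_needed_for_coverage(plays, target=0.99):
    """Return the most-played artists needed to cover `target` of all plays."""
    counts = Counter(plays)
    total = sum(counts.values())
    covered = 0
    selected = []
    for artist, n in counts.most_common():
        if covered / total >= target:
            break  # the artists picked so far already reach the target
        selected.append(artist)
        covered += n
    return selected


# Made-up play log: one artist name per song played.
plays = ["A"] * 90 + ["B"] * 9 + ["C"] * 1
print(artists_needed_for_coverage(plays))  # ['A', 'B']
```

In this toy log, labeling two of the three artists already covers 99 of the 100 plays, which is why the long tail can be skipped.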
On to YouTube and Spotify
Since I’d gotten the hang of things, I ran the artists through two other databases: YouTube and Spotify.
At 3FM, the DJs – and, I think, their managers – decide what gets played. But on YouTube, users search for the songs they want to hear. And Spotify seems to more strongly reflect what people want to listen to (though it, too, makes selections in the form of playlists).
YouTube has a fine API, and with a bit of Python code, it was a piece of cake finding out how many times each song’s video had been viewed and how many likes, dislikes, and comments it had received.
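A rough sketch of that step, assuming you already know a video's ID and have an API key. It builds a request for the YouTube Data API's videos endpoint and reads the statistics block of the response. The helper names and the sample numbers are mine, not from the original script, and the dislike count may be missing from responses, which is why the parser tolerates absent fields.

```python
import urllib.parse

# YouTube Data API v3 endpoint for video details.
YOUTUBE_VIDEOS = "https://www.googleapis.com/youtube/v3/videos"


def build_stats_url(video_id, api_key):
    """Build a request URL asking only for the statistics part."""
    params = {"part": "statistics", "id": video_id, "key": api_key}
    return YOUTUBE_VIDEOS + "?" + urllib.parse.urlencode(params)


def parse_stats(payload):
    """Convert the first item's counts to integers; absent counts become None."""
    items = payload.get("items", [])
    if not items:
        return None  # unknown or removed video
    stats = items[0].get("statistics", {})
    fields = ("viewCount", "likeCount", "dislikeCount", "commentCount")
    return {f: int(stats[f]) if f in stats else None for f in fields}


# Hand-written sample with made-up numbers; the API returns counts as strings.
sample = {"items": [{"id": "VIDEO_ID", "statistics": {
    "viewCount": "1500000", "likeCount": "12000", "commentCount": "800"}}]}
print(parse_stats(sample))
```

Matching each radio song to the right video is a separate step (the API also has a search endpoint) that this sketch leaves out.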
Spotify also has an API, and I was amazed at how liberal it was – you can get a ton of data. Here, too, Python code made it fairly easy to see how popular an artist or song is.
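As an illustration, here is a sketch of reading an artist's popularity from Spotify's Web API. It assumes you have an artist ID and a valid access token, and that the artist object carries a popularity score and a follower total, which is my understanding of the API and worth checking against the current documentation. The function names and the sample values are invented, and the network call is defined but not run.

```python
import json
import urllib.request

SPOTIFY_ARTIST = "https://api.spotify.com/v1/artists/{artist_id}"


def fetch_artist(artist_id, access_token):
    """Fetch one artist object (network call; needs a valid OAuth token)."""
    request = urllib.request.Request(
        SPOTIFY_ARTIST.format(artist_id=artist_id),
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_artist(payload):
    """Keep only the fields that say something about popularity."""
    return {
        "name": payload.get("name"),
        "popularity": payload.get("popularity"),  # score from 0 to 100
        "followers": payload.get("followers", {}).get("total"),
    }


# Hand-written sample with made-up numbers, shaped like an artist object.
sample = {"name": "Example Artist", "popularity": 71, "followers": {"total": 123456}}
print(summarize_artist(sample))
```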
Finally, I indexed the radio songs by how many times they’d been played and the YouTube videos by how many views, likes, and dislikes they’d received. This allowed me to better compare the radio stations with YouTube and Spotify.
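One simple way to build such an index, shown here as an assumption rather than the exact formula I used, is to rescale every measure so that its top scorer gets 100. Radio plays and YouTube views then sit on the same scale even though the raw numbers differ by orders of magnitude. The song names and figures below are made up.

```python
def index_to_max(values):
    """Rescale a {name: count} mapping so the largest count becomes 100."""
    top = max(values.values(), default=0)
    if top == 0:
        return {name: 0.0 for name in values}
    return {name: 100 * count / top for name, count in values.items()}


# Made-up figures for two songs on two platforms.
radio_plays = {"Song A": 400, "Song B": 100}
youtube_views = {"Song A": 2_000_000, "Song B": 8_000_000}

print(index_to_max(radio_plays))    # {'Song A': 100.0, 'Song B': 25.0}
print(index_to_max(youtube_views))  # {'Song A': 25.0, 'Song B': 100.0}
```

In this toy example the radio favorite is the YouTube underdog, the kind of contrast an index makes easy to spot.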
What are the flaws?
Was my research perfect? No, of course not. Here are a few significant holes:
- With radio, it matters what time a song is played. Is it often heard during prime time or only late at night?
- On YouTube, I couldn’t tell when a video had been popular. Some clips have been online for six or seven years. That makes it hard to compare one video’s popularity with another’s. And the same goes for songs on Spotify.
- With 8,000 artists, we have a lot of data for a big group of musicians – but who knows how many women artists aren’t included in that group? So we can’t say anything about the overall gender imbalance in the music industry.
Finally, comparing radio, Spotify, and YouTube plays didn’t offer up many useful insights. I couldn’t find any truly illuminating patterns.
If you care to take a shot, I’d like to hear about it. Either way, I’d love it if you’d crunch the Dutch data. Email me at dimitri [at] decorrespondent [dot] nl and I’ll get it to you (40MB, CSV format).
And feel free to use our methods to go after data where you live. Let me know what you find out!
—Translated from Dutch by Laura Martz and Erica Moore