One Graph on Jazz with Free Data Scraped from Wikipedia



This is a doughnut chart that counts links to musicians' pages found in a sample of 56,773 articles on jazz-related topics.

Insights & Future Directions:

  1. There is a disappointing absence of references to vocalists like Ella Fitzgerald, Billie Holiday, Nat King Cole, Sarah Vaughan, Frank Sinatra, and so on… I also would have been glad to see Sun Ra, Charlie Parker, and Erroll Garner rank highly enough to include in the chart without incomprehensibly small slices.
  2. There could be some sampling bias in the web-crawling algorithm or in the initial article I used to seed the scraping process. Associating each artist with an instrument or sub-genre and re-creating side-by-side versions of the chart filtered on those categories could surface that bias, but it might be better to test it directly: seed the crawl with a vocalist’s article and measure the increase in references to these artists. If the increase is small, we can probably rule out bias from the seed article.
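
The seed test in item 2 could be sketched roughly as follows. This is only a minimal illustration, assuming each crawl has already been reduced to a mapping from musician name to link count. The function names (`reference_share`, `seed_bias_shift`) and all of the counts are hypothetical, made up for the example; they are not results from the actual scrape.

```python
from collections import Counter


def reference_share(link_counts, artists):
    """Fraction of all counted musician links that point to the given artists."""
    total = sum(link_counts.values())
    if total == 0:
        return 0.0
    return sum(link_counts.get(name, 0) for name in artists) / total


def seed_bias_shift(baseline_counts, reseeded_counts, artists):
    """Change in the artists' share of links after re-seeding the crawl."""
    return (reference_share(reseeded_counts, artists)
            - reference_share(baseline_counts, artists))


# Group of artists whose under-representation we want to check.
VOCALISTS = ["Ella Fitzgerald", "Billie Holiday", "Sarah Vaughan"]

# Made-up link counts, for illustration only (NOT real scrape output).
baseline = Counter({"Miles Davis": 60, "John Coltrane": 30,
                    "Ella Fitzgerald": 10})
reseeded = Counter({"Miles Davis": 55, "John Coltrane": 30,
                    "Ella Fitzgerald": 12, "Billie Holiday": 3})

shift = seed_bias_shift(baseline, reseeded, VOCALISTS)
print(f"Change in vocalists' share of links: {shift:+.2%}")
```

Comparing shares rather than raw counts keeps the two crawls comparable even if they visit different numbers of articles. How small a shift counts as "small" is a judgment call that this sketch does not settle.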



