Sometimes, it feels like life is pushing me to do something. I first heard about W.E.B. Du Bois in April 2021, during S-H-O-W 2021 (a data visualisation event). I asked a speaker for book recommendations, and he recommended The Souls of Black Folk.
After that, it was quiet for a while. But recently a few things happened in the span of a couple of weeks:
- After my first year of full-time freelancing (2023), I decided to reward myself with some inspiration and got myself a copy of W.E.B. Du Bois’s Data Portraits: Visualizing Black America.
- Around the same time, I started reading the scifi book Seveneves, in which one of the main characters is called Dubois.
- When I finished that read, I started Underground Railroads. This book gave me some new insights into the history of slavery.
- And a few days later I took time to sit down on a quiet afternoon to finally read W.E.B. Du Bois’s Data Portraits.
Not long after that, I see an announcement from the Data Visualisation Society (DVS) about an open competition:
The goal of the challenge is to celebrate the data visualization legacy of W.E.B Du Bois by recreating the visualizations from the 1900 Paris Exposition using modern tools.
And I feel like I just have to give this a go.
Making Data Portraits in Matplotlib
The competition feels like a good new step in my personal Du Bois journey, and a good reason to exercise my Matplotlib skills. Recreating some of the data visualisation plates he and his team made by hand(!) in 1900 is an interesting way to get up close and personal with his work.
The #DuBoisChallenge2024 consists of 10 weeks, and each week features one of the data portraits. They provide all the necessary data for each. My goal, as stated in the title, is to create this using only Python and Matplotlib (and some Pandas).
Besides that, I try to limit myself to 2-3 hours of work per week on one plate. There are some I think I can manage to make in that time, but some others (like the first one) feel a bit more tricky.
Reflections: from prints to python to prints
I now have 10 weeks of recreating Du Bois data visualisations behind me. And it truly felt like a special experience. Having read about Du Bois’s work early on in 2024, the #DuBoisChallenge2024 felt like the perfect way to get up close and personal with his (and his team’s) work.
Somewhere in week 3, I got the idea to print the 10 visualisations when they would be done. It felt like a fitting thing to do, bringing my recreations in python back to the original format: print.
And the prints just got in 🙂
Looking at the prints, it feels like some colours of the provided style guide are a better fit for the medium. I especially have this feeling with the green. On screen, it pops out quite a bit, but in print the colour blends in nicely with the overall colour palette.
I learned a lot by taking part in the challenge. Some highlights for me are:
- Hands-on experience implementing Du Bois’s design decisions (e.g. showing data differently than it technically is to emphasize a point or effect).
- Feeling way more comfortable with using custom texts and data labels.
- Creative applications of data visualisation techniques, e.g. line plots as custom grid lines, random data as a ‘ripped paper effect’, and scatter plots to improve readability of data labels.
In the end, I feel more comfortable with python & matplotlib than I did before. And that feels like time well spent 🙂
If you continue reading, you’ll find a link to the code for all the visualisations (as jupyter notebooks) and the live blog I kept during the past few weeks.
For now, I’d like to thank the Data Visualisation Society and Anthony Starks for setting up the challenge, providing all the examples, background information, and preparing the data.
Jupyter Notebooks
Curious about the code? You can view the Jupyter Notebooks here. Updated to include challenge 01 to 10 (all of them!).
For general thoughts and notes, continue reading below (I put my most recent plot first).
10: A Series Of Statistical Charts Illustrating The Conditions Of Descendants Of Former African Slaves Now Resident In The United States (plate 37)
You can view the original here and my version below.
It’s not a perfect match. But it is one that I am proud of.
Before the challenge began, there were a few I was not looking forward to (mostly the ones with maps) and this was one of them. They felt difficult to make. But after having made 9 challenges (struggling with some and having fun with others) it was fairly easy, though I struggled a bit with the pie chart labels.
This is the first one where Du Bois made a design decision I don’t really get. He adds information in two languages in a smart way: by using different colours for English and French. But why does the last French paragraph appear to be in black? I decided to use red instead to keep the language-based colour-split going. (Am I missing a message here? Or was it something that just happened during production?)
In the end, this plate shows the bold colour palette that Du Bois uses throughout his data portraits. And I feel like my recreation reflects that.
9: Proportion of Freemen and Slaves (plate 51)
You can view the original here and my version below.
Alright, that was a nice one to make again.
I’m getting more playful with the use of various matplotlib options. There are two examples in this plot:
- The black grid lines: instead of grid lines, I drew line charts.
- The green area behind the data labels (%): the default option for background colour didn’t suit the design well, so I opted to go with a scatter plot using squares for markers.
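The two tricks above can be sketched roughly as follows. This is a minimal illustration rather than the notebook code: the data values, colours, marker size, and variable names are all made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up values for illustration only (not the data behind plate 51)
years = [1790, 1810, 1830, 1850]
pct_free = [8.0, 13.5, 14.0, 12.0]

fig, ax = plt.subplots(figsize=(4, 6))
ax.fill_between(years, pct_free, color="#dc143c", zorder=1)
ax.set_ylim(0, 100)
ax.grid(False)  # keep the built-in grid switched off

# 1) Custom "grid lines": one plain vertical line plot per year
grid_lines = [
    ax.plot([year, year], [0, 100], color="black", linewidth=0.8, zorder=2)[0]
    for year in years
]

# 2) Label backgrounds: square scatter markers, with the text drawn on top
label_y = [value + 6 for value in pct_free]
label_backgrounds = ax.scatter(
    years, label_y, marker="s", s=500, color="#00aa00", zorder=3
)
for year, y, value in zip(years, label_y, pct_free):
    ax.text(year, y, f"{value:g}%", ha="center", va="center", zorder=4)
```

Drawing the lines and squares as regular plot elements means their position, length, and stacking order (`zorder`) can be controlled per item, which the default grid and text background options make harder.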
And just because I like making these, here’s a GIF moving from the default plot to the Du Bois version:
8: The Rise of Negroes from Slavery to Freedom in One Generation (plate 50)
You can view the original here and my version below.
It still feels special to implement Du Bois’s style. Just compare what you see above with the visual you see below.
What a difference.
It’s interesting to note that the two bars both add up to 100%, but the second one is a bit larger in Du Bois’s design. They are roughly 6cm and 10cm when I measure them in my book. It’s a way to emphasize the second, more positive dataset.
I also tried to check how this ratio (6/10=.6) compares to the total number of people related to both data sets. I found a data set on Wikipedia and got to 4,441,830 for 1860 and 7,488,676 for 1890. This gets us a ratio of .593. Which is ~.6!
Nice one.
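For anyone who wants to check the arithmetic, here is a small sketch using only the numbers mentioned above (the variable names are mine):

```python
# Population totals quoted above (found on Wikipedia)
population_1860 = 4_441_830
population_1890 = 7_488_676

# Bar lengths measured in the book, in centimetres
bar_1860_cm = 6
bar_1890_cm = 10

population_ratio = population_1860 / population_1890
bar_ratio = bar_1860_cm / bar_1890_cm

print(f"population ratio: {population_ratio:.3f}")
print(f"bar length ratio: {bar_ratio:.3f}")
print(f"difference:       {abs(population_ratio - bar_ratio):.3f}")
```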
7: Illiteracy of the American Negro compared with other nations (plate 47)
You can view the original here and my version below.
This week was fairly straightforward. I therefore took some time to implement a scalable font. When you change the size of a visualisation with fixed font sizes, the data visualisation will look okay. But the font will be all off:
As you can clearly see, the titles don’t fit the plot area. Most labels look okay, but one label overflows onto the visualisation.
The fix is easy. Make the font size depend on the size of your plot. My original plot had a width of 7.4 inches and the new one is 4 inches wide. Using these two values, I calculate a relative width (4 / 7.4) and multiply the font size by this number.
Et voila:
You can view the related notebook for a full code example.
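As a rough idea of what that scaling can look like, here is a minimal sketch. The helper names (`scaled`, `make_plot`), the example data, the base font sizes, and the aspect ratio are assumptions for illustration and not taken from the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

BASE_WIDTH = 7.4  # inches: the width the font sizes were designed for


def scaled(fontsize, fig_width, base_width=BASE_WIDTH):
    """Scale a font size by the relative width of the figure."""
    return fontsize * fig_width / base_width


def make_plot(fig_width):
    # Keep the same aspect ratio, so only the overall size changes
    fig, ax = plt.subplots(figsize=(fig_width, fig_width * 10.5 / 7.4))
    ax.barh([0, 1, 2], [3, 5, 4])  # made-up data
    title = ax.set_title("EXAMPLE TITLE", fontsize=scaled(18, fig_width))
    ax.tick_params(labelsize=scaled(12, fig_width))
    return fig, title


fig_large, title_large = make_plot(7.4)  # title stays at ~18 pt
fig_small, title_small = make_plot(4)    # title shrinks with the figure
```

Routing every font size through one helper means a new figure width only has to be set in one place.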
6: Amalgamation of the White and Black elements of the population in the United States (plate 54)
You can view the original here and my version below:
My main learning from this week: things are not as simple as they seem. This was more challenging than I thought it would be… with some data exploration work, several types of custom labelling, and a two-tone grid line…
About the data exploration: based on the data labels in the plot, there is only a limited data set available. I used a drawing program to estimate the missing data:
And then I used good old Excel to estimate some values, comparing line lengths with data values in his visualisation. Using those values, I noticed that Du Bois again made an interesting design decision here. See that little dent in the data (see the black circle):
That is something his visualisation does not show. By not showing that dent and by using subtle gradients, he may very well send the message that people should not consider the differences to be strict.
The thing I’m missing from the original is the zigzag at the top, similar to the one we saw in challenge two. Because I try to limit my time on this each week and it was quite a bit more challenging than I thought it would be, I haven’t implemented that detail.
5: Race Amalgamation in Georgia (plate 13)
That felt like a relief. If you’ve read my experience working on challenge 04, moving from that complex setup (for me) to a very simple one was so much fun.
You can view the original here and my version below.
Besides feeling relieved, I’m starting to feel more at ease with using matplotlib. For example: Du Bois’s use of titles, subtitles, and custom axis text has forced me to explore the different ways of adding text to a plot. It made me more comfortable adding them.
That is a learning I’m happy with 🙂
4: The Georgia Negro (plate 01)
You can view the original here. And if you do, you can clearly see that I haven’t managed to recreate this one properly.
Here’s my version.
It turns out that spherical map projections, and projecting custom shapes onto them, are something I’m not familiar with yet. I tried a few approaches, but after about 2-3 hours’ worth of attempts, I decided I wasn’t going to make anything work within my planned 2-3 hour limit per challenge.
So I opted to give myself a few more hours, tried to make something different but related, and got to the image you just saw.
There are a few things I like. I tried to put the focus on the destination ports that slaves were transported to. There’s a dot for each destination port and an open circle sized by the number of slaves that were transported there.
The low opacity of the lines that represent slave routes (from source port to destination port) makes them harder to pinpoint, which is something most people were not interested in doing after buying slaves back then (I think). I also decided to not highlight the source ports to contribute to this effect.
The learning of this week: there are still some things that Du Bois did in 1900 that I can’t easily recreate using only code. (And it’s a lot harder to make something nice if I have to design an alternative myself 😅).
A day later…
I found another hour to work on it and gave the geospatial projection another try. Thanks to swatchai I got the data projection working on the Matplotlib Basemap projection:
After this version, I decided to slightly rotate the right globe to make the two semi circles in the centre align. It does ‘hide’ Africa a bit too much, but it helps emphasise the point of origin of the lines in the left globe.
It’s still far from a perfect match of the original, but at least it has the spherical maps with data on them.
Yes! 🙂
3: Acres of Land Owned by Negroes in Georgia (plate 19)
You can view the original here.
It feels like this is very matplotlib-able. And it turns out it is. Here’s my version of Du Bois’s plate 19.
You may notice some changes. I decided to go for a serif font for the title and annotations (Roboto Slab). Besides that, I changed the aspect ratio to an A-paper size one. I decided I want to print my versions when they are all done. So that is a preparation for later.
I like how this plate about ownership of land leverages the full plot area available. Compared to the previous plot, a lot of margins are removed. I see this as a powerful way to emphasise the size of the lands.
Besides that, this week is a good week to show the impact that design makes on code.
Adding Du Bois’s style to the code, changes the code from this:
fig, ax = plt.subplots()
ax.barh(df.index.values, df['acres'])
ax.set_ylim(max(df.index.values)+1, -1)
To this:
fig, ax = plt.subplots(
    figsize=(7.4,10.5),
    facecolor=dubois_colors['bg']
)
rob_font_heavy = {'fontname':'Roboto Slab', 'fontweight': 'black'}
rob_font_light = {'fontname':'Roboto', 'fontweight': 'light'}
ax.barh(df.index.values, df['acres'], color=dubois_colors['crimson'], height=.55, alpha=.95)
ax.tick_params(left=False)
ax.patch.set_alpha(0)
ax.get_xaxis().set_visible(False)
ax.spines[['left', 'top', 'right', 'bottom']].set_visible(False)
ax.set_yticks(df.index.values, df['year'], fontsize=12, **rob_font_light)
ax.set_ylim(max(df.index.values)+1, -1)
plt.title('ACRES OF LAND OWNED BY NEGROES\nIN GEORGIA.', pad=-5, fontsize=18, **rob_font_heavy)
annotation_1 = int(df['acres'][0])
annotate_kwargs = {
    'fontsize': 14,
    'horizontalalignment': 'center',
    'verticalalignment': 'center',
}
ax.annotate(format(annotation_1,','), (annotation_1/2, 0+.01), **annotate_kwargs, **rob_font_heavy)
last_index = len(df.index.values) - 1
annotation_2 = int(df['acres'][last_index])
ax.annotate(format(annotation_2,','), (annotation_2/2, last_index+.01), **annotate_kwargs, **rob_font_heavy)
plt.subplots_adjust(top=0.93, bottom=.00, left=0.08, right=1)
And the visual from this:
To this:
It’s again a special feeling to be handling Du Bois’s style in Python.
2: Slave and Free Negroes (plate 12)
You can see the original here and my version below:
There is one thing that stands out to me about this challenge: the value of a well-developed sense of design. To help mimic the feeling a bit, have a look at the graph below:
Now this is technically the same thing as the one that W.E.B. Du Bois made, give or take the names of the y-axis and a title.
There are a lot of things that Du Bois does to design his data:
- y axis labels
- x axis labels
- x axis scales
- title of plot
- title of x axis
- smart usage of a double y-axis
- colour palette
- ripped paper effect on the left (which I recreated using a random data set)
- breathing room for the plot to be in
And there is probably a lot more that I don’t see.
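To give an idea of how random data can produce the ripped paper effect from the list above, here is a minimal sketch. It is one possible approach under my own assumptions (a jagged strip in the page colour drawn over the left edge), with made-up colours, tear depth, and names, and not necessarily how the notebook does it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=12)  # fixed seed so the tear is reproducible

# Random "tear depth" for 200 points along the left edge of the plot
y = np.linspace(0, 1, 200)
edge_x = rng.uniform(0.0, 0.03, size=y.size)

fig, ax = plt.subplots(figsize=(4, 6))
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)

# Stand-in for the coloured data area that runs to the edge of the plot
ax.axvspan(0, 1, color="#dc143c", zorder=1)

# Cover the left edge with a jagged strip in the page colour
tear = ax.fill_betweenx(y, 0, edge_x, color="white", zorder=2)
```

Changing the upper bound of `rng.uniform` controls how deep the tear cuts into the plot, and the number of points controls how fine the jagged edge looks.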
Besides all that, there is one thing the original does that I haven’t implemented. In the final period, the percentage changes from 0.8% to 100%. Du Bois chooses to show an incline there, but with the x-axis range set to 3% – 0%, that should not be that visible.
That is what I like about this challenge. I get to experience some of the design decisions Du Bois and his team made. As a reminder of that, I keep the line uncorrected in my version.
1: Negro Population of Georgia by Counties, 1870, 1880 (plate 06)
You can view the original here and my version below:
Developing this was an interesting experience, as I haven’t worked with maps a lot in Matplotlib. And I somehow got off on the wrong foot. I tried to load all the .csv files, instead of the very handy shape file…
When I develop my data visualisations, I often start by using system-default (in this case Matplotlib-default) colours. It allows me to focus on the functional setup of my code. The creators of the competition were kind enough to include a style guide. So when the code for the layout was set, I swapped the system colours with Du Bois colours. And that felt nice. It felt like applying the feel for style of Du Bois to my visual.
The GIF below rotates between the system colours and Du Bois colours.
Du Bois’s colour palette improves it a lot, right?
I’d also like to give you an insight into the technical setup. I’ve added an extra image below that shows the four elements of the overall plot:
As you can see, I use four overlapping subplots and make the background of the plots transparent. The legends in the top-right and bottom-left plots are technically scatter plots.
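A stripped-down sketch of that kind of layout is shown below. The axes positions, names, legend labels, and colours are placeholders I picked for illustration, and the maps themselves are left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(7.4, 10.5))

# [left, bottom, width, height] in figure coordinates; the rectangles overlap
layout = {
    "map_top_left": [0.05, 0.45, 0.55, 0.50],
    "legend_top_right": [0.50, 0.50, 0.45, 0.40],
    "legend_bottom_left": [0.05, 0.05, 0.45, 0.40],
    "map_bottom_right": [0.40, 0.05, 0.55, 0.50],
}

axes = {}
for name, rect in layout.items():
    ax = fig.add_axes(rect)
    ax.patch.set_alpha(0)  # transparent background, so overlaps stay visible
    ax.set_xticks([])
    ax.set_yticks([])
    for spine in ax.spines.values():
        spine.set_visible(False)
    axes[name] = ax

# A "legend" built from a scatter plot: one large marker per category
legend_ax = axes["legend_top_right"]
labels = ["CATEGORY A", "CATEGORY B", "CATEGORY C"]  # placeholder labels
colors = ["#dc143c", "#ffd700", "#00aa00"]            # placeholder colours
positions = list(range(len(labels)))

legend_ax.scatter([0] * len(labels), positions, s=400, c=colors, edgecolors="black")
for y, label in zip(positions, labels):
    legend_ax.text(0.3, y, label, va="center")
legend_ax.set_xlim(-0.5, 3)
legend_ax.set_ylim(-1, len(labels))
```

Placing each element with `fig.add_axes` gives full control over where the maps and legends sit, which a regular grid of subplots does not allow when the elements need to overlap.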
That’s it for now. See you next week 🙂