Hatnote and Wikipedia Projects

Every day I wake up happy that one of my favorite websites, Wikipedia, is free, open, and well-supported. My wiki-timeline summarized:

Today, I am still a student of Wikipedia. I use it dozens of times a day, and you'll see it cited throughout anything I write. I also continue to proudly host and administrate MediaWiki, though my extension-writing skills have languished. But this is mostly a story about building on Wikipedia: the story of Hatnote.

Origins of Hatnote

Hatnote is a volunteer-run design studio, organized around Wikipedia as a social and data platform. In other words, we are dedicated to exploring new perspectives on wiki life.

Hatnote grew out of a 2012 WMF hackathon. I was having a great time building and teaching JavaScript and Python, but I fell upon a disturbing realization. The Wikimedia Foundation, proprietor of my favorite website, situated in Silicon Valley, epicenter of technical knowledge, was not teeming with thousands of engineers and data scientists coming out of the woodwork to defend and support free knowledge and culture as I had expected.

I realized that, while everyone I met from the foundation was talented and well-intentioned, they simply did not have the resources to push Wikipedia's innovation envelope as far as it could go. This isn't meant as a controversy-inducing criticism of the WMF; Wikipedia and other Wikimedia projects have always relied much more on the community for regulation and development. To this point, look at the numbers:

Compare this to other top 100 websites worldwide. I hate to put it in such economic terms, but Wikipedia's utility per capita, even counting community members, is off the charts. Besides, when it comes to innovation, education, and progress, is there even such a thing as "enough"?

The reality is that the WMF is a nonprofit formed to steward these sites. They keep the servers up, keep the sites usable, and keep it all above board legally and financially. They're rightfully more focused on increasing accessibility, with campaigns and features like Wikipedia Zero and VisualEditor. Take all of that, add in community organization of chapters and various events, and it's not so surprising that Wikipedia doesn't keep up flashy appearances next to for-profit Silicon Valley neighbors.

So, Stephen and I formed Hatnote to do what we could to promote Wikipedia among the Internet's established power users. To add new types of interaction for new generations of Wikipedia users, to help people remember that Wikipedia is more than just the first, best result on every search site, and to keep it all free.

Hatnote Projects

A list of Hatnote projects, in reverse chronological order. Projects not listed here can be found either on our alpha test page, Hatnote's GitHub, or Stephen's GitHub.


title: PaceTrack gh_link: https://github.com/hatnote/pacetrack/ project_link: https://tools-static.wmflabs.org/pacetrack/ description: |

Track and publicize Wikipedia Improvement Drives: targeted, time-limited efforts to improve a set of articles. Combines the best parts of editathons and WikiProjects with elements of crowdfunding. Started at WikiCite 2018 in collaboration with Pete Forsyth.


title: Monumental gh_link: https://github.com/hatnote/monumental project_link: https://tools.wmflabs.org/monumental/ description: |

Monumental is Hatnote's most recent project, building on our interest in preserving and documenting heritage sites around the world. Drawing upon Wikidata, Wikipedia, and Wikimedia Commons, Monumental displays information and media about cultural heritage for a general area or specific monument.


title: Data Waltz project_link: https://woodbury.edu/event/data-waltz-wuho/ description: |

While Hatnote has inspired and supported some amazing installations in the past, like this one from the NCSU library, this collaboration was the most deeply we've been involved in a physical installation, this time at the WUHO Gallery in Hollywood.

Using Wikimedia's new EventStreams API, several Arduinos, half a dozen speakers, and thousands of LEDs, we helped create an immersive experience of Wikipedia editing. Geolocated edits are represented with lights and effects, with edits taking place in the vicinity of the gallery triggering a shimmering display. Visitors were thus encouraged to make what were often their first contributions to the world's largest repository of free knowledge.
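For a rough idea of what the data source looks like: EventStreams delivers edits as server-sent events, where each `data:` line carries one JSON object. The sketch below is not the installation's code. It only shows how such lines could be filtered down to edits on a single wiki. The helper name `iter_edits` and the sample payloads are made up for illustration; the `type`, `wiki`, and `title` fields follow the public recent-changes event format.

```python
import json
from typing import Iterable, Iterator


def iter_edits(lines: Iterable[str], wiki: str = "enwiki") -> Iterator[dict]:
    """Yield edit events for one wiki from raw server-sent-event lines."""
    for line in lines:
        # SSE payload lines start with "data:"; skip comments, ids, and blanks.
        if not line.startswith("data:"):
            continue
        try:
            event = json.loads(line[len("data:"):])
        except json.JSONDecodeError:
            continue  # ignore partial or malformed payloads
        if not isinstance(event, dict):
            continue
        if event.get("type") == "edit" and event.get("wiki") == wiki:
            yield event


# Hand-written sample lines standing in for a live stream.
sample = [
    ": keep-alive comment",
    "event: message",
    'data: {"type": "edit", "wiki": "enwiki", "title": "Mars"}',
    'data: {"type": "log", "wiki": "enwiki", "title": "Special:Log"}',
    'data: {"type": "edit", "wiki": "dewiki", "title": "Mars (Planet)"}',
    "",
]
titles = [event["title"] for event in iter_edits(sample)]
print(titles)
```

In a live setting, the lines would come from a streaming HTTP response to `https://stream.wikimedia.org/v2/stream/recentchange` rather than from a list.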

Listen to the sounds and see the lights in the 360° video below (click and drag to look around):

Data Waltz was the featured use case in Wikimedia's official EventStreams announcement.


title: Montage gh_link: https://github.com/hatnote/montage project_link: https://montage.toolforge.org/ description: |

Montage is the tool used to judge the hundreds of thousands of submissions to the annual Wiki Loves Monuments photography competition, also known as the largest photo contest in the world.

WLM is a truly unique event, and judging it in a suitably open and wiki-compatible way required bespoke software, engineered from the ground up for the specific use case.

Montage is our most advanced project yet, and we wrote all about the background, process, features, and future of the project on Wikimedia's official blog. In short, Montage has been a tremendous hit among its users, and has expanded to other media contests, such as Wiki Loves Earth, Wiki Loves Folk, and more!


title: Wikipedia Top 100 gh_link: https://github.com/hatnote/top project_link: http://top.hatnote.com description: |

Most Hatnote projects revolve around editing and other interactive Wikipedia activities. With top.hatnote.com, we turned that around and sought to offer clean and simple insight into the reading habits of Wikipedia's biggest user group: its readers.

Updated daily, the Top 100 is a chart of the most-visited articles on Wikipedia. Around 500 million people read Wikipedia articles nearly 20 billion times per month, in over 200 languages. The Top 100's daily statistics offer a window into where those readers are focusing their attention. It also makes for a great way to discover corners of Wikipedia one wouldn't normally read or edit.

Clear ordering, images, sparklines, and simple statistics make the data approachable for casual readers. Structured data feeds, including JSON and RSS, keep the site relevant for developers and power users. Socialites of all skill levels share discoveries via Twitter button integrations on individual tiles, or automatically through an IFTTT recipe based on the RSS feed.
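To make the idea of a daily chart plus a structured feed concrete, here is a minimal sketch. It is not the code behind top.hatnote.com. The function name, the field names in the output, and the rule for skipping `Special:` pages are all assumptions made for illustration.

```python
import json
from typing import Dict, List


def top_articles(views: Dict[str, int], limit: int = 100) -> List[dict]:
    """Rank article titles by view count, highest first."""
    # Assumption: non-article pages such as "Special:Search" are excluded.
    articles = {t: v for t, v in views.items() if not t.startswith("Special:")}
    # Sort by views descending, then by title so ties are deterministic.
    ranked = sorted(articles.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        {"rank": rank, "title": title, "views": count}
        for rank, (title, count) in enumerate(ranked[:limit], start=1)
    ]


# Invented view counts for a single day.
daily_views = {"Mars": 1200, "Special:Search": 9000, "Venus": 1500, "Pluto": 1200}
chart = top_articles(daily_views, limit=2)
feed = json.dumps(chart)  # the same chart, serialized for a JSON feed
print(feed)
```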

Personally, writing this a couple months after launch, I still visit top.hatnote.com first thing in the morning, often before I get out of bed.


title: Wikipedia IFTTT channel gh_link: https://github.com/slaporte/ifttt project_link: https://ifttt.com/wikipedia description: |

IFTTT (IF This Then That) is a web service for connecting and automating the sites that make up our online ecosystem. Wikipedia didn't have a channel, so with a bit of help from some friends, we built one.

My feelings toward IFTTT are mixed, but because the walls of the Internet corporate gardens keep growing taller, I do use it. And if sites like Facebook and Buzzfeed have a channel, then Wikipedia deserves one, too.

Last I checked there are tens of thousands of daily users on the Wikipedia IFTTT channel, resulting in millions of hits to the web application that services user Recipes. That application runs on Wikimedia Labs and the code is on GitHub.

If you're intrigued and would like to give it a spin, I've written up two guides:

We hit some decent milestones. Over a million requests per day and IFTTT's Top Chef #89 ain't bad:


title: Wikipedia Social Search gh_link: https://github.com/hatnote/hashtag-search project_link: http://tools.wmflabs.org/hashtags/ description: |

Wikipedia's community is unlike any other online. Something about the system's radical user inclusionism, combined with a mission to realize the original intent of the Internet, an interconnected knowledgebase that anyone can edit, has attracted people from all walks and corners.

But even a unique community's expectations will evolve as people join from surrounding communities. Toward that end, Wikipedia Social Search adds some familiar functions back to Wikipedia: #hashtags and @mentions. Now if you make an edit to Wikipedia with hashtags or mentions, we parse them out and index them (with this batch job). Perfect for editathons and tracking ad hoc organized editing. This feature also makes an appearance in the IFTTT channel, with the hashtags trigger.
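The parsing step can be sketched in a few lines. This is an illustration, not the batch job's actual code: the regular expressions, the `parse_summary` name, and the choice to lowercase hashtags are assumptions. The lookbehind keeps section links such as `[[Mars#History]]` from being counted as hashtags.

```python
import re

# Assumption: a tag or mention is a run of word characters after '#' or '@',
# and the marker must not directly follow another word character.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


def parse_summary(summary: str) -> dict:
    """Extract hashtags and mentions from the text attached to an edit."""
    # Hashtags are lowercased so "#WikiCite" and "#wikicite" index together.
    hashtags = [tag.lower() for tag in HASHTAG_RE.findall(summary)]
    # Mentions keep their case, since usernames are case-sensitive.
    mentions = MENTION_RE.findall(summary)
    return {"hashtags": hashtags, "mentions": mentions}


result = parse_summary(
    "Added sources #WikiCite2018 cc @Example_User, see [[Mars#History]]"
)
print(result)
```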


title: The Weeklypedia gh_link: https://github.com/hatnote/weeklypedia project_link: http://weekly.hatnote.com description: |

As much as one might like Wikipedia, it moves so quickly that it can be hard to track when major editing events occur. Email digests are a common solution to this problem, and are more relevant than ever. Many social networks have email to thank for retaining active users, who might otherwise forget they have an account (looking at you LinkedIn and Twitter).

The Weeklypedia is an aptly-named weekly summary of the most edited Wikipedia articles, available in 15+ languages. Skimming an issue only takes a couple minutes and can yield surprising results. The data used to generate The Weeklypedia is also available. Monitoring is achieved with cronfed.


title: Listen to Wikipedia Mobile App project_link: https://itunes.apple.com/us/app/listen-to-wikipedia/id832934300 description: |

Listen to Wikipedia is Hatnote's most popular project. And while the mobile site worked on the phones we tested, the sheer number of user emails and messages we got proves that native apps have a certain je ne sais quoi for some people.

Bryan Oltman crafted the wonderful iOS app. I may have helped a little, but I probably got in the way as often as I contributed.

I played around with the beginnings of an Android app, written in Kivy. You can find that experiment on GitHub.


title: See, Also gh_link: https://github.com/hatnote/seealso project_link: http://seealso.org/ description: |

A play on the common "See Also" heading in Wikipedia articles, See, Also is a virtual gallery of Wikipedia-derived interactions and visualizations. After the success of Recent Changes Map and Listen to Wikipedia, Stephen and I figured we should leverage some of that Hatnote fame to get exposure for other wiki-based projects that we found inspiring. The architecture of See, Also partially inspired chert, the application that renders this page.


title: Listen to Wikipedia gh_link: https://github.com/hatnote/listen-to-wikipedia/ project_link: http://listen.hatnote.com/ description: |

"Strangely melodic" and "oddly mesmerizing". If you haven't seen and heard it yet, Listen to Wikipedia is a real-time auralization of Wikipedia growing, one edit at a time.

The site is literally self-explanatory. With around 2 million unique users since 2013, it's been a joy to build and run. In addition to extensive news and blog coverage, Stephen and I have made appearances everywhere from major media outlets like NPR, BBC radio, and French TV to the halls and walls of libraries and museums. It also won a Kantar Information is Beautiful award, in the Interactive Visualization category, and was used to conduct multiple yoga and meditation sessions.

Listen to Wikipedia has remarkable staying power. It has tens of thousands of regular monthly users, and new people discover it every day. To accommodate that, L2W maintains an uptime over 99% of that of its upstream services. It gets its data from Hatnote's websocket streaming from Wikimon.


title: Recent Changes Map gh_link: https://github.com/hatnote/rcmap project_link: http://rcmap.hatnote.com/ description: |

Recent Changes Map is a real-time visualization of Wikipedia edits by their city of origin.

Around 10% of edits to Wikipedia are made by unregistered users. No other major site so faithfully puts trust in humanity to build and rebuild more often than destroy. Millions of articles later, Wikipedia stands as a testament.

Registered Wikipedia users are known only by their usernames, so Recent Changes Map uses the IP addresses of unregistered, anonymous users to establish an approximate location. The results amazed, with so many unlikely pairings. South American interest in American Idol, North American interest in Japanese animation and wrestling, Commonwealth countries' interest in each other, and much less predictable results generated quite a buzz across the Internet.
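Since unregistered edits are attributed to an IP address instead of a username, picking them out of a stream can be as simple as checking whether the recorded user parses as an address. The sketch below shows only that check, under that assumption. It is not RCMap's code, the sample names are invented, and the geolocation lookup itself is left out because it needs an external GeoIP database.

```python
import ipaddress


def is_anonymous(username: str) -> bool:
    """Return True when the recorded username is an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(username)
    except ValueError:
        return False
    return True


# Invented sample users: two documentation-range addresses and two accounts.
users = ["198.51.100.7", "ExampleEditor", "2001:db8::1", "Mars123"]
anonymous = [user for user in users if is_anonymous(user)]
print(anonymous)
```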

Like Listen to Wikipedia, the Hatnote Recent Changes Map is built on our websocket stream from Wikimon.


title: WOMP gh_link: https://github.com/mahmoud/womp description: |

Wikipedia Open Metrics Platform, or WOMP, was created as a console to fetch, extract, and organize data, using Wapiti, Hatnote's Wikipedia API client. The idea was to build an application you didn't have to be a programmer to use. WOMP was created for and inspired by Adrianne Wadewitz's research into Wikipedia's community dynamics. Development took a break when her data was fetched, and with her passing, is on indefinite hiatus. I hope I get back to working on it someday.

WOMP created the necessity that led to two open-source projects of mine, Lithoxyl and Strata.


title: Wapiti gh_link: https://github.com/mahmoud/wapiti description: |

Wikipedia's querying API is one of the richest and most complex available. And what Wikipedia lacks in semantic content, it buries even further with complicated and inconsistent access patterns.

Wapiti is an experimental client which rationalizes these functional-if-confusing APIs into a Python interface with a highly consistent and recombinable API. Wapiti mostly works, but has been in the backseat for a while due to more pressing projects.
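For a taste of the raw API that a client like Wapiti has to smooth over, consider two ordinary queries built with nothing but the standard library. This sketch does not use or represent Wapiti's own interface. `build_query` is a hypothetical helper and the category title is just an example value. Note how each query module brings its own parameter prefix (`rv` for revisions, `cm` for category members).

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(**params: str) -> str:
    """Build a MediaWiki API query URL from keyword parameters."""
    base = {"action": "query", "format": "json"}
    base.update(params)
    return API_URL + "?" + urlencode(base)


# Latest revisions of a page: parameters use the "rv" prefix.
revisions_url = build_query(
    prop="revisions", titles="Mars", rvprop="timestamp|user", rvlimit="5"
)
# Members of a category: a different module, a different "cm" prefix.
members_url = build_query(
    list="categorymembers", cmtitle="Category:Mars", cmlimit="5"
)

print(revisions_url)
print(members_url)
```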


title: Disambiguity gh_link: https://github.com/hatnote/Disambiguity description: |

Wikipedia grows quickly, almost 0.1% per week. With 700 new articles per day, older pages can barely keep up. One way Wikipedia experiences growing pains is this:

Let's say there's a "Mars" article, talking about the planet, and all the astronomy articles link there. When Wikipedia's definition of "Mars" grows to encompass the Roman god and chocolate bar, the "Mars" page is replaced with a "disambiguation" page, like this one. Now it might not be clear from reading the article linking to "Mars" which Mars is intended. Imagine that problem, but in the context of names like "John Smith" and so forth.

This is a hard problem for Wikipedians, and we decided to tackle it by gamification[1]. Suffice it to say, Disambiguity was a very challenging, very fun, and very niche game to both play and build. It was featured at Wikimedia's 2012 Maker Faire booth. Stephen has the story, in photos.


  1. We were young! It was 2012! 


title: Qualityvis gh_link: https://github.com/mahmoud/qualityvis description: |

The project that started it all. Originally proposed and implemented in a two-day Wikimedia Foundation hackathon, the creatively named Qualityvis aimed to solve one of the hardest problems on Wikipedia: finding something to edit.

Originally based on hand-picked heuristics, Qualityvis grew into a full-scale Big Data + machine learning project. We extracted hundreds of dimensions for hundreds of thousands of pages and revisions to establish a baseline quality evolution gradient. We looked at:

* **Internal factors**, such as page structure, media content, and citations.
* **Interpage factors**, like categorization, linkage, and template usage.
* **External dimensions**, including Google search and news results.

We eventually achieved an 85% success rate at identifying Featured articles. More importantly, we did so with a method, built on MARS, which, unlike neural networks and other opaque solvers, retained explainability of the model. That meant we could direct users to a prioritized set of actions that would increase article quality. The technique also resulted in compact and portable models which could be easily inspected and ported for use in other stacks, like frontend JavaScript.
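To illustrate why a MARS-style model stays explainable and portable, here is a small sketch. A MARS model is a weighted sum of hinge functions, `max(0, x - knot)` or `max(0, knot - x)`, so it can be stored as plain data and each term's contribution can be read off directly. The intercept, coefficients, knots, and feature names below are invented for illustration and are not the Qualityvis model. The sketch also covers only additive terms, not products of hinges.

```python
from typing import Dict, List, Tuple

# Each term: (coefficient, feature name, knot, direction).
# direction +1 means max(0, x - knot); -1 means max(0, knot - x).
Term = Tuple[float, str, float, int]

# Hypothetical model: every number here is made up.
INTERCEPT = 0.1
TERMS: List[Term] = [
    (0.02, "citations", 10.0, +1),
    (-0.03, "citations", 10.0, -1),
    (0.05, "images", 2.0, +1),
]


def hinge(x: float, knot: float, direction: int) -> float:
    """Evaluate one hinge basis function."""
    return max(0.0, direction * (x - knot))


def score(features: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the model output and each term's individual contribution."""
    contributions = [
        (name, coef * hinge(features[name], knot, direction))
        for coef, name, knot, direction in TERMS
    ]
    total = INTERCEPT + sum(value for _, value in contributions)
    return total, contributions


total, parts = score({"citations": 4, "images": 3})
# The most negative contribution points at what to improve first.
weakest = min(parts, key=lambda part: part[1])
print(total, weakest)
```

Because the whole model is a short table of numbers plus one `max` call, porting it to another stack amounts to copying the table and reimplementing `hinge`.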

Qualityvis won us second place at the WMF hackathon, and ended up on indefinite hiatus, as we stalwartly marched into the bog of automation and repeatability, before being swept into wide-appeal projects like Listen to Wikipedia and RCMap.

Qualityvis taught me a lot about machine learning, the quality of open source (especially Node.js and certain unnamed Python libraries), and project management.


People

Hatnote has a lot of projects under its collective belt and it's taken a lot of work from a lot of people. Hatnote credits roll:

I hope I'm not missing anyone. Going on three years and a dozen projects, keeping track is tough! But thanks to everyone who's ever helped, whether through code, filing issues, or simple promotion. I really appreciate it.