
Guillaume Paumier

Wizards, Metadata, and Memory

Unearthing and Reconstructing UploadWizard’s Lost History

Stage design of the Enchanted Garden by Hermann Burghart for the opera Merlin, featuring dramatic arches, misty foliage, and magical lighting in a painterly, theatrical style.

Stage design of the Enchanted Garden by Hermann Burghart for the 1886 premiere of Karl Goldmark’s opera Merlin. (Herman Burghart on Wikimedia Commons // Public domain)

One in three pictures you see on Wikipedia was added through UploadWizard, a tool I designed in 2009. Over the past 15 years, 1.8 million unique volunteers have uploaded over 42 million files with UploadWizard to Commons, Wikipedia’s media library. But unearthing those metrics turned out to be more complicated than I expected. What began as a straightforward question became a journey through over a decade of evolving logging practices, overwritten traces, and quietly deleted markers. The road of metadata archaeology is wild and wicked, winding through the wood. Follow me, my friend, to glory at the end.

The Tower, Reversed

UploadWizard has been used to upload tens of millions of files to Wikimedia Commons. It was designed to simplify the contribution process, especially for newcomers. I was both the designer and product manager of UploadWizard, and it was the first project I worked on after joining the Wikimedia Foundation in 2009. At the time, I was also a very active volunteer contributor on Commons, which made the project feel deeply personal. In many ways, UploadWizard felt like my “baby.”

And yet, more than a decade after its introduction, one simple question turned out to be surprisingly hard to answer: how many files have been uploaded with UploadWizard?

In any corporate tech environment, this basic key performance indicator (KPI) would be displayed on an easily accessible metrics dashboard. But Wikimedia isn’t a typical tech company. It started as a small, scrappy nonprofit, and it’s always kept as little user data as possible to protect the privacy of its readers and contributors. And sometimes, the volunteer nature of Wikipedia and its sister sites makes it even more difficult to follow basic practices to measure a product’s impact.

For you see, when it comes to UploadWizard, there is no central log, no consistent tag, no definitive metric. The metadata is incomplete and inconsistent. In a movement devoted to free knowledge and historical preservation, the impact of one of its most important tools has been left largely undocumented.

During the initial development of UploadWizard, we had included a tracking category called [[Category:Uploaded with UploadWizard]] as a basic product KPI. And for a while, it worked: this maintenance category, hidden from view, populated steadily with uploads from the new tool.

But years later, the volunteer community decided that the tracking category was unnecessary. I had managed to save it in 2012,[1] but in 2016 it was deleted,[2] and the software was changed to no longer add it automatically (gerrit:315121), leaving no structured marker. UploadWizard was a victim of its own success: volunteers deemed the category purposeless, because “UploadWizard is the default mode of upload.” One volunteer likened the UploadWizard tracking category to “having an ‘Articles created using the Edit Button’ category on Wikipedia.”

Recently, I set out to find other ways to measure the uploads. It wasn’t just a matter of Wikimedian curiosity; it was about acknowledging an accomplishment and tracing the impact of work I had poured so much of myself into.

The metadata hadn’t crumbled all at once. It had eroded quietly, over years of changes, edits, and well-meaning decisions. And yet, the traces were still there, winding out of time, if you knew how to look for them. This is the archaeological story of how I tried to reconstruct that lost history; not just by counting files, but by piecing together evidence across shifting metadata, deleted categories, forgotten patches, and overlapping logging mechanisms.

We’re Off to See the Wizard

The question seemed deceptively simple: how many files had been uploaded using UploadWizard? At least 9 million as of August 2016, according to one comment in the community discussion that had led to the deletion of the category.[3] That was a start.

The tracking category had given us a built-in count, accessible to anyone and displayed prominently on the category’s page. It was no longer available, but I knew of another way to identify the wizard’s uploads: the files would all have the same “logging comment” for the upload, recorded as “User created page with UploadWizard.” It would take a few database queries, but it was still easy enough. Or so I thought.

Screenshot of the file history section on Wikimedia Commons for the image “Tour Hertzienne de Mesnil-Esnard,” showing that the file was uploaded by user Guillom on 10 February 2014 with the comment “User created page with UploadWizard.”

Screenshot of the file history for one of my uploads to Wikimedia Commons. The log entry shows the automated comment “User created page with UploadWizard.”

The Quarry tool makes it possible to run queries on a copy of Commons’ database. In October 2016, Steinsplitter used the log comment method on Quarry and counted 10,228,537 uploads (query/13031).

Screenshot of an SQL query run on the Wikimedia Quarry platform on October 10, 2016. The query counts the number of upload log entries with the comment "User created page with UploadWizard" and returns a result of 10,228,537.

Screenshot of query/13031 run by Steinsplitter in October 2016 to count UploadWizard uploads using the log comment “User created page with UploadWizard.” At the time, this query returned just over 10 million files. However, as I later discovered, this method excluded early uploads (before structured logging began in 2012).

This was a replicable, simple enough method. In October 2020, I used a similar approach to count UploadWizard files, although I had to adapt it due to changes in the structure of the database. My friend Reedy double-checked with his own query, and we counted 20,231,414 files. The number was growing over the years, which made sense.

But when I ran the same query again in March 2025, the result was almost exactly the same as in 2020: 20,231,573 files, so I knew something was wrong (query/42024).

Down, Down, Down the Road, Down the Wizard’s Road

After some digging, I found out that Wikimedia’s Multimedia team had changed the log comment in 2020 and replaced it with two possible patterns: “Uploaded own work with UploadWizard” for volunteers uploading their own pictures, and a more complex one for uploads of works by others: “Uploaded a work by $1 from $2 with UploadWizard” where $1 is the copyright holder, and $2 the source.[4] The former used a fixed format and could be queried easily. The latter, though, would vary from upload to upload.

I looked up the identifier of the own-work pattern (query/91991) and queried the database, which returned 14,118,636 files (query/42025).
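The two-step lookup described above (resolve a fixed comment to its identifier, then count the upload log entries that point to it) can be sketched as follows. This is a minimal illustration rather than the actual Quarry query: it runs against a tiny in-memory SQLite database, and the simplified tables and columns (`comment.comment_id`, `comment.comment_text`, `logging.log_type`, `logging.log_comment_id`) are assumptions modeled on MediaWiki's schema. The helper names and the demo rows are hypothetical.

```python
import sqlite3

OWN_WORK = "Uploaded own work with UploadWizard"


def count_uploads_with_comment(conn, comment_text):
    """Resolve a fixed log comment to its identifier, then count upload log rows."""
    row = conn.execute(
        "SELECT comment_id FROM comment WHERE comment_text = ?",
        (comment_text,),
    ).fetchone()
    if row is None:
        return 0  # the comment text was never recorded
    (comment_id,) = row
    (total,) = conn.execute(
        "SELECT COUNT(*) FROM logging "
        "WHERE log_type = 'upload' AND log_comment_id = ?",
        (comment_id,),
    ).fetchone()
    return total


def build_demo_db():
    """Create a simplified stand-in for the real tables (assumed layout, made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE comment (comment_id INTEGER PRIMARY KEY, comment_text TEXT);
        CREATE TABLE logging (
            log_id INTEGER PRIMARY KEY,
            log_type TEXT,
            log_comment_id INTEGER
        );
        INSERT INTO comment VALUES (1, 'Uploaded own work with UploadWizard');
        INSERT INTO comment VALUES (2, 'Some other upload comment');
        INSERT INTO logging VALUES (1, 'upload', 1);
        INSERT INTO logging VALUES (2, 'upload', 1);
        INSERT INTO logging VALUES (3, 'upload', 2);
        INSERT INTO logging VALUES (4, 'delete', 1);
        """
    )
    return conn
```

Looking up the identifier first keeps the counting query a plain integer comparison, which matters when the log table holds tens of millions of rows.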

To look into the second pattern for uploads of third-party works, I had to use a regular expression: a pattern of text that matched upload comments regardless of each upload’s details (query/92166). The query yielded 4,436,472 results, bringing the total to a minimum of 38,836,118 files uploaded with UploadWizard.
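To make the idea of matching a variable comment concrete, here is a small Python sketch that recognizes the three log comment formats mentioned so far. The regular expression is an assumption derived from the message template quoted above, not the expression used in query/92166, and the function name and category labels are hypothetical.

```python
import re

LEGACY = "User created page with UploadWizard"
OWN_WORK = "Uploaded own work with UploadWizard"

# "$1" (copyright holder) and "$2" (source) vary per upload, so each is
# matched as "one or more characters". DOTALL lets them span line breaks.
THIRD_PARTY = re.compile(
    r"Uploaded a work by .+ from .+ with UploadWizard", re.DOTALL
)


def classify_comment(comment):
    """Return which UploadWizard comment format a log comment uses, or None."""
    if comment == LEGACY:
        return "legacy"
    if comment == OWN_WORK:
        return "own-work"
    if THIRD_PARTY.fullmatch(comment):
        return "third-party"
    return None
```

For example, `classify_comment("Uploaded a work by NASA from https://example.org with UploadWizard")` returns `"third-party"`. The sketch only tests whether a comment fits the template; it does not try to split the holder from the source, which would be ambiguous whenever either field itself contains " from ".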

I was pretty happy with myself by that point, and decided to tabulate the results by year and month to visualize the growth over time. That’s when I realized that the story was more complex, but it led me to eventually discover change tags.

Through Many Miles of Tricks and Trials

When I broke down the uploads by year for the original log comment (query/92164), I noticed that no files were listed for 2010 or 2011. (There were also some anomalies uploaded after 2020, which I investigated later.) Further research indicated that the original log comment replaced in 2020 had only been introduced in 2012 (gerrit:9714), leaving out two years of uploads.

Checking some of my own uploads from that period (December 2010, March 2012), I confirmed that they had no log comment or initial edit summary. Their page history did reassure me that I had uploaded them with UploadWizard: it showed the tracking category being removed in 2016 (permalink/222387780), when the community had decided to get rid of it (that same unfortunate decision that had started me on this whole wikiarchaeology expedition in the first place).

I realized that the category removal might give me an indirect way to identify early uploads between 2010 and 2012: they would contain an entry in their edit history with the mention “Category:Uploaded with UploadWizard removed per community decision.” I just needed to make sure I excluded files already counted using the log comment method.

I got the associated identifier (query/92177, query/92193) and counted an additional 529,936 files uploaded with UploadWizard in 2010–2012, including some false positives that I ruled out later (query/92202).

Winding Out of Time

Through my digging, I came across a few related tickets in Phabricator, Wikimedia’s platform for tracking feature requests and bugs. One of them was a request to “Use an informative, custom edit summary for every file uploaded with UploadWizard” (T142687), which gave me a scare because it would have made it impossible to do any sort of counting in the future. But I also found a request to “Mark UploadWizard uploads with a change tag” (T121872).

In MediaWiki, change tags are annotations for certain types of edits: for example, edits made with the visual editor, or edits that reverted the content to a previous version.[5] Matthias Mullie had added an uploadwizard tag to the software in May 2017 (gerrit:337566), as well as an uploadwizard-flickr tag for files from Flickr. Unfortunately, the tags would only be applied to future uploads. Still, the tags provided a new, easy, and reliable method for counting post-2017 uploads, especially those after 2020 when the log message was split into two patterns.

In the end, I had identified five methods for counting uploads: 1. the removal of the original category, 2. the original log comment, 3. the two change tags, 4. the log comment for own works, and 5. the log comment for third-party works. I could safely ignore the latter two, but the remaining three methods still overlapped over many years, so I needed to figure out exact timestamps and boundaries to avoid double-counting.

A timeline diagram showing horizontal bands with the different detection methods, including the now-unneeded split log comment starting in 2020. The three other metadata markers overlap by several years.

Timeline of the five distinct detection methods used to identify uploads made with UploadWizard over its lifetime, illustrating the fragmented nature of its historical metadata. Each method corresponds to a different metadata marker introduced at different stages: removal of the original category (2010–2016), original log comment (2012–2020), the uploadwizard change tag (2017 onward), and the now-unneeded log comments introduced in 2020.

A few more queries later, I had identified the timestamps for the first upload to use the log comment in 2012 (2012-08-23T20:33:03Z, query/92207), and for the first one to use the change tag (2017-05-10T19:47:57Z, query/92206). The early uploads from 2010–2012 were trickier because the removal of the category was a more fragile detection method.

I went looking through the archives of the Server admin log, which documents software deployments and other system operations in the Wikimedia infrastructure. An entry by Roan Kattouw indicated that UploadWizard had been deployed to Commons on November 30, 2010 at 11:29.[6]

This gave me a strict boundary and it narrowed down the search for the first file uploaded with UploadWizard. I looked for pages created that day that were later edited to remove the UploadWizard category (query/92267), and found a photo of the TV Tower of East Berlin uploaded shortly after deployment by Neil Kandalgaonkar, the lead developer of UploadWizard. It is likely that Neil uploaded this file both as an initial test to verify that the tool had been successfully enabled on Commons, and as a fitting inauguration, making it a historically significant first use of the feature.

Pulling Back the Curtain

And so, at last, I had all the ingredients for my spell: I had three detection methods, each clearly bounded to avoid false positives and double-counting.

A stylized timeline showing three colored segments representing different metadata markers used to identify UploadWizard uploads, with vertical lines marking the exact timestamp boundaries between each period.

This timeline shows the precise start and end dates for each metadata marker used to detect UploadWizard uploads, making it possible to measure usage across its full history without overlap.

Once I had assembled a methodology and carved out clean timestamp boundaries for each detection method, I was finally able to begin extracting numbers, and the stories they told.
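One plausible way to express those boundaries is to assign each upload to exactly one detection method based on its timestamp, so that no file can be counted twice. The sketch below uses the three timestamps given earlier. Treating the deployment time as UTC, using half-open intervals, and the method labels themselves are all assumptions made for illustration, not details confirmed by the queries.

```python
from datetime import datetime, timezone

# Boundaries quoted in the text. The deployment time is assumed to be UTC.
DEPLOYED = datetime(2010, 11, 30, 11, 29, tzinfo=timezone.utc)
FIRST_LOG_COMMENT = datetime(2012, 8, 23, 20, 33, 3, tzinfo=timezone.utc)
FIRST_CHANGE_TAG = datetime(2017, 5, 10, 19, 47, 57, tzinfo=timezone.utc)


def detection_method(uploaded_at):
    """Pick the single marker that should be used for an upload at this time.

    Intervals are half-open (start included, end excluded), so every
    timestamp maps to at most one method.
    """
    if uploaded_at < DEPLOYED:
        return None  # UploadWizard was not yet available on Commons
    if uploaded_at < FIRST_LOG_COMMENT:
        return "category-removal"
    if uploaded_at < FIRST_CHANGE_TAG:
        return "log-comment"
    return "change-tag"
```

An upload from March 2012 would fall under `"category-removal"`, while one from 2021 would fall under `"change-tag"`, even though it also carries one of the newer log comments.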

As of April 21, 2025, 1,820,907 unique volunteers have uploaded a total of 42,596,080 media files to Commons with UploadWizard (query/92995, query/92994). The monthly breakdown in the following chart shows the growth rate over the past 15 years, as well as the yearly spikes corresponding to contribution campaigns and global contests like Wiki Loves Monuments (in September–October each year).

A vertical bar chart titled “Monthly UploadWizard Uploads (2010–2025)” showing total uploads per month. The x-axis labels show only January of each year from 2011 to 2025. The y-axis ranges from 0 to 600,000 uploads. Notable annual spikes appear around September each year, reflecting seasonal campaign activity.

Monthly uploads to Wikimedia Commons using UploadWizard, from its launch in November 2010 through April 2025. The chart shows strong annual cycles, with peaks around September, coinciding with Wiki Loves Monuments. The gradual growth over time reflects UploadWizard’s role as the primary contribution tool for media files.

To better understand contributor behavior, I ran a query to group UploadWizard users into buckets based on how many files they had uploaded over time (query/92997). The engagement distribution reveals a classic long-tail pattern: of the 1.8 million volunteers who used UploadWizard, nearly half uploaded only a single file, and another 40% contributed between two and ten. This pattern is consistent with Commons’ role as an open platform, where many users participate sporadically, often to share a single image of personal or local relevance. It is evidence of a tool doing the work it was designed to do: helping people contribute freely licensed media to the world.

The horizontal bar chart titled "UploadWizard Contributors by Number of Uploads" visualizes the distribution of users based on how many files they uploaded using the tool. It shows that 891,614 users uploaded just one file, while 736,928 users uploaded between 2 and 10 files. Another 165,996 users uploaded between 11 and 100 files, and 22,068 users uploaded between 101 and 1,000 files. At the highest end of the spectrum, 4,302 users uploaded over a thousand files each.

Distribution of UploadWizard contributors by number of uploads. While the tool lowers barriers for newcomers (nearly half the users uploaded only once), it’s also used by dedicated contributors: more than 4,300 users have uploaded over a thousand files each, highlighting the tool’s long-term utility and wide adoption.
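The bucketing behind this chart can be approximated in a few lines of Python. The bucket edges come from the chart, and the `REPORTED` figures are the counts it displays. The function names are hypothetical, and the per-user input is assumed to be a simple mapping from user to upload count rather than the actual output of query/92997.

```python
from collections import Counter

# Inclusive upper bound and label for each tier shown in the chart.
BUCKETS = [(1, "1"), (10, "2-10"), (100, "11-100"), (1000, "101-1,000")]
TOP_BUCKET = "over 1,000"

# Contributor counts per tier, as reported in the chart.
REPORTED = {
    "1": 891_614,
    "2-10": 736_928,
    "11-100": 165_996,
    "101-1,000": 22_068,
    "over 1,000": 4_302,
}


def bucket_for(upload_count):
    """Return the label of the tier that an upload count falls into."""
    if upload_count < 1:
        raise ValueError("an uploader has at least one upload")
    for upper, label in BUCKETS:
        if upload_count <= upper:
            return label
    return TOP_BUCKET


def distribution(uploads_per_user):
    """Count how many users fall into each tier."""
    return Counter(bucket_for(n) for n in uploads_per_user.values())


def shares(bucket_counts):
    """Convert per-tier user counts into percentages of all users."""
    total = sum(bucket_counts.values())
    return {label: 100 * n / total for label, n in bucket_counts.items()}
```

Applied to the reported figures, `shares(REPORTED)` puts single-file uploaders at roughly 49% of all users and the top tier at roughly 0.2%.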

But another story also lies in the deeper tiers: over 22,000 contributors uploaded between 101 and 1,000 files, and more than 4,300 users crossed the 1,000-file threshold. That top tier of power contributors (just 0.2% of all uploaders) accounts for a disproportionate share of Commons’ visual knowledge. Their sustained participation underscores that UploadWizard isn’t just a tool for newcomers, and it highlights the importance of balancing ease of use with the advanced needs of experienced users. Designing for both ends of that spectrum is key to growing and sustaining Commons’ media ecosystem.

No One Mourns the Wizard’s Metadata

Looking back, the amount of effort it took to reconstruct the history of UploadWizard’s usage is perhaps the most ironic aspect of this excavation. Wikimedia is a movement obsessed with preservation: we document every edit, every template, every discussion. We track every page’s revision history in minute detail. And yet, the historical record of one of the most significant tools used to contribute content to Commons was never formally maintained.

That’s not to say it was malicious, or even careless. It was simply a mismatch of priorities. The category was seen by volunteers as clutter and removed, a reasonable decision made in good faith. But from a product perspective, such decisions can carry unintended consequences, like the loss of institutional memory.

Today, measuring the impact of a new tool is much more straightforward: improvements to the platform, like change tags, make measurement easier, and Wikimedia now has full-time product analytics staff involved at every step of the development process. A decade ago, categories and log comments were all we had. Tools like UploadWizard are still in use and central to the contributing experience, but measuring their impact takes more determination. Or, as a mustached orange fluffball would say, someone who cares a whole awful lot.

Like Merlin in The Once and Future King, I found myself living backwards in time,[7] remembering what the system once knew, even as its present structure forgot. Querying the past through metadata felt less like analysis and more like reconstruction: following traces not because they were meant to be followed, but because they hadn’t yet disappeared.

This fragility, the slow disappearance of signals, is why this work felt more like archaeology than analysis. I wasn’t pulling data from a dashboard; I was excavating buried layers, hoping that enough of the traces remained to reconstruct a timeline.

The Wizard and I

The last time I tried to count UploadWizard uploads was in 2020. Back then, I used the log comment method and came up with about 20 million files, which already felt staggering. What I didn’t realize at the time was that this method missed the first two years of uploads entirely, and that the logging pattern was just about to change.

This time, I discovered twice that amount: a full third of all files on Commons. When I first moved to San Francisco and began working on UploadWizard, I couldn’t have imagined that the tool would still be in use 15 years later, largely unchanged, and that the numbers would be so vast. It’s humbling, a little surreal, and deeply gratifying.

What stayed with me most throughout this archaeological expedition wasn’t just the technical puzzle; it was the emotional arc of reassembling it. The joy of finding Neil’s first test upload. The frustration of queries that almost worked. The satisfaction of unlocking the Grimmerie and watching the story piece itself back together, one log comment and patch note at a time.

I didn’t initially set out to revisit the most important project of my early career; I just wanted to answer a simple question. But as I sifted through missing metadata and fading fragments, I found myself face-to-face with something much more personal: the enduring presence of a tool I helped bring into the world.

This exploration wasn’t merely about product analytics and KPIs. It was an act of stitching Commons’ memory back together (and in the process, restitching my own) through the quiet, persistent work of following traces others had left behind. It was about memory, continuity, and the fragile threads that hold institutional knowledge together. I set out to measure a tool’s impact; through some magic, I ended up unearthing my own legacy.