Fantasy Football Punishment Meets Deepfakes

Despite my 10-year reign as commissioner extraordinaire in my family's fantasy football league, I've consistently found myself in a rather unfortunate predicament. I have never, not even once, emerged as the conqueror of this particular virtual gridiron. I've won other leagues with ruthless precision, and I've been tantalizingly close to tasting the sweet, sweet nectar of victory at home on three separate occasions, including a brutally devastating loss in the championship of the 2021 season. In that fated game, I weathered the absurd storm that was a five-touchdown extravaganza by Ja'Marr Chase while still holding on, my hopes and points soaring above 170 with the help of Amon-Ra St. Brown. But alas, victory slipped through my fingers yet again thanks to a garbage touchdown by Najee Harris in the final seconds of Monday Night Football, the literal last minute of the entire season, and a move that actually hurt his team's chances of winning.

But that story, dear reader, is not why I opine to you today. Let's fast forward to my most recent catastrophe, a season that saw my team realize a very different, yet also unfamiliar achievement: dead last. Behold my roster of failure with Brandin Cooks, Chase Edmonds, Hunter Renfrow, Robert Woods, Trey Lance, and oh, the cherry on top, Javonte Williams. At one point in the season, I managed to field an entire starting lineup that did not contain a single player I had drafted, not one! It was as if my team had collectively decided to embark on a pilgrimage to the land of incompetence in my league of most critical importance.

You see, this league is not governed by eye-popping prize pools, but by pride. With a low $20 buy-in, we're not making headlines with $1000 winner-take-all pools. Instead, every year we come together from thousands of miles apart for an in-person draft, bestow the ever-growing trophy on the winner, and bask in the glory of fantasy camaraderie. This year our league really kicked it up a notch too. Ever committed to keeping spirits high, we decided it was time to up the ante with a punishment for the unfortunate toilet king. The loser must pick among three options:

24-hour breakfast - Spend 24 hours in a Waffle House or other breakfast establishment. The time requirement is reduced by 1 hour for each waffle consumed.
Black-tie butler - Dress in black tie at the in-person draft and serve fellow league members.
Music video - Record a music video and share it publicly with both the league and at least 1 external audience.

As the guardian of our league's traditions and inaugural recipient of the toilet king punishment, I knew I had to go big. With two young kids at home needing supervision that isn't at a Waffle House and the inexcusable option of letting last place affect a future season, the choice was clear. I decided to create a music video so epic, so outrageous, that it would deflect from the humiliation and just might make the football gods chuckle enough to grant me a little grace next season.

Overall, I spent more than a hundred hours and seven months of free time making my masterpiece. Many kid naps were spent singing in the closet, and perhaps a few hairs were ripped out as I navigated the landscape of deepfakes, but in the post that follows, I'll dive into the details of this journey, explore the tools that turned my lyrical fumbles into melodious touchdowns, and finally unveil my magnum opus on the public stage.

Making the Music

With my grand vision of creating a fusion of football and music fit for a fanatical fantasy family, I dove headfirst into music production. I come from a very musical family, was not very good vocally within the spectrum that I regularly heard at home, and so never considered myself to be a good singer until I stumbled into the rest of the world's attempts at "Happy Birthday." Now, let's be clear, I'm not a seasoned musician and I'm not winning any awards, but I can decently carry a tune when needed. I had some experience in high school recording covers with an electric guitar and electric drum kit, but that experience was ~15 years out of date by now. I hoped to figure out everything I needed along the way!

Selecting the Songs: Kid's Seal of Approval

Choosing the right song is a delicate balance. You want something you can sing with confidence, that shows off a bit of personality, and that you don't mind listening to about a thousand times. In my case, the selection process involved consulting with my most reliable musical source: my two-year-old son, Grayson.

My son has atypical, heavily-parent-influenced musical tastes for a toddler. A big '00s boy band fan, he loves Backstreet Boys, NSYNC, blink-182, and Taylor Swift, largely driven by how fun the music videos are to watch. We also share a mutual admiration for the genius that is Weird Al Yankovic. My son loves hearing silly songs about Anakin Skywalker, Jar Jar, and Yoda (let's be honest, I do too), while I can also appreciate how Weird Al practically drops a master class in music, comedy, and cultural references with every album he puts out. I also knew I wanted an arc to this saga that started with sorrow and concluded with optimism about the seasons ahead, which is hard to achieve with a single song and a repetitive chorus. Based on the football-driven needs and the fun we have together with parody in general (you should hear him sing "Mr. Water Bottle Guy" sometime, a song he wrote all on his own!), I knew I had to parody at least a few songs from my son's go-to playlist.

The end result:

"I Want It That Way" by Backstreet Boys --> "I Lost on Draft Day," a woeful recognition of the rocky foundation upon which the losing team was built
"All the Small Things" by blink-182 --> "All the Losses," a desperate plea to end the suffering
"You'll Be Back" by Jonathan Groff and the cast of Hamilton --> "I'll Be Back," a hopeful look toward the brighter future of fantasy tomorrow

Writing the Lyrics: When AI Meets a Parody Quest

Writing lyrics for a fantasy football music video using AI was a mixed bag. On the one hand, it's absolutely mind-blowing that we're living in a time when I can ask a computer to "write a song about TOPIC" and within seconds get a decently rhymed, on-theme poem unique to whatever I imagine. At the same time, ChatGPT really struggled with understanding some key ingredients necessary for parody writing, namely syllables and the deeper themes of a song. This made it more like working with an eager, but slightly befuddled, student attempting to write a paper in a language they don't know. It served as a great sounding board and ideation partner, but nothing it generated was usable on its own.

For example, I asked ChatGPT to help me turn "It's Gonna Be Me" into "It's Never Been Me" (a reference to my failure to ever clinch a chip) in short sections while preserving rhyme scheme and scansion. All it managed to do was repeat the same lyrics back. Loosening either requirement devolved into getting a brand new song that did not match the thematic elements of the original, did not fit the meter, and did not rhyme in similar patterns.

Even when taking a step back to just understand syllable counts, ChatGPT struggled. Generating a list of fantasy-football-related words with a particular syllable count was futile.
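For the curious, syllable counting is something a few lines of code can approximate. Below is a minimal sketch, assuming a crude "count the vowel groups" heuristic; it is not what was used for these parodies (a rhyming dictionary did that job), the word list is made up for illustration, and English spelling will fool it on plenty of words.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting runs of vowels (a rough heuristic)."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "trade") usually adds no syllable,
    # but a trailing "le" (as in "fumble") usually does.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def words_with_syllables(words, target):
    """Keep only the words whose estimated syllable count equals target."""
    return [w for w in words if count_syllables(w) == target]


# Hypothetical word bank, purely for illustration.
CANDIDATES = ["touchdown", "waiver", "trade", "commissioner", "draft", "fumble", "bye"]
two_syllable_words = words_with_syllables(CANDIDATES, 2)
```

A filter like this is only good for narrowing a word bank down; the final call on whether a line scans still has to be made by ear.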

In the end, I used ChatGPT as a great bank of short phrases related to fantasy football I could whip out, manually tweaked the phrases I wanted in each parody while preserving many core elements, and heavily used a good ol' rhyming dictionary to help everything come together lyrically.

Recording the Vocals: My Closet Becomes a Studio

When it came to recording the vocals, I cheaped out. Thanks to the magic of YouTube, I already had instrumentals set for every song, so all I needed was the vocal tracks. Armed with my dusty $30 NEEWER USB Supercardioid Condenser Mic that I bought to seem fancy for appearing on podcasts to promote Optyx a few years ago, I was ready to belt out my best AJ McLean impersonation.

For my recording studio I selected the master closet. The acoustics were surprisingly decent, it was the closest to soundproof I'd get in the house in order to avoid waking the kiddos up during naps, and... well, no, there's no "and," that's about all it took to make it the ideal haven for my aspiring Weird Al dreams.

In terms of equipment, my setup certainly could've used an upgrade. The boom arm on the mic was completely useless; it couldn't even hold the mic up, so I MacGyvered it with a bit of masking tape. For monitoring, I used the oh-so-state-of-the-art "one half of my headphones on and my other ear exposed" approach that I hear all the studios are using these days. For recording, I hilariously found myself relying on the voice memo recorder built into macOS. For some reason, no DAW I tried could actually record from the NEEWER USB mic without sounding like a glitchy robot. I recognized at that point that the hardware was crap, so I asked my musician cousin and legit famous A/V influencer celebrity for advice on a cheap mic, but his suggestion came in at around $200, so voice memo recorder workaround it was! 😅

All told, I churned out approximately 7 hours of vocals for 6 different songs, including harmonies and backing vocals, and ended up mixing my favorite three.

Mixing the Track: Where Harmonies Go Wild

Ah, mixing, where I traded the closet for the more familiar environment of my desk, keyboard, mouse, and monitor. Armed with GarageBand, Audacity, and a sprinkle of optimism, I embarked on the quest to make my vocals sound like a chorus of 20-something men on the cusp of stardom in the early aughts, or at least a semi-coherent shower song.

Each song posed a unique challenge. With approximately 6-8 takes for each part, sorting through the recordings was a lot. For complex harmonies (looking at you, Backstreet Boys), I had almost 2 hours of audio to sift through for just a 3-minute song. Due to my hardware woes (my USB mic apparently likes to record at a whisper), the first step was turning up the volume to 11 in Audacity with a whopping 17 dB boost. I then turned my eyes to GarageBand, where I sliced and diced every take, painstakingly labeling each part ("lead - verse 1", "backing harmony - chorus", and so on). Then I aligned it all, resulting in ~8 alternative lead vocal tracks, picked the best of each part, and spliced them into a Frankenstein's-monster beauty of a take. When I didn't seem to have any viable candidates, or I missed the same note on every take, back I went to rerecord that section another week. The massive gap and irregularity of the setup between takes unfortunately resulted in some very audible sound differences in the final output, but hey, I'm an amateur non-singer dad doing this during nights and naps. I'll take what I can get!
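To put that 17 dB boost in perspective, decibels are logarithmic, so the number undersells how much louder the signal gets. A quick sketch of the arithmetic, assuming the boost is an amplitude gain (the standard factor-of-20 formula for signal level):

```python
def db_to_amplitude_ratio(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


boost = db_to_amplitude_ratio(17)
print(f"A 17 dB boost multiplies the waveform by about {boost:.2f}x")
```

That works out to roughly a sevenfold increase in amplitude, which also means any background noise in the recording gets amplified by the same factor.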

With a solid, consolidated track for each vocal part in hand, I set to work polishing the sound. Look, like I said, I can carry a tune OK, but I gained an immense appreciation for the relative difficulty of vocals between songs and the pain that comes with an entire track existing just at the edge of your comfortable vocal range. There's a reason Broadway singers are acclaimed, y'all. Some of those notes were hard AF! I tried my hand at some light manual pitch correction with Waves Tune to clean them up, but the process was about as tedious and embarrassing as cleaning up a diaper blowout with a toothbrush. The cost of the full version of Waves Tune was roughly the cost of hiring someone on Fiverr to do it all for me ($30 for the plugin, $10 per song on Fiverr), so I did the musical equivalent of kneeling in the red zone and outsourced this final stretch of polish. It was worth it; my vocals came back draped in EQ, adorned with celestial reverb, and sporting a newfound confidence in their pitch (check out the comparison of all three versions below). I probably could have skipped this for the songs I ended up mixing, as there weren't that many notes I strayed from, but it's definitely sounding a bit more polished, and I'm happy with the value.

With my tuned vocals in hand, I returned to GarageBand and crafted my final mix. Was it Grammy-winning? Maybe not. But it was a masterpiece; my masterpiece, crafted through hours of closet serenades, combing through cringey discarded takes, and a few strokes of lyrical genius. As I clicked that final button to publish my creation, I beamed. After all, this was just the beginning of my musical journey. Stay tuned (heh, couldn't resist 😉) for how the visual feast came together next!

Making the Video

Gearing Up: The Rig, The Software, The Task

Now that I had mixed my musical masterpiece, it was time to give it the visual treatment. The beauty of selecting parody is that we're not exactly short on source material. Every song has existing recordings or music videos just waiting for a face (or five) to be swapped out with mine! Here was the perfect task for AI.

Enter Faceswap, the open-source deepfake wizardry that promises to swap faces more convincingly than a politician during campaign season. Unfortunately, the PC with a real graphics card I acquired way back in college was north of a decade old at this point, hardly up to the task of modern AI model training. Pleased with any excuse to update my tech, I welcomed a shiny new rig to my office, complete with a GeForce RTX 4070 Ti, and much better equipped for this task. Hello, future, it's nice to meet you!

I attempted to get Faceswap running with my shiny new GPU through WSL (Windows Subsystem for Linux) to flex my script muscles and fully leverage my *NIX familiarity, and at first it appeared to work. But alas, the universe had other plans. CUDA passthrough was as stable as my daughter learning to walk (more on that in a minute). Back to Windows and Git Bash it was.

With hardware in hand and a working Faceswap environment, it was time to dive in. Faceswap isn't the easiest software to use and has a bit of a learning curve, but given what it's capable of automating, it is staggeringly accessible, with excellent guides and a helpful forum to boot. Besides, I had some experience diving headfirst into countless face images for training AI models with my startup, Optyx, so these concepts were already as familiar to me as the heat of a Texas summer.

At a high level, the process works like this:

Extraction - We need thousands of images of the target face (mine, obviously) and the poor soul whose face is getting replaced (sorry, Tom DeLonge!). Variety is essential here across orientation, angles, lighting, and expressions, and extracting frames from videos is the way to get it!
Labeling - Once we've extracted raw faces from the videos, we need to wrangle all these images into something usable. Every image must be filtered for only valid faces and labeled with the appropriate person to whom it belongs.
Training - It's time to summon the neural network gods and create your very own face-swapping model. Each pair needs its very own model; if you're swapping out a whole band like the Backstreet Boys, say hello to 5x the work!
Alignments - The final step of tedium: creating a list of every face, every facial feature, and their coordinates, for every frame.
Swapping - With model and alignments in hand, we're at the finish line. One click, a virtual swirl of the wand, and voilà! We have our video.

Each step seemed straightforward enough, so with little planning and a smug bravado from years of manually labelling faces for Optyx, I dove right in.

Extraction: Gathering 1 Bazillion Faces

First up, face extraction. We need images of faces for our subjects. Lots and lots of them. Simple enough, right? If we were after any old photos smiling at the camera, most of us would have this ready to go, no problem. Sadly, our needs for a convincing swap are much greater. No posed smiles here: we need a critical mixture of facial expressions, ugly frowns, wide open yells, closed eyes, blinking eyes, and everything in between, all across varied lighting conditions to make our swapping magic truly shine.

Thankfully, there is a pretty straightforward way to obtain this sort of information: pulling frames from the videos themselves! Videos are, after all, just a series of images in rapid succession, and, unsurprisingly, videos are a goldmine for facial diversity and lighting variations (which is why we needed such variety in the first place for our swap 😉). Thanks to Faceswap's nifty utilities and the fact that all of my subjects are famous celebrities who have been recorded dozens of times, I was able to make short work of this step. An hour or two scouting YouTube for a few extra music videos and interviews by the same artists, a quick script to invoke Faceswap from the CLI, and bada-boom-bada-bing, I had 30,000 images.
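A script like that can be very small. Here is a minimal sketch, not the original script: it assumes Faceswap's `extract` command with its `-i` (input) and `-o` (output) options as shown in the project's usage guide, and the file and folder names are hypothetical. Confirm the options with `python faceswap.py extract -h` for your installed version.

```python
from pathlib import Path


def build_extract_commands(videos, faces_root):
    """Build one `faceswap.py extract` command per source video.

    Each video gets its own output folder named after the video file,
    so the extracted faces stay separated until they are sorted.
    """
    commands = []
    for video in videos:
        out_dir = Path(faces_root) / Path(video).stem
        commands.append([
            "python", "faceswap.py", "extract",
            "-i", str(video),    # source video to pull faces from
            "-o", str(out_dir),  # folder that receives the face images
        ])
    return commands


# Hypothetical file names, for illustration only.
extract_commands = build_extract_commands(
    ["bsb_interview.mp4", "blink_live.mp4"], "faces"
)
```

Each command list can be handed to `subprocess.run(cmd, check=True)` from inside the Faceswap checkout. Building the commands separately from running them makes it easy to print and eyeball them first.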

Things were a little different for me though. It's not like I have the same treasure trove of high-quality recordings of myself in various studio lighting setups while singing at the top of my lungs. To get my face data I had to get more creative. I turned on a mixture of different lights around the house and strutted around belting out Backstreet Boys classics while recording a selfie video, continuously moving the phone around to capture different angles. My first round of this approach didn't end up producing enough variation, so I waited for a new time of day and collected a few more. After ~15 minutes of footage across a few different trials, I had what I needed. With myself and each of our targets filling up my hard drive already, we were good to go.

Labeling: Tidying Up the Raw Data

Now, here's where the fun train took a slight detour into the tunnel of tedium. Labeling. Oh, the labeling. If I had a dollar for every face I've labelled for an AI model in my life I could retire tomorrow. To be useful for our face swapping purposes, an image had to be, well, an actual face, properly oriented (no upside-down goofs, please), and correctly attributed to its respective celebrity.

Having struggled with tools like this in the past, I was actually super impressed by the simplicity yet effectiveness of what Faceswap had to offer here. With its built-in face embedding sort and variety of customization options, grouping similar faces and discarding the riff-raff was a piece of cake.

By the end, I had 10 neatly organized folders for the 9 celebrities and myself, each filled with several thousand images of our faces. If I hadn't spent so many hours attempting to manually rotate and realign the several thousand misidentified images (more on this when we get to alignments!), it actually would have been pretty quick and painless.

Training: Waiting, Waiting, and More Waiting

Armed with our meticulously curated faces, all that training requires is pointing Faceswap at a specific pair of our folders of faces, choosing a model architecture, and clicking "Train". Well, and waiting, and waiting, and waiting, and then waiting some more.

Now, remember that WSL experiment I mentioned? Let's just say it was worse than watching paint dry; at least then you end up painting something! Despite the promises of GPU acceleration and the GPU being recognized by both nvidia-smi and Faceswap itself, our training progress was slower than moss on a Mississippi tree stump. A six-day training slog and 244 kWh later, I had a result for a single face pair that could only be charitably described as "meh." Worried I had completely wasted my time and rig $$ thus far, I switched to a proper Windows environment, and thankfully progress sprinted ahead. Overnight, our AI concocted a mesmerizing display of face-swapping prowess.

And thank god it could be achieved overnight, because this dance of model building needed to be repeated for each individual face pair: Brian <-> Patrick, AJ <-> Patrick, Nick <-> Patrick, and so on. With 9 models to build, I didn't just stand there clicking like a thumbass. I scripted the process, letting the AI work its magic while I embarked on a well-deserved vacation and returned to my cornucopia of models, ready to bring my fantasy football music video dreams to life.
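The per-pair loop is easy to picture in code. The following is a minimal sketch rather than the actual script: it assumes Faceswap's `train` command with `-A` (faces to be replaced), `-B` (faces to swap in), and `-m` (model folder) as described in the project's usage guide, and the folder layout and names are hypothetical. Check `python faceswap.py train -h` before relying on any of it.

```python
from pathlib import Path


def build_train_commands(targets, me, faces_root, models_root):
    """Build one `faceswap.py train` command per (celebrity, me) pair.

    Every pair gets its own model folder, because one model only
    learns one specific A <-> B swap.
    """
    faces_root, models_root = Path(faces_root), Path(models_root)
    commands = []
    for target in targets:
        commands.append([
            "python", "faceswap.py", "train",
            "-A", str(faces_root / target),              # face being replaced
            "-B", str(faces_root / me),                  # face being swapped in
            "-m", str(models_root / f"{target}_{me}"),   # where the model is saved
        ])
    return commands


# Hypothetical folder names; the real project had 9 celebrity faces.
train_commands = build_train_commands(
    ["brian", "aj", "nick"], "patrick", "faces", "models"
)
```

Training normally keeps going until it is stopped, so an unattended script also needs a stopping rule (an iteration limit or a time limit per model). That part is left out here rather than guessed at.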

Alignments: Finding Eyebrows Frame-by-Frame

If I thought labeling the faces was rough, oh boy, was I naive. I mean, seriously, "Oh Patrick-from-two-months ago, you poor innocent Faceswap virgin."

You see, in order to successfully swap faces, Faceswap uses an enigmatic artifact called an "alignments file." Think of it as a map that guides the AI to the exact location of each face in every frame of the video, complete with detailed descriptions of facial landmarks (you know, like 'eyes go here,' 'nose does a squirrely thing here,' 'mouth slightly open, but not like you're about to eat a sandwich, more like you're about to sneeze,' etc). It's a little like playing the world's most complicated game of "pin the mustache on the Backstreet Boy."
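To make the "map" idea concrete, here is a hypothetical sketch of that kind of structure. It is not Faceswap's actual file format, and every class and field name is invented for illustration: each frame maps to a list of faces, and each face carries a bounding box plus a list of landmark points.

```python
from dataclasses import dataclass, field


@dataclass
class FaceAlignment:
    """One detected face in one frame (hypothetical structure)."""
    box: tuple[int, int, int, int]          # x, y, width, height in pixels
    landmarks: list[tuple[float, float]]    # (x, y) for eyes, nose, mouth, ...


@dataclass
class Alignments:
    """Maps each frame name to the faces found in that frame."""
    frames: dict[str, list[FaceAlignment]] = field(default_factory=dict)

    def add(self, frame_name: str, face: FaceAlignment) -> None:
        self.frames.setdefault(frame_name, []).append(face)

    def frames_missing_faces(self, all_frames):
        """List frames where no face was found, i.e. candidates for manual fixing."""
        return [name for name in all_frames if not self.frames.get(name)]


alignments = Alignments()
alignments.add(
    "frame_0001.png",
    FaceAlignment(box=(120, 40, 64, 64), landmarks=[(140.0, 60.0), (165.0, 61.5)]),
)
needs_review = alignments.frames_missing_faces(["frame_0001.png", "frame_0002.png"])
```

Seen this way, the manual cleanup described next amounts to walking the frames with missing or wrong entries and correcting them one at a time.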

Now, normally on modern videos with large faces, Faceswap does an excellent job of using machine learning models to automatically create the alignments file. You just click a button and you're on your way. But for my 480i vintage 2000s ensemble music videos, the faces were too small to have landmarks created by the default, high-quality aligner (this is why so many faces were rotated in my labeling step; if I had tried out alternative aligners, I might have been able to skip that). Running through the less sophisticated aligner that works better on smaller faces helped enormously but still wasn't enough to get the job done. Faceswap has a solution for this, but it uses the M word AI enthusiasts hate: "Manual Labor."

So, lucky me: hunched over my computer, squinting at an endless parade of video frames like some sort of desperate fantasy football loser. You read that right: I spent weeks manually fixing each broken face and its landmarks on every. single. frame. in every single video, repeated for every. single. face.

My only saving grace is that my partner in crime who helped decide the use of these songs in the first place was by my side through it all, my son Grayson. A pint-sized hero prepared with an unwavering love for the Backstreet Boys and shocking patience for a toddler, he loved playing what he referred to as the "Backstreet Boys game." In 10-minute increments, Grayson and I would sift through frames of 'I Want It That Way,' carefully updating face landmark alignments along the way.

And after what felt like an eternity of meticulously placing rogue nostrils back where they belonged, the day of reckoning finally arrived. The face detections were corrected, the alignments were ready, and the stage was set for the most fantasy-football-related effort put into face-swapping the world had ever seen.

Swapping: Where the Magic Happens

With all the painstaking effort behind us, performing the actual swap was a piece of cake. All I had to do was point Faceswap to the original video file, the AI model for a face pair, and the meticulously corrected alignments. Press a button, wait a handful of minutes, and BAM! A beautifully swapped video emerged. In the case of multiple band members, I just had to write a quick script to process each face separately and build on the prior output (e.g. blink_original.mp4 -> blink_tom.mp4 -> blink_tom_mark.mp4 -> blink_tom_mark_travis.mp4).
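The chaining logic is the only slightly fiddly part, since each swap has to read the previous swap's output. A minimal sketch of just the file bookkeeping follows, using the naming pattern from the example above; the function name is made up, and the actual Faceswap convert call for each step is deliberately left out.

```python
def swap_chain(base, members, ext=".mp4"):
    """Return (member, input_file, output_file) for each sequential swap.

    The first swap reads the original video; every later swap reads
    the output of the swap before it.
    """
    steps = []
    current = f"{base}_original{ext}"
    name = base
    for member in members:
        name = f"{name}_{member}"
        output = f"{name}{ext}"
        steps.append((member, current, output))
        current = output
    return steps


for member, src, dst in swap_chain("blink", ["tom", "mark", "travis"]):
    print(f"{member}: {src} -> {dst}")
```

Each tuple tells the script which model to load (the band member's), which video to read, and where to write the result before moving on to the next face.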

While the process itself may have lacked dramatic flair, the impact was nothing short of miraculous. There my face was, across the entire band, in a hilarious display of Patricks.

Footballizing: Wrapping It Up

This finally concluded the AI portion of my journey, but I wasn't completely finished yet. So far I just had 5 Backstreet Boys singing in front of a plane without a football in sight. For this I just needed some good ol' fashioned video editing chops. I took the swapped face video back into Premiere, plastered some jerseys on my players, replaced a few posters and signs, and away we went.

Final Result

So, there you have it, readers who have listened to me drone on for this long. The grueling saga of tuning vocals, aligning faces, and orchestrating the ultimate fantasy football parody video may have been fraught with obstacles and die-of-boredom labor, but it paved the way for a spectacle that I hope will live on in our league's history forever.