issanholovibrant.com

San Holo is one of my favourite artists. One of his projects is Stay Vibrant. He places a percentage in his name on Twitter as a mood tracker. Inspired by sites like isitchristmas.com and the more useful kanikeenkortebroekaan.nl, I wanted to make a website that reports whether he is feeling vibrant or not based on that percentage.

httpx and regex

I made the first version with httpx and Python's built-in re module. The script makes a GET request to the Twitter page and extracts the percentage with a regex. It looked like this:

import re
import httpx

TWITTER = "https://mobile.twitter.com/sanholobeats"

# Raw string so that \s is not treated as an (invalid) string escape.
percentage_regex = re.compile(r"(<title>[A-Za-z\s(]*)([0-9]*)")
html_percentage_regex = re.compile("PERCENTAGE")
html_yesno_regex = re.compile("YESNO")

res = httpx.get(TWITTER)

percentage = re.findall(percentage_regex, res.text)[0][1]
yesno = "yes" if int(percentage) > 54 else "no"

with open("template.html", "r") as template:
    with open("public/index.html", "w") as index:
        html = re.sub(html_percentage_regex, percentage, template.read())
        html = re.sub(html_yesno_regex, yesno, html)
        index.write(html)
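
To make the extraction step easier to follow, here is a minimal sketch of the same regex applied to a made-up title string. The sample title and the extract_percentage helper are assumptions for illustration only; the real page title may look different. Unlike the script above, the helper returns None instead of crashing when no digits are found.

```python
import re

# Same pattern as the script above: group 1 is the text up to the number,
# group 2 is the run of digits that follows it.
percentage_regex = re.compile(r"(<title>[A-Za-z\s(]*)([0-9]*)")


def extract_percentage(html):
    """Return the percentage as an int, or None if no digits were found."""
    match = percentage_regex.search(html)
    if match is None or match.group(2) == "":
        return None
    return int(match.group(2))


# Hypothetical title, only to show what the two groups capture.
sample = "<title>San Holo (73%) on Twitter</title>"
print(extract_percentage(sample))
print(extract_percentage("<title>no number here</title>"))
```

Returning None lets the calling code decide what to do when the page layout changes, for example keeping the previous index.html instead of overwriting it.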

As you can see, that’s not a lot of code! I hosted this first version on GitLab Pages and updated it with GitLab CI. There were some problems with this setup, namely that GitLab Pages was not reliable enough for me: sometimes a request had to time out first before the site would load. When I got a VPS I moved the site there and updated it with cron.

Twitter breaks GET

At some point Twitter started requiring JavaScript to access the page and no longer sent rendered HTML in response to plain GET requests. I read on Hacker News that it still worked when using the user-agent of the Google bots. I added that to my script and it worked again.

HEADERS = {"user-agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"}
# [...]
res = httpx.get(TWITTER, headers=HEADERS)

The user-agent stops working

Somewhere in the last two months the user-agent trick stopped working too. Not to be outdone by Twitter, I wanted to find a fix. The main issue is that the page needs JavaScript execution. I ended up running a headless browser with python-playwright. It’s insane that I need to run a whole browser to get a percentage from a webpage, and playwright installs three of them!

I couldn’t get it working how I wanted: I had to insert a flat five-second timeout for the content to render. I might come back to it to do it properly when it inevitably breaks again.

import re
import asyncio

from playwright.async_api import async_playwright

TWITTER = "https://mobile.twitter.com/sanholobeats"

percentage_regex = re.compile(r"(<title>[A-Za-z\s(]*)([0-9]*)")
html_percentage_regex = re.compile("PERCENTAGE")
html_yesno_regex = re.compile("YESNO")


async def main():
    async with async_playwright() as playwright:
        browser = await playwright.firefox.launch()
        page = await browser.new_page(java_script_enabled=True)
        await page.goto(TWITTER)
        await page.wait_for_timeout(5000)
        content = await page.content()
        await browser.close()
        percentage = re.findall(percentage_regex, content)[0][1]
        yesno = "yes" if int(percentage) >= 50 else "no"

        with open("template.html", "r") as template:
            with open("public/index.html", "w") as index:
                html = re.sub(html_percentage_regex, percentage, template.read())
                html = re.sub(html_yesno_regex, yesno, html)
                index.write(html)


if __name__ == '__main__':
    asyncio.run(main())
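
One possible way to replace the flat five-second timeout is to wait until the page title actually contains a number. The sketch below is untested against the real page and rests on an assumption: that the percentage shows up in document.title once the page has rendered. The names TITLE_HAS_DIGIT, parse_percentage and fetch_percentage are made up for this example; wait_for_function and title are existing playwright page methods.

```python
import re

# JavaScript predicate evaluated in the page: true once the title has a digit.
# Assumption: the rendered title contains the percentage.
TITLE_HAS_DIGIT = "() => /[0-9]/.test(document.title)"


def parse_percentage(title):
    """Return the first run of digits in the title as an int, or None."""
    match = re.search(r"[0-9]+", title)
    return int(match.group()) if match else None


async def fetch_percentage(url, timeout_ms=15000):
    # Imported here so parse_percentage can be used without playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        browser = await playwright.firefox.launch()
        try:
            page = await browser.new_page()
            await page.goto(url)
            # Polls the predicate until it is truthy; raises playwright's
            # TimeoutError if that does not happen within timeout_ms.
            await page.wait_for_function(TITLE_HAS_DIGIT, timeout=timeout_ms)
            return parse_percentage(await page.title())
        finally:
            # Close the browser even if the wait times out.
            await browser.close()
```

It would be called with asyncio.run(fetch_percentage(TWITTER)). Compared to the fixed timeout, this returns as soon as the title is ready and fails loudly when it never is, instead of parsing a half-rendered page.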

As you can see, the script has gotten hackier with time. At this point I see it as a challenge to keep this script working no matter what. Please visit issanholovibrant.com to see how San Holo is doing, and don’t forget to stay vibrant ⬆✨.