HTML Sanitizer

The HTML Sanitizer API lets browsers safely turn untrusted HTML strings into DOM content. In interviews, it is often more useful to practice the core traversal and transformation ideas behind that API than to reproduce the full platform surface.

Implement sanitizeHTML(input), a simplified sanitizer inspired by the platform's safe HTML sanitization behavior.

Parse the HTML into a detached DOM tree, sanitize it, and return the resulting HTML string.

The sanitizer should:

  • Remove these element subtrees entirely: script, iframe, object, embed.
  • Remove all HTML comment nodes.
  • Remove any attribute whose name starts with on (event handler attributes such as onclick or onload).
  • Remove href and src attributes whose trimmed, case-insensitive value starts with javascript:.
  • Preserve all other parsed HTML.
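
One way the rules above could be sketched, assuming a browser environment where a template element is used as the detached parsing container (its content is inert, so scripts do not run while parsing). The names BLOCKED_ELEMENTS, URL_ATTRIBUTES, isUnsafeAttribute, and sanitizeChildren are hypothetical helpers introduced for illustration; only sanitizeHTML comes from the question itself.

```javascript
// Elements whose entire subtree is removed.
const BLOCKED_ELEMENTS = new Set(['script', 'iframe', 'object', 'embed']);
// Attributes checked for a javascript: value.
const URL_ATTRIBUTES = new Set(['href', 'src']);

// Hypothetical helper: returns true when the attribute should be removed.
function isUnsafeAttribute(name, value) {
  const lowerName = name.toLowerCase();
  if (lowerName.startsWith('on')) return true;
  return (
    URL_ATTRIBUTES.has(lowerName) &&
    value.trim().toLowerCase().startsWith('javascript:')
  );
}

// Hypothetical helper: sanitizes the descendants of `parent` in place.
function sanitizeChildren(parent) {
  // Snapshot childNodes first; removing nodes from the live list
  // while iterating over it would skip siblings.
  for (const node of Array.from(parent.childNodes)) {
    if (node.nodeType === Node.COMMENT_NODE) {
      node.remove();
    } else if (node.nodeType === Node.ELEMENT_NODE) {
      if (BLOCKED_ELEMENTS.has(node.localName)) {
        node.remove(); // drops the element together with its subtree
        continue;
      }
      // Snapshot attributes as well, since removal mutates the live map.
      for (const attr of Array.from(node.attributes)) {
        if (isUnsafeAttribute(attr.name, attr.value)) {
          node.removeAttribute(attr.name);
        }
      }
      sanitizeChildren(node);
    }
    // Text nodes and anything else are preserved untouched.
  }
}

function sanitizeHTML(input) {
  // Assumption: runs in a browser; <template> parses into a detached fragment.
  const template = document.createElement('template');
  template.innerHTML = input;
  sanitizeChildren(template.content);
  return template.innerHTML;
}
```

This sketch keeps whitespace-only text nodes, so for multi-line input the returned string would still contain the newlines between the surviving elements. It also does not descend into the content of nested template elements, which may or may not matter for the intended scope.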

Examples

sanitizeHTML('<p>Hello <strong>world</strong></p>');
// '<p>Hello <strong>world</strong></p>'
sanitizeHTML(`
<div>
<!-- secret -->
<a href=" javascript:alert(1) " onclick="evil()">Click me</a>
<script>alert(1)</script>
</div>
`);
// '<div><a>Click me</a></div>'

Arguments

sanitizeHTML(input)

Argument  Type    Description
input     string  The HTML string to sanitize.

Returns

Returns a sanitized HTML string.

Notes

  • This question is intentionally scoped to HTML in a normal browser DOM. You do not need SVG, MathML, CSS sanitization, Trusted Types, namespace-heavy parsing, or spec-perfect URL normalization.
