Post 17 - Seeking validation

17 Dec 2022

Task 22, Day 17, `Secure Coding` Filtering for Order Among Chaos

Validating input is the castle wall of your web aap castle. Merits the same amount of hardening as their medieval counterparts. A generally effective way of validating input is first to know how a specifi piece of data is going to be processed by the rest of the app. Then go through syntax and semantic validation checks to ensure user-provided values in syntax and logical value. Then comes filtering with very specific rules as to what forms will accept and imeediately strip or drop ones that don’t fit the predefined category.

HTML5 has <`input> element with array of helpful capabilities centered on validation. I.e.:

type - can be set to specifically filter for email, URL, file, text, etc., checks user-provided input fits criteria of form, and reports validity bacl to user immediately.

Regex can also be incorporated for more granular control. Add regex to the “pattern” attribute of <`input> element. I.e.: <`input type="text" id="uname" name="uname" pattern="[a-zA-Z0-9]+">` <`input type="email" id="email" name="email" pattern=".+@server\.tld">

Regex 101

Square brackets indicate trying to match one character within the set of characters inside of them.
[a-z] match any lowercase letter in English alphabet
[aeiou] match any vowel in English, order doesn’t matter in set [a-zA-Z] match letter in English, case insensitive
[a-z0-9] match any lowercase alphanumeric character
. wildcard operator matches any character, quite powerful with *,+, and {min,max}
* used if you don’t care if preceding token matches anything or not
+ used to make sure it matches at least once
{min,max} specifies number of characters to match
^ and $ anchors denoting start and stop of string, respectively
[a-zA-Z0-9]+ match a string of any length as long as it’s alphanumeric
^[a-zA-Z]+[0-9]*$ Ensure start of string is only letters ^[a-z]{3,9}$ match any lowercase for a string between 3 and 9 characters long
^[a-zA-Z]{3}.{3}$ string of any 3 letters followed by any 3 charachters
() groups matching specific parts
\ escape character that allows including regex operators in strings
? preceeding token is optional
^(www\.)?server\.tld$ match www.server.tld and server.tld but not .server.tld

Uniqueness of Free-form Text

Validation of free-form text is more complex and not straightforward. Can still use same questions:

How is this data going to be processed by the app?
In what context will this be used? Blog posts very different from comments or descriptions.
Is free text really necessary or is there another way?

Since syntax and semantic checks are next to impossible for free-form text, best bet at control is whitelisting.

To be continued … Now (2022-12-18)

OWASP has Input Validation Cheat Sheet that lists general techniques to perform whitelisting, summarized as:

Ensure no invalid characters are present through proper encoding
Whitelist expected characters and character sets

Code from example gathering personal details, no input is restricted risking garbage responses. Birthday and age are open to anything, but could be restricted to numbers and alphamuneric patterns. No restrictions means attackers are free to tinker with inputs. Fixed example has same info, but simple restrictions on input by adding type attribute to input element. These changes take advantage of the syntax check done in the HTML layer. Can further add security by imposing min/max limits for name/date entries, and pattern matching for email/website entries.

These changes add semantic checking and whitelisting to the existing syntax checks. Name entries could further be filtered by blacklisting special charaters and numbers.

Client-side vs Server-side

Input validation mostly prevents garbage from coming in, and proper client-side input validation limits misuse and minimizes the attack surface, but doesn’t actively cover malicious use cases.

Server-side protection can come from output escaping and parameterized queries, adding computation load to the server, but helping security as a whole. Exercise of inherent distrust of the user-provided input. Two options that can be used: htmlentities() and HTMLPurifier library.
$name = htmlentities($_GET['name], ENT_QUOTES | ENT_HTML5, "UTF-8") - uses builtin PHP function to escape any character that can affect the application.

<?php require_once '/path/to/HTMLPurifier.auto.php'; $config = HTMLPurifier_Config::createDefault(); $purenow = new HTMLPurifier($config); $output = $purenow->purify($input) ?>

uses HTMLPurifier library to help escape characters and purify content.

Both are efective ways to prevent user misuse and XSS attacks. Implementing both client and server-side validations cover all things nicely. Input validation is not a one-stop shop for securing applications. Full remediation of XSS vulns includes server-side escaping and purification, and browser-level checks (secure cookies, content-security-policy headers, etc.).

The Task

Practice filtering a list using regex terms.