David Sklar has a post to his site showing you how to fix broken UTF-8 characters in content being passed through the normal string functions.
When working on the i18n bits of Learning PHP 7, I had a problem. My example showing how plain string functions such as strtolower() and strtoupper() mangle multibyte UTF-8 characters was making the book formatting/rendering pipeline barf. The processing tools are expecing nicely formatted, valid, UTF-8 encoded HTMLBook files. It didn’t like the mangled invalid UTF-8 characters in my example output.
To fix this, I wrote the following function to replace invalid UTF-8 sequences with the Unicode Replacement Character (U+FFFD).
He includes the code for this method that walks through the string, character by character, and checks the bytes it contains to see how it needs to be translated. There's plenty of comments in it too, explaining what it's doing as it goes along.