Bad, Bad Words

Digital signage is at its best when a viewer can interact with it in some way. But if you let users' text content onto your screen – a tweet, a comment, a photo caption, or anything else – some of it is going to be unwanted. From just off-brand to outright obscene, you need to be able to filter what appears. So how do you separate the good words from the bad words?

Your Problem

You have some digital signage running in a restaurant. Users can tweet to it, or send an SMS, and it appears on the screen (after some method of filtering). The owner of the restaurant is terrified of giving up absolute control over the message, and they're concerned about a few possible (even likely) scenarios:

  • Complaints will appear on the screen
  • Someone will use one of the ‘seven dirty words’
  • Spambots will flood the screen
  • Sarcastic content will get through
  • Legitimate content will be caught in the filter, and never display on the screen

Solution 1: Human Filtering

The most obvious solution is to have a human being filter the content. When content comes in, the human being responsible accepts or rejects it. Success rate approaches 100% (minus human error), and it’s very reassuring to the owner.

But it’s also a total pain in the ass, and probably too expensive to be worth it.

Even if you streamline the process down to a swipe-left, swipe-right, Tinder-style interface on a phone, it still takes a lot of human hours to manually process incoming content. For a few hours at a single event, maybe you can pull this off – but this is a restaurant, open 16 hours a day, every day.

Solution 2: Human Filtering, After-the-Fact

If you still want that near-100% success rate with filtering, but don't want to be the thought police, you can let everything through and ban content after it's already appeared. You assume that all content is innocent until proven guilty, and let it onto your screens.

The downsides are obvious – the damage is done once it's hit the screen, and it requires whoever is monitoring to be just as vigilant (maybe even more so). You may find that your users are universally positive, and you can ignore it for the most part. But it's an ongoing risk.

Solution 3: Automated Filter

This looks great at first glance: get a chunk of computer code to check for bad content, and ban anything that doesn't pass. Check for dirty words, maybe some extra words we don't especially like, and let it do its thing. Seems simple enough, right?

The problem here is that the computer code is only going to be as clever as the programmer who wrote it (and without millions of lines of code and a massive database, probably far less clever). And when you pit a computer program against the ingenuity of the human mind, I'm putting all of my money on the human.

Here’s a short list of some of the problems:

  • Computer algorithms that do a good job of this today are all private – making a text-checker public knowledge makes it easier for spam-bot makers to find loopholes. So you're starting from scratch, or at best a weak implementation.
  • You need a dictionary of bad words. It’s not an enjoyable task researching this, especially when you get into racist terms.
  • You need a dictionary of bad words for your purposes – chances are that in a restaurant, the word ‘burnt’ is a bad word too.
  • There are also bad words you want to let through in certain contexts. ‘XXX’ is bad. ‘Super Bowl XXX’ is good. ‘Porn’ is bad. ‘Food Porn’ is (hopefully) good.
  • Computers aren’t very good at detecting spam reliably.
  • Computers aren’t very good at detecting sarcasm at all.
  • The Scunthorpe Problem. If you decided ‘sex’ is a bad word, hopefully none of your patrons are from Sussex.
  • Misspelled bad words. Don’t want ‘crap’ on your screens? What about ‘krap’, ‘craaaaap’, ‘c-r-a-p’…
  • Innuendo and euphemisms. A computer isn’t going to ‘catch your drift’ or ‘know what you mean’.
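The Scunthorpe and misspelling problems are easy to demonstrate. Here's a minimal sketch of a naive substring filter – the word list is a deliberately tiny, hypothetical one – showing a false positive and a false negative side by side:

```python
# A deliberately naive substring filter. BAD_WORDS is a tiny,
# hypothetical list - a real one would be far longer.
BAD_WORDS = {"sex", "crap"}

def naive_filter(text: str) -> bool:
    """Return True if the text should be blocked."""
    lowered = text.lower()
    return any(word in lowered for word in BAD_WORDS)

# The Scunthorpe problem: a legitimate place name gets blocked...
print(naive_filter("Greetings from Sussex!"))  # True - false positive

# ...while a trivially misspelled bad word sails straight through.
print(naive_filter("This food is kraaap"))     # False - false negative
```

Both failures come from the same root cause: the filter matches characters, not meaning.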

And then, there’s the Unicode glyph problem. Let’s say you wanted to ban the word ‘asshole’. I typed ‘asshоle’… and it got through. Why? Press [ctrl-f], and search for the word ‘asshole’ in this post. You should get three matches from this paragraph, but you’ll only get two. Why? Because the ‘o’ I used is not the Latin alphabet ‘o’, it’s the Cyrillic ‘о’ – a different glyph.
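One cheap defense against this trick is to flag text that mixes alphabets. Here's a sketch using Python's standard unicodedata module – the mixed-script heuristic is my own illustration, not a complete homoglyph detector (it won't catch a word written entirely in a lookalike script):

```python
import unicodedata

def scripts_in(text: str) -> set:
    """Collect the script prefix of each letter's Unicode name,
    e.g. 'LATIN SMALL LETTER O' vs 'CYRILLIC SMALL LETTER O'."""
    prefixes = set()
    for ch in text:
        if ch.isalpha():
            prefixes.add(unicodedata.name(ch).split(" ")[0])
    return prefixes

def looks_spoofed(word: str) -> bool:
    """Flag a word whose letters come from more than one script."""
    return len(scripts_in(word)) > 1

print(looks_spoofed("asshole"))   # False - all Latin letters
print(looks_spoofed("asshоle"))   # True - hides a Cyrillic 'о'
```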

It gets far worse. You banned the word ‘shit’ – good idea. Here are some of my more clever ideas (again, you’ll only find two instances of the word ‘shit’ in this post).

So while you’ll catch the obvious f-bombs, a more clever user won’t have much trouble outwitting your computer algorithm.

Solution 4: Metadata Filtering

Metadata is ‘data about data’, and when governments or corporations start talking about ‘big data’, this is what they’re actually after. The content of a single message is irrelevant when you can put it in a much larger pattern, and discover the context.

So in theory, with a big enough data set, you can start finding patterns you can use to eliminate specific users or types of text content from the screen. But this gets into an area of computing that requires a lot of code and a lot of raw processing power, driving costs up.

However, some metadata is very easy to track and easy to make mostly correct assumptions about: user account data. With most social media, you can get some basic data about the user, including when their account was created, how many connections they’ve made with other accounts, number of posts, their real name, website, bio, and loads of other (mostly optional) fields.

It’s not difficult to create some basic gateways to allow or deny access to your screen based on this data. The simplest implementation is just a threshold a user has to pass – 20 followers, 100 previous posts, an account active in the last week, whatever. A more complex implementation might assign each user a score: each post is a point, each follower is worth 10, using the default profile picture is -100, etc.
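The scoring version might look something like this sketch – the weights, field names, and threshold are hypothetical, and you'd tune them to your own audience:

```python
# Hypothetical weights for illustration - tune these to your audience.
def user_score(user: dict) -> int:
    """Turn basic account metadata into a single trust score."""
    score = 0
    score += user.get("post_count", 0)           # 1 point per post
    score += user.get("follower_count", 0) * 10  # 10 points per follower
    if user.get("default_profile_picture"):
        score -= 100                             # common spam-account signal
    return score

def allowed(user: dict, threshold: int = 150) -> bool:
    """Gate screen access on the trust score."""
    return user_score(user) >= threshold

print(allowed({"post_count": 80, "follower_count": 12}))  # True (score 200)
print(allowed({"post_count": 5, "follower_count": 2,
               "default_profile_picture": True}))         # False (score -75)
```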

With social media, this also has a side benefit of letting more influential people have more access. You don’t have to cater to everyone equally – the user with three followers isn’t as valuable to you as the user with three thousand.

The True Solution: All of the Above

My proposed solution is to combine it all. Here’s how we did it for Clickspace TV:

  1. An optional timeout box for new content. New content won’t appear on the screens until a certain amount of time has passed. It gives whoever is administering the content a window of time to check things out. They can also make the window of time larger if needed.
  2. An easy way to filter after the fact. Banning content is very, very easy, and we empower any staff on site to do so. We also monitor it passively at the Clickspace TV offices, so we can know if anything gets through.
  3. A text-checker that’s easy to update. I can’t divulge all of the trade secrets, but our text-checker has been updated regularly for over three years now. It captures the majority of the problems, and just about everything that’s ‘obvious’ – but we still need to monitor for sarcasm, spam, innuendo, euphemisms and malicious users.
  4. A thorough metadata check of the user. Before a user appears on our screens, their account is (quietly) vetted. We assign each user a score based on the numbers about the user: post count, follower count, account age, etc. Then we tweak the scores as we check for other factors – swear words in their bio, number of optional fields they’ve filled out, etc. Our clients set the score threshold individually, so a more cautious user can exercise more restraint on their system.
  5. …and no good way for users to test the limits. One of the reasons Clickspace TV’s system has held up is because we don’t divulge if or why a piece of text was banned. For someone to try to find their way through our barriers would take a tremendous amount of trial and error with very little feedback. And with enough assaults on our system, we just ban users outright, giving even less feedback.
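Stitched together, the pipeline above can be sketched as a single moderation function. Everything here – the helper functions, the threshold, the hold window – is a hypothetical stand-in for illustration, not the actual Clickspace TV implementation:

```python
import time

# Hypothetical stand-ins for the real checks and tuning values.
THRESHOLD = 150     # minimum metadata score (step 4)
HOLD_SECONDS = 120  # optional timeout box (step 1)

def user_score(user: dict) -> int:
    # Stand-in for the metadata vetting of step 4.
    return user.get("post_count", 0) + user.get("follower_count", 0) * 10

def text_checker_flags(text: str) -> bool:
    # Stand-in for the updatable text-checker of step 3.
    return "crap" in text.lower()

def moderate(post: dict) -> str:
    """Return 'rejected' or 'queued'. Queued posts only display after
    the hold window, and staff can still ban them later (step 2)."""
    if user_score(post["user"]) < THRESHOLD:  # quiet vetting, no feedback (step 5)
        return "rejected"
    if text_checker_flags(post["text"]):      # automated filter
        return "rejected"
    post["visible_after"] = time.time() + HOLD_SECONDS
    return "queued"

post = {"user": {"post_count": 80, "follower_count": 12}, "text": "Great burgers!"}
print(moderate(post))  # queued
```

Note that rejection returns no reason to the user – that opacity is what makes probing the filter expensive.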

Further Resources

  • A thorough bad word dictionary is vital. As a starting point, I recommend this (NSFW) list compiled from a Google resource in 2011, and add and remove from there.
  • The Big List of Naughty Strings on GitHub is a fantastic list that should help with testing out your banned word filters.
  • Read up on the Scunthorpe problem, especially the ‘other examples’ section, for common pitfalls.

Header photograph by MattysFlicks, licensed under Creative Commons
