Interview with Chris Priebe of Two Hat on AI and abusive content on social networks

Crime / Law | Interviews | Published February 4, 2019

I interviewed Chris Priebe, CEO of Two Hat, which recently released an artificial intelligence model that moderates user-generated reports in real time on social media to prevent abuse, hate speech, and other NSFW content.

Chris Priebe has over 20 years’ experience with fostering healthy online communities and is deeply passionate about making the internet a safer place. Chris was the lead developer on the safety and security elements for Club Penguin, which was acquired by Disney and grew to over 300 million users.

Chris Priebe, CEO of Two Hat

Chris founded Two Hat in 2012 and began coding Community Sift, a content moderation solution for social platforms that detects and filters high-risk content like bullying, hate speech, and grooming. Today, some of the biggest social platforms in the world use Community Sift to protect users from abusive and unwanted chat, images, and videos.

There is growing concern on social media and gaming platforms, where users encounter a great deal of abusive content such as harassment, racism, and hate speech. In response to this, Two Hat recently released an artificial intelligence model that moderates user-generated reports in real time. Can you tell us how it works?

It all starts with the user. The responsibility is shared across the entire community – the users, the platform, and the technology behind it. If we made technology that was “perfect” at protecting, it would over-filter and whitewash the entire experience. Instead, at my company Two Hat, we create chat filters and content moderation software to find the really dangerous content. Like antivirus technology, we want it to fire only when there is a clear signature of harm that is appropriate for the context. For instance, a dating site will have different standards than a child-directed site.

In order to prevent a Black Mirror “Arkangel” experience, we remove only the worst content. The community then keeps some of the rough edges of real-life interaction, which are the nurturing ground for learning resilience and expanding the mind to new points of view. But this brings a risk: a small percentage of people will manipulate the grey areas left for freedom and use them for harm.

This is where users own part of the responsibility. In today’s economy, the best sites really do want you to report. They really do have humans on the other side who look at reports and take action.

But that raises a question: how can you run a social network with a billion users and hire enough humans to deal with all the reports? To compound the problem, we have seen cases where 80% of the reports are junk. By junk, we mean people pressed the button just to see what would happen, or to bully another person.

With a good layout and tool (like Two Hat’s content moderation solution, Community Sift), one moderator can review about 500 reports per hour. But all those reports come in mixed together, regardless of risk level or seriousness. A human must read each report and review the context, along with the user’s past behavior. After reviewing all the facts, the moderator labels the conversation (bullying, sextortion, grooming, suicide, etc.) and chooses an action. This may include warning the user, suspending them for 24 hours, banning them, or sending them information on suicide hotlines – and in criminal cases, such as child sexual abuse material (CSAM), reporting it to the police.
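To make that workflow concrete, here is a minimal sketch in Python of the kind of record such a review might produce. The label taxonomy, action names, and field names are assumptions for illustration, not Community Sift’s actual schema.

```python
# Hypothetical sketch of the record a moderator produces when reviewing a report.
from dataclasses import dataclass
from enum import Enum

class Label(Enum):
    BULLYING = "bullying"
    SEXTORTION = "sextortion"
    GROOMING = "grooming"
    SUICIDE = "suicide"
    JUNK = "junk"                       # the ~80% of reports that need no action

class Action(Enum):
    NO_ACTION = "no_action"
    WARN = "warn"
    SUSPEND_24H = "suspend_24h"
    BAN = "ban"
    SEND_HELPLINE_INFO = "send_helpline_info"
    ESCALATE_TO_POLICE = "escalate_to_police"   # e.g. CSAM cases

@dataclass
class ReportDecision:
    report_id: str
    reported_user_id: str
    label: Label          # what the moderator judged the conversation to be
    action: Action        # what they chose to do about it
    moderator_id: str
    notes: str = ""
```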

Having worked with these amazing people for two decades, I can tell you the greatest stories they tell are about how they stopped a real bomb or rescued a child. These stories keep them going.

Then the decision needs to be logged in case a complaint is filed or another report comes in against the user. Over time, thousands of decisions are stored along with the factors that helped the moderator decide. Was it a one-off or consistent behavior? How many times was the user reported and actioned on in the past? What are the linguistic patterns? You can feed all of that into an AI system and it can begin to learn.
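As an illustration only, here is a sketch of how a logged report plus the user’s history might be summarised into features for a model; every field name is hypothetical.

```python
# Hypothetical feature extraction from a report and the user's moderation history.
def extract_features(report: dict, history: list[dict]) -> dict:
    """Summarise the signals a moderator weighs when reviewing a report."""
    past_reports = len(history)
    past_actions = sum(1 for h in history if h["action"] != "no_action")
    return {
        "message_length": len(report["reported_text"]),
        "past_report_count": past_reports,        # how often were they reported before?
        "past_action_count": past_actions,        # how often was action actually taken?
        "repeat_offender": past_actions >= 3,     # one-off or consistent behavior?
        "profanity_score": report["profanity_score"],  # a linguistic-pattern signal
    }
```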

This is where Predictive Moderation comes in and automatically moderates user reports. We train an AI to look at the same kind of data the human looked at and to predict what a human would do. Notice, however, that we said prediction, not decision. That is a critical distinction. In many cases, the situation needs a human touch.

It needs empathy and a grasp of hidden meanings and the abstract. This is what we call HI, or Human Intelligence. Critically, HI is what will preserve a place for humans – and their jobs – as AI takes over the repetitive, concrete tasks.
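A minimal sketch of that “prediction, not decision” idea, assuming a hypothetical model interface and illustrative confidence thresholds: only the obvious cases are handled automatically, and the grey areas go to a person.

```python
# Auto-handle a report only when the model is very confident; otherwise escalate.
# Thresholds and the model interface are illustrative assumptions.
from typing import Callable

AUTO_ACTION_THRESHOLD = 0.95   # nearly certain a human would take action
AUTO_CLOSE_THRESHOLD = 0.98    # nearly certain the report is junk / needs no action

def triage(report_features: dict, predict_action_prob: Callable[[dict], float]) -> str:
    p_action = predict_action_prob(report_features)  # probability a human would act
    if p_action >= AUTO_ACTION_THRESHOLD:
        return "auto_action"       # obvious, repetitive case the AI can handle
    if (1.0 - p_action) >= AUTO_CLOSE_THRESHOLD:
        return "auto_close"        # almost certainly a junk report
    return "send_to_human"         # grey area: needs empathy and context (HI)
```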

Why do you see this as a scalable solution?

If a social network has 100 million users and each user reports one piece of content a day, that creates an overwhelming burden for humans to review. Considering that most sites are free or funded by advertising, the economics of paying that many staff are impossible. If the bulk of the obvious, easy work can be automated, fewer moderators are needed, and they can focus on the work that really matters – keeping people safe.
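A back-of-the-envelope calculation makes the point, using the one-report-per-user-per-day scenario above and the 500-reports-per-hour figure quoted earlier; the shift length is an assumption.

```python
# Rough arithmetic for the scaling argument; inputs are illustrative assumptions.
users = 100_000_000
reports_per_user_per_day = 1
reports_per_moderator_per_hour = 500   # figure quoted earlier in the interview
hours_per_shift = 8

daily_reports = users * reports_per_user_per_day
moderators_needed = daily_reports / (reports_per_moderator_per_hour * hours_per_shift)
print(f"Moderators needed without automation: {moderators_needed:,.0f}")      # 25,000
print(f"With ~50% of reports auto-handled:    {moderators_needed / 2:,.0f}")  # 12,500
```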

How does your tool respond to time-sensitive content such as calls to real-life violence and suicide threats?

The leading sites ask you why you are reporting an item before you submit it. They have sections for self-harm and threats, and these get routed to a prioritized queue for humans to review first. Our next step with Predictive Moderation is to train the AI not just on “will the moderator take action or not,” but to predict the reported reason as well. By doing this, items flagged as likely to contain suicide threats can trigger an alarm in real time and an immediate response by professionals.
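A minimal sketch of that kind of routing, with hypothetical queue priorities, reason names, and alarm hook:

```python
# Route reports into a priority queue by (predicted) report reason,
# so time-sensitive items surface first. All names are illustrative.
import heapq

PRIORITY = {                  # lower number = reviewed sooner
    "self_harm": 0,
    "threat_of_violence": 0,
    "grooming": 1,
    "hate_speech": 2,
    "other": 3,
}

queue: list[tuple[int, str]] = []   # (priority, report_id) min-heap

def page_on_call_team(report_id: str) -> None:
    print(f"ALERT: report {report_id} may involve imminent harm")

def enqueue(report_id: str, predicted_reason: str) -> None:
    priority = PRIORITY.get(predicted_reason, PRIORITY["other"])
    if priority == 0:
        page_on_call_team(report_id)    # real-time alarm for the most urgent items
    heapq.heappush(queue, (priority, report_id))
```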

To help with this, we are working on an industry-wide standard for reported reasons, so that as each of us trains models we can share them and blend their outputs for better and better results. This is currently a project with our clients, but we hope to expand it to some of the larger social networks that do not yet work with us.

You talk about training a custom AI model for social media platforms. What does that mean and how does it work?

After a platform has had its human moderators review 50,000 reported conversations, the AI model starts to get pretty accurate and can remove about 50% or more of the workload from humans. We take the decisions that were made, along with the data the moderators looked at to make them, and put it all through a neural network. In simple terms, a neural network is like a giant switchboard modeled after the human brain.

After a lot of expensive computer time, it learns and begins to predict the right thing. It’s like teaching someone to identify counterfeit money – give them enough examples of real money and enough examples of fake money and eventually they will learn to spot the difference. It’s the same with computers – if you provide enough examples of the conditions that hold when a user is actioned, and of the conditions when no action is taken, it can predict which outcome applies to a new report.
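As a rough illustration (not Two Hat’s actual stack), here is how such a model could be trained with an off-the-shelf neural network; a random toy dataset stands in for the features and labels drawn from the 50,000 reviewed conversations.

```python
# Train a small neural network on past moderator decisions (toy data as a stand-in).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((50_000, 5))                  # stand-in feature vectors (see earlier sketch)
y = (X[:, 1] + X[:, 2] > 1.0).astype(int)    # stand-in label: 1 = moderator took action

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=200)
model.fit(X_train, y_train)

print("Held-out accuracy:", model.score(X_test, y_test))
```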

Social media giants like Twitter, Facebook, and YouTube are already keen on expanding their efforts to protect their users, including anti-abuse filters, safe search, banning individuals with abusive behavior, and stopping banned users from creating new accounts. What is your take on the current content moderation model used by these websites?

The problem is so big that we need everyone working on it. I know many of these folks personally and I highly admire the work they are doing.

Fake news, or hoax news, is another big problem on social networks right now. Interestingly, YouTube recently vowed to recommend fewer conspiracy theory videos. What is your take on this?

That is another really hard problem. Unfortunately, we have our hands full with tackling high-risk content and are not able to touch it. I hope to see a lot of new partners we can work with coming up soon in this space.

Finally, can you tell us some of the key trends and policies we are likely to see in coming years?

Last year, GDPR (the General Data Protection Regulation) came into effect in the EU and provided a really good stage for a discussion about privacy. I expect other jurisdictions to come out with their own standards. What worries me is that the result will be fragmented, with different and perhaps conflicting standards in places like California and other states. By the time every country has its own spin on GDPR, it will be nearly impossible to create any product, since communities are global and cross all these boundaries of states and countries. Eventually, everything will need to consolidate into a global standard – hopefully before too many new policies are set in stone.

We are still waiting to see the final impact of all the Senate hearings with Facebook, Google, and others, which showed a very high interest in privacy and safety. At the very least, a spotlight has been shone on how much work is being done in this space and how big the challenges are. The public is beginning to expect more to be done for their safety. We will see more sites taking a firm stand against pornography, as Tumblr did in December. We will see technologies merged together like a buffet, where large sites license our tech and others’ and blend it with their own to get near-perfect results. Each time they do that, they will raise the bar on public expectations. Eventually, it will be normal to be safe online. One day people may even start reading the comments again and dare to go “below the scroll.”