Tumblr and flagging, an overview

flight-of-the-felix:

To start this off, I’m pretty sure this post is going to be flagged, because I’m using one image that I know gets flagged on its own. I do wonder what score this post is going to get, though. (Update: this post scores 0.07455623894929886, aka exactly the same as the image below on its own.)

The following image was sourced from @coeurdastronaute http://coeurdastronaute.tumblr.com/post/179394494479/essays-in-existentialism-ice-and-fire-ii

image

Quick recap if you’ve missed my previous technical musings on the content of flagging and explicit content:
- For each post, Tumblr keeps track of a number of variables. One of these is the NSFW score: a value between 0 and 1 that predicts how likely it is that the post is NSFW. For example, 0.048 means a 4.8% probability of being NSFW. In case you wonder: yes, that’s enough to get flagged.
- Other variables stored on the post include whether a post is NSFW (yes or no), whether a post is NSFW based on the score (yes or no; I’m not sure what the exact difference is, as I’ve only ever seen both be no or both be yes), and the classification of the post. That last one is where it gets interesting, because a classification of ‘explicit’ means the post gets flagged, and ‘clean’ means it won’t be. Easy as that.
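The fields above can be sketched as a tiny decision function. To be clear: the field names here are my own labels for what I observed through the API, not necessarily Tumblr’s internal names, and the rule is inferred from the posts I’ve checked.

```python
# Hypothetical sketch of the per-post moderation fields described above.
# Field names and the flagging rule are inferred from observation only.

def is_flagged(post: dict) -> bool:
    """A post appears to be flagged iff its classification is 'explicit';
    the numeric score and the boolean NSFW fields don't decide it."""
    return post["classification"] == "explicit"

example_post = {
    "post_type": "text",
    "is_nsfw_based_on_score": False,
    "is_nsfw": False,
    "classification": "explicit",
    "score": 0.07757801562547684,  # ~7.8% probability of being NSFW
}

print(is_flagged(example_post))  # True: flagged despite a 7.8% score
```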

Very early this morning I wrote a small tool that lets you see the score and the other variables for the last 10 posts of any user. Instructions are meagre, but the tool is located here.

The above story by Coeur got the following scores:
post type: text
is NSFW based on score: false
is NSFW: false
classification: explicit
score: 0.07757801562547684

As you can see, the score for the post was 0.077…, otherwise known as a 7.7% probability that the post is or contains NSFW content. I did some more testing. First, I copied all the text of the fic, put it in a new post, and had it tested. The score was 0 and the classification “clean”, so the text was not the problem. Next I posted the image as both a text post and a photo post for comparison. Both returned the same result: not NSFW, classification explicit, and a score of 0.074…

The next step was to compare image moderation tools. I have prior experience with Microsoft’s cognitive API; in fact, I’ve used it to create a porn-blog blocker that would screen new followers and block them if they were a suspected porn blog (unless I was already following them). Here are the scores for several major image moderation providers:
Microsoft Cognitive Services:
"adult": {
   "isAdultContent": false,
   "isRacyContent": false,
   "adultScore": 0.023388456553220749,
   "racyScore": 0.037842851132154465
 }
Google Cloud Vision:
"safeSearchAnnotation": {
   "adult": "UNLIKELY",
   "spoof": "POSSIBLE",
   "medical": "UNLIKELY",
   "violence": "UNLIKELY",
   "racy": "LIKELY"
 }
Amazon Rekognition found nothing in the image that could be suggestive or explicitly adult. Since nothing was found, no confidence scores were returned either.
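To line the providers up, here’s a small comparison sketch. Note that Google Cloud Vision returns likelihood buckets rather than numbers, so the numeric midpoints in the mapping below are my own rough guesses, not documented values.

```python
# Putting the providers' results side by side. The bucket-to-number
# mapping for Google is an assumption for illustration only.

GOOGLE_LIKELIHOOD = {
    "VERY_UNLIKELY": 0.05,
    "UNLIKELY": 0.25,
    "POSSIBLE": 0.50,
    "LIKELY": 0.75,
    "VERY_LIKELY": 0.95,
}

numeric_scores = {
    "tumblr": 0.0746,           # Tumblr's score for the image on its own
    "microsoft_adult": 0.0234,  # adultScore
    "microsoft_racy": 0.0378,   # racyScore
    # Amazon Rekognition returned no moderation labels at all, so no score
}

# Highest numeric score any other provider gave this image:
highest_other = max(v for k, v in numeric_scores.items() if k != "tumblr")
print(highest_other)  # 0.0378, roughly half of Tumblr's 0.0746
```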

The last one I tested is far less known: it’s called open_nsfw and was actually created by Yahoo (now part of Oath) itself, which, given the scores above, makes it the most likely candidate for what Tumblr uses. Not much is publicly known about it, but here’s a description with references to the paper. I downloaded the model they released, didn’t fine-tune it, and ran it over the above image: NSFW score:   0.02013823203742504
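If you want to reproduce this over many images, a small parsing helper is handy. In my run, the classify script from the yahoo/open_nsfw repo printed a line like “NSFW score:   0.0201…”; the exact output format is an assumption based on that run, so adjust the prefix if yours differs.

```python
# Pull the numeric score out of open_nsfw's printed output so it can be
# batch-run over many images. Output format assumed from my own run.

def parse_nsfw_score(output: str) -> float:
    for line in output.splitlines():
        if "NSFW score:" in line:
            return float(line.split("NSFW score:")[1].strip())
    raise ValueError("no NSFW score found in output")

print(parse_nsfw_score("NSFW score:   0.02013823203742504"))
```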

So in conclusion, I have absolutely no clue how Tumblr arrives at a score of 0.077 for this image when the highest score from the other content moderation providers is 3.7%, barely half of what Tumblr gives it. And that’s not even the actual problem, because this brings me to the next part.

As I mentioned before, each post gets a couple of variable flags for content, including “is_nsfw_based_on_score”. The interesting part: this flag appears to only be true (“yes”, if you prefer) when the score is above 0.98. So even when the moderation is 97% certain an image is NSFW, it won’t be flagged as NSFW. However, if your post has a score of just 0.048 (the lowest I’ve seen so far, equal to a 4.8% probability of being NSFW), it will still be flagged as explicit. Keep in mind that the Yahoo paper had the following paragraph:

Our general purpose Caffe deep neural network model (Github code) takes an image as input and outputs a probability (i.e a score between 0-1) which can be used to detect and filter NSFW images. Developers can use this score to filter images below a certain suitable threshold based on a ROC curve for specific use-cases, or use this signal to rank images in search results.

I know that it got pretty technical there, but what they’re saying is that the score on its own is just that: a score, a probability. Tumblr’s implementation of flagging, however, appears to flag all content with a score higher than 0. I’m not sure if any of you have seen the recent post about deep learning and unexpected results (http://psychopathic-bandaid.tumblr.com/post/180751390174/squiddity3-rubitrightintomyeyes), but Tumblr managed to one-up that entire post with the new flagging: they don’t set a threshold. If their content moderation says a post has a more than 0% chance of containing NSFW content, the post gets flagged, and that’s it.
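Putting the two observations together, the decision logic Tumblr *appears* to use looks something like this. This is a sketch inferred purely from the scores and flags I’ve seen; the real implementation is unknown.

```python
# Minimal sketch of Tumblr's apparent decision logic, inferred from
# observed posts only. The 0.98 cutoff is an observation, not a fact.

def tumblr_flags(score: float) -> dict:
    return {
        # flagged as explicit whenever the model gives any nonzero score...
        "classification": "explicit" if score > 0 else "clean",
        # ...yet only counted as NSFW above a ~0.98 score
        "is_nsfw_based_on_score": score > 0.98,
    }

print(tumblr_flags(0.048))  # flagged explicit at a 4.8% probability
print(tumblr_flags(0.92))   # 92% NSFW, yet still not counted as NSFW
```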

Machine learning 101: even if you’ve created a model that works pretty decently, you still have to figure out how to use it. As for Tumblr: your model might work decently, but the way you use it does not. If content is 90% likely to be NSFW, please flag it as such. If it doesn’t even reach 30%, yeah, it’s likely not explicit.
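For the curious, here’s what picking a threshold from a ROC curve, as the Yahoo description suggests, can look like in the simplest form: choose the cutoff that maximizes true-positive rate minus false-positive rate (Youden’s J). The labeled scores below are invented for illustration.

```python
# Toy example of ROC-based threshold selection (Youden's J statistic).
# Data is made up; a real deployment would use a labeled validation set.

def best_threshold(scores, labels):
    """scores: model outputs in [0, 1]; labels: 1 = actually NSFW."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best_t, best_j = 0.0, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / positives - fp / negatives  # TPR - FPR at threshold t
        if j > best_j:
            best_t, best_j = t, j
    return best_t

scores = [0.02, 0.05, 0.07, 0.30, 0.55, 0.80, 0.92, 0.99]
labels = [0,    0,    0,    0,    1,    1,    1,    1   ]
print(best_threshold(scores, labels))  # 0.55 for this toy data
```

Any reasonable cutoff chosen this way lands far above zero, which is exactly the point: flagging everything above 0 throws the threshold step away entirely.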

An actual example: a post captioned “caught giving daddy a blowjob”, featuring a single image of a topless woman holding a man’s penis in her hand, the man wearing just a t-shirt and socks. I got this from an old log file; old meaning September this year, so before the flagging was introduced. As a result I only have the score and NSFW values. The score for this post was 0.92, otherwise known as a 92% probability that the post has or is NSFW content. Yet according to the “is_nsfw” and “is_nsfw_based_on_score” flags, it was not NSFW. I’ve seen other posts with scores of 0.99 and 1.0 finally getting flagged as NSFW; however, I haven’t seen a single post with a score below 0.98 marked as NSFW.

Thus concludes this long overview of Tumblr’s new flagging system.

@staff Watch and learn