Comments on How do I use an existing AI model to classify pornographic images?

Post

How do I use an existing AI model to classify pornographic images? [closed]

+2
−6

Closed as too generic by Alexei‭ on Jul 27, 2023 at 08:14

This post contains multiple questions or has many possible indistinguishable correct answers or requires extraordinary long answers.

This question was closed; new answers can no longer be added. Users with the reopen privilege may vote to reopen this question if it has been improved or closed incorrectly.

In the last few months, AI has advanced considerably, notably in the area of generating images. We now have powerful models like DALL-E, Stable Diffusion, etc. These are quite competent at generating an image based on a text prompt.

Can any existing models be used as a simple binary classifier of porn/not porn for images? How do I do this?

I'm asking for a high-level description of how to set up such a classifier. I can read the relevant model/API docs myself, but I would like some pointers and a "big picture" explanation.

  • An okay solution would have 65% precision and recall.
  • A good solution should have 95% precision and recall.
  • A great solution should have 99% precision and recall.
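
For reference, here is a minimal sketch of how precision and recall could be measured against a small hand-labeled test set. The function name `precision_recall` and the sample labels are made up for illustration; `1` is assumed to mean "porn" and `0` "not porn".

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 means 'porn'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground truth and classifier output for 10 images.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.75
```

Precision is the fraction of flagged images that really are porn; recall is the fraction of porn images that get flagged.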

Note that these are rough guidelines; I'm not going to actually benchmark your solution and split hairs over exact accuracy. I'm just trying to convey the ballpark performance I expect. I want to use this to filter porn out of various content streams, so I need something that has a chance of working reasonably well, not just a proof of concept.
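
To make the intended use concrete, the filtering step might look roughly like the sketch below. The classifier is passed in as a plain callable, so any model could be plugged in. `filter_stream`, `fake_classifier`, and the `score` field are hypothetical names for illustration, not part of any real API.

```python
from typing import Callable, Iterable, Iterator, TypeVar

Item = TypeVar("Item")


def filter_stream(
    items: Iterable[Item], is_porn: Callable[[Item], bool]
) -> Iterator[Item]:
    """Yield only the items that the classifier does not flag."""
    for item in items:
        if not is_porn(item):
            yield item


# Hypothetical stand-in for a real model: flags items by a precomputed score.
def fake_classifier(item: dict) -> bool:
    return item["score"] >= 0.5


stream = [
    {"id": "a", "score": 0.1},
    {"id": "b", "score": 0.9},
    {"id": "c", "score": 0.4},
]
kept = [item["id"] for item in filter_stream(stream, fake_classifier)]
print(kept)  # ['a', 'c']
```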

The definition of porn is not critical. As long as things that are obviously porn (full nudity) get a positive, and things that are obviously not (a cat eating watermelon) get a negative, I'm not too worried about borderline cases (sculptures, suggestive imagery, pareidolia, etc.); I don't really care which way those are classified.

Ideally I would like to avoid developing full image recognition models of my own, since it's a lot of work for an individual. So no building large training sets, training models, tuning them, etc. I am hoping the existing state of the art models are already trained enough to distinguish porn from not porn.

I am willing to develop a small model if necessary: a sort of minimal layer on top of a "real" image model that merely classifies the output of that model into my binary classes rather than learning to actually interpret images. I'm thinking of something very simple here, like a basic decision tree, trained on hundreds or even tens of manually created examples. The real work of interpreting images should still be done by an existing off-the-shelf model.
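
As a rough sketch of what I mean by a "minimal layer": the code below fits a depth-1 decision tree (a decision stump) on feature vectors. It assumes those vectors would come from some pre-trained image model; the three-dimensional vectors here are made-up stand-ins, and `train_stump` and `predict_stump` are hypothetical names.

```python
from typing import List, Tuple

Vector = List[float]
Stump = Tuple[int, float, int]  # (feature index, threshold, polarity)


def predict_stump(stump: Stump, x: Vector) -> int:
    """Return 1 ('porn') or 0 by comparing one feature to a threshold."""
    feature, threshold, polarity = stump
    above = x[feature] > threshold
    return int(above) if polarity == 1 else int(not above)


def train_stump(X: List[Vector], y: List[int]) -> Stump:
    """Pick the feature, threshold, and polarity with the fewest training errors."""
    best: Stump = (0, 0.0, 1)
    best_errors = len(y) + 1
    for feature in range(len(X[0])):
        values = sorted({x[feature] for x in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            for polarity in (1, -1):
                candidate = (feature, threshold, polarity)
                errors = sum(
                    1 for x, label in zip(X, y)
                    if predict_stump(candidate, x) != label
                )
                if errors < best_errors:
                    best, best_errors = candidate, errors
    return best


# Made-up "embeddings" for six manually labeled images (1 = porn, 0 = not porn).
X = [
    [0.9, 0.1, 0.3], [0.8, 0.2, 0.5], [0.7, 0.3, 0.4],
    [0.2, 0.1, 0.6], [0.1, 0.3, 0.2], [0.3, 0.2, 0.5],
]
y = [1, 1, 1, 0, 0, 0]

stump = train_stump(X, y)
print(predict_stump(stump, [0.85, 0.2, 0.4]))  # 1
print(predict_stump(stump, [0.15, 0.2, 0.4]))  # 0
```

Real embeddings have hundreds of dimensions, so a single threshold would likely be too weak; the point is only that the top layer can be tiny if the features underneath are good.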

2 comment threads

I know it's broad, so I'm not asking for depth (5 comments)
Nudity detection overview (1 comment)
I know it's broad, so I'm not asking for depth
matthewsnyder‭ wrote 10 months ago

My goal here is to get an answer that is maybe 1-5 paragraphs, that outlines the main elements of such a setup, and drops enough specific keywords that I can figure out the rest by searching online. So while the question sounds broad, I'm not asking for a lot of depth.

Edit suggestions welcome!

qwr‭ wrote 10 months ago

Your question is more about statistics than coding, so it probably belongs on the Math community. Anyway, what you want (keyword: image classification) is well researched and doesn't require any state-of-the-art models. It is quite easy to train such a classifier, with many tutorials on convnets out there, so the hardest part may be getting properly labeled training data.

matthewsnyder‭ wrote 10 months ago

That doesn't really make sense to me. The answer doesn't have much to do with statistics; all the statistical work would be done at the training stage, which I am trying to avoid. The stats would be embedded in the pre-trained model. This question is about wrapping existing software in my own code, so it seems like a poor fit for the Math section.

"the hardest part may be getting properly labeled training data"

This is why I am asking for an answer about pre-trained models, because I want to avoid doing my own training.

matthewsnyder‭ wrote 10 months ago

Anyways, I guess it's a moot point since clearly the community has decided this is not an interesting topic for the Software section.

qwr‭ wrote 10 months ago

Unless a model does what you want out of the box, pretrained models are usually customized and tweaked for the specific task.