HeadlinesBriefing.com

Marginalia Search Builds NSFW Filter Using Neural Network

Hacker News

Marginalia Search has developed an NSFW filter built on a neural network with a single hidden layer, moving beyond basic domain-based filtering. The project grew out of requests from API consumers and had to balance speed against accuracy on CPU infrastructure. Initial attempts with Meta's fastText failed because the training data, collected through search-based sampling, was skewed.

The team pivoted to LLM-based labeling, using Qwen 3.5 served through Ollama to label 10,000 search results with human-level consistency. Because the LLM only produces training labels, this approach works around the limitations of transformer models while keeping the speed needed for search engine integration. The network's architecture was inspired by Marginalia's existing recipe detector and relies on handpicked features to reduce noise.
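To make the architecture concrete, here is a minimal sketch of a classifier with one hidden layer over handpicked term features. It is not Marginalia's implementation: the vocabulary, weights, ReLU activation, and function names are all hypothetical, and training is omitted. It shows only the forward pass, which is cheap enough to run on a CPU.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def featurize(text: str, vocabulary: list[str]) -> list[float]:
    """Binary features: 1.0 if a handpicked term occurs in the text."""
    tokens = set(text.lower().split())
    return [1.0 if term in tokens else 0.0 for term in vocabulary]


def predict(features, w_hidden, b_hidden, w_out, b_out) -> float:
    """Forward pass: inputs -> one hidden layer (ReLU) -> sigmoid output."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Hypothetical vocabulary and hand-set weights, for illustration only.
# A real model would learn these from the labeled search results.
VOCAB = ["explicit", "adult", "recipe"]
W_HIDDEN = [[2.0, 1.5, -1.0], [-1.0, -0.5, 2.0]]  # two hidden units
B_HIDDEN = [0.0, 0.0]
W_OUT = [3.0, -3.0]
B_OUT = -1.0


def nsfw_score(text: str) -> float:
    """Score in (0, 1); higher means more likely NSFW."""
    return predict(featurize(text, VOCAB), W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
```

With these toy weights, `nsfw_score("explicit adult content")` lands close to 1 and `nsfw_score("a recipe for soup")` close to 0. Restricting the input to a small handpicked vocabulary keeps the first layer small, so each prediction costs only a few multiply-adds.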

After the feature set was refined through chi-squared scoring and manual term selection, the neural network reached false positive and false negative rates of 10-15%. The final implementation runs on real search data, and the model keeps improving as more labels are verified. The filter copes with the low base rate of NSFW content while still delivering the fast, CPU-friendly classification a production search environment needs.
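The chi-squared scoring step can be sketched as follows. This is a sketch under stated assumptions, not the project's code: the function names and toy documents are hypothetical, and it uses the standard 2x2 contingency statistic for term presence versus label.

```python
def chi_squared(a: int, b: int, c: int, d: int) -> float:
    """Chi-squared statistic for a 2x2 table.

    a: NSFW docs containing the term    b: safe docs containing the term
    c: NSFW docs without the term       d: safe docs without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # term occurs in all or no docs, or only one class exists
    return n * (a * d - b * c) ** 2 / denom


def score_terms(docs: list[tuple[set[str], bool]]) -> dict[str, float]:
    """Score every term by how strongly its presence depends on the label."""
    n_pos = sum(1 for _, is_nsfw in docs if is_nsfw)
    n_neg = len(docs) - n_pos
    vocab = set().union(*(terms for terms, _ in docs))
    scores = {}
    for term in vocab:
        a = sum(1 for terms, is_nsfw in docs if is_nsfw and term in terms)
        b = sum(1 for terms, is_nsfw in docs if not is_nsfw and term in terms)
        scores[term] = chi_squared(a, b, n_pos - a, n_neg - b)
    return scores


# Hypothetical labeled documents: (set of terms, is_nsfw).
DOCS = [
    ({"adult", "video"}, True),
    ({"adult", "photos"}, True),
    ({"video", "tutorial"}, False),
    ({"recipe", "photos"}, False),
]
SCORES = score_terms(DOCS)
RANKED = sorted(SCORES, key=SCORES.get, reverse=True)
```

On this toy data, "adult" gets the highest score because it appears only in NSFW documents. "video" and "photos" score zero because they are split evenly across both classes. The statistic is symmetric, so it also rewards terms that signal safe content, which is one reason a manual selection pass on top of the ranking is useful.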