Advancing Content Moderation: Evaluating Large Language Models for Detecting Sensitive Content Across Text, Images, and Videos
cs.CV
cs.AI
Released Date: November 26, 2024
Authors: Nouar AlDahoul, Myles Joshua Toledo Tan, Harishwar Reddy Kasireddy, Yasir Zaki
![[Uncaptioned image]](https://arxiv.org/html/2411.17123v1/x1.png)
| Model | Accuracy | Precision | Recall | F1 score | FPR | FNR |
|---|---|---|---|---|---|---|
| CNN-LSTM (baseline) [87] | 73.35% | 72.53% | 76.9% | 74.01% | 30.2% | 23.1% |
| MobileNetV2–LSTM (baseline) [88] | 82.88% | 82.13% | 83.59% | 82.85% | 17.82% | X |
| DenseNet-121–LSTM (baseline) [88] | 80% | 79.95% | 79.55% | 79.75% | 19.55% | X |
| ShuffleNet–LSTM (baseline) [88] | 66.13% | 64.24% | 71.21% | 67.54% | 38.86% | X |
| EfficientNet-B0–LSTM (baseline) [88] | 86.38% | 86.5% | 86.28% | 86.39% | 13.53% | X |
| GPT-4o | 93.75% | 89.06% | 99.75% | 94.10% | 12.25% | 0.25% |
| GPT-4o-mini | 91.75% | 86.30% | 99.25% | 92.33% | 15.75% | 0.75% |
| Gemini 1.5 Pro | 95.5% | 95.5% | 95.5% | 95.5% | 4.5% | 4.5% |
| Gemini 1.5 Flash | 94.13% | 91.14% | 97.75% | 94.33% | 9.5% | 2.25% |
| Llama-3.2-11B-Vision-Instruct | 82.12% | 82.04% | 82.25% | 82.15% | 18% | 17.75% |
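
The six columns in the table are all functions of one binary confusion matrix, so any row can be checked for internal consistency. The sketch below shows the standard definitions. The counts in it are an assumption, not figures from the paper: a hypothetical balanced test set of 400 positive and 400 negative samples, chosen only because it reproduces the percentages reported for GPT-4o. The names `Confusion` and `binary_metrics` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of a binary confusion matrix (positive = sensitive content)."""
    tp: int  # sensitive items correctly flagged
    fp: int  # benign items wrongly flagged
    tn: int  # benign items correctly passed
    fn: int  # sensitive items missed


def binary_metrics(c: Confusion) -> dict[str, float]:
    """Return the six metrics used in the table as fractions in [0, 1].

    Assumes every denominator is non-zero (both classes are present and
    at least one item is predicted positive).
    """
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": c.fp / (c.fp + c.tn),  # false positive rate
        "fnr": c.fn / (c.fn + c.tp),  # false negative rate = 1 - recall
    }


# ASSUMPTION: hypothetical counts (400 positives, 400 negatives), picked to
# match the GPT-4o row; the source does not state the underlying counts.
gpt4o_like = Confusion(tp=399, fp=49, tn=351, fn=1)
metrics = binary_metrics(gpt4o_like)

for name, value in metrics.items():
    print(f"{name:>9}: {value:.2%}")
```

Because FNR is the complement of recall, a row that reports one fixes the other. The table's trade-off is visible in these terms: GPT-4o misses very few sensitive items (low FNR) at the cost of flagging more benign ones (higher FPR), while Gemini 1.5 Pro balances the two error types.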