Repost Sleuth Bot is a high-performance bot that detects Reddit reposts extremely fast.
It also includes a large number of custom admin abilities to help moderators deal with reposts on their subreddits.
- Images: Fully Supported
- Links: Fully Supported
- Videos: Not Supported
- Text: Not Supported
Code has been written for videos and text. However, both are far too resource-intensive to offer publicly.
- Realtime repost detection for ALL supported content submitted to Reddit
- Ability to monitor any post and notify you if someone reposts it
- Realtime repost detection
- Comment on reposts
- Customize search settings for your subreddit (limit by matching %, date, subreddit, author, etc.)
- Define custom comment templates
- Automatically remove reposts
- Automatically report reposts with custom report templates
- Automatically lock reposts
- Automatically sticky the bot's comment
- Automatically mark a post as OC
- Automatically lock the bot's comment
- Custom report dashboard and management on www.repostsleuth.com
- Discord notifications (coming soon)
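
As a rough illustration of the per-subreddit search settings listed above (matching %, date, subreddit, author), here is a minimal sketch of how such limits could filter candidate matches. The `Match` and `SearchSettings` classes, their field names, and `filter_matches` are hypothetical and are not taken from the bot's code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Match:
    """A candidate repost match (hypothetical shape)."""
    post_id: str
    subreddit: str
    author: str
    created_at: datetime
    similarity: float  # percent, 0-100


@dataclass
class SearchSettings:
    """Per-subreddit limits (hypothetical field names)."""
    min_similarity: float = 90.0
    max_age_days: Optional[int] = None
    only_subreddit: Optional[str] = None
    exclude_author: Optional[str] = None


def filter_matches(matches: List[Match], settings: SearchSettings,
                   now: datetime) -> List[Match]:
    """Keep only the matches that satisfy every configured limit."""
    kept = []
    for m in matches:
        if m.similarity < settings.min_similarity:
            continue  # below the matching % threshold
        if (settings.max_age_days is not None
                and now - m.created_at > timedelta(days=settings.max_age_days)):
            continue  # original post is too old
        if (settings.only_subreddit is not None
                and m.subreddit.lower() != settings.only_subreddit.lower()):
            continue  # match came from a different subreddit
        if (settings.exclude_author is not None
                and m.author.lower() == settings.exclude_author.lower()):
            continue  # ignore matches by this author
        kept.append(m)
    return kept
```

A limit left as `None` is simply not applied, so a subreddit only has to set the options it cares about.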
!repost watch - Monitor this post and notify you if we see it posted somewhere else
!repost unwatch - Disable an active repost monitor for this post
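
For illustration only, the two commands above could be recognized in a comment body with something like the following. The function name `parse_repost_command` and the regular expression are assumptions for this sketch, not the bot's actual parser.

```python
import re
from typing import Optional

# Matches "!repost watch" or "!repost unwatch" anywhere in a comment.
COMMAND_RE = re.compile(r"!repost\s+(watch|unwatch)\b", re.IGNORECASE)


def parse_repost_command(comment_body: str) -> Optional[str]:
    """Return "watch" or "unwatch" if the comment contains a command, else None."""
    found = COMMAND_RE.search(comment_body)
    return found.group(1).lower() if found else None
```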
Repost Sleuth makes heavy use of Celery with a Redis backend. Celery lets a large number of CPU-bound tasks run in parallel.
All data is stored in a MySQL database, and we use SQLAlchemy to interact with the data.
The bot is split into roughly 9 Docker containers with various instances.
Hardware-wise, the bot runs on a Dell r620 with 2x Xeon 2670v2 CPUs and 256 GB of RAM. Storage is an all-flash array consisting of 8 Samsung Evo 500 GB SSDs in RAID 10.
It currently consumes around 70% of these resources.
If you are interested in seeing a specific feature, please open a discussion thread.
I'm open to contributions; however, I'm still working out how to handle them. The bot cannot be easily run locally.
If you feel you can contribute something, please include tests for any code you wish to submit.
Memes are by far the hardest reposts to detect accurately. Many templates can produce the exact same hash even with different text in the meme. Because of this, most other repost bots don't work well on meme subs, since they produce tons of false positives.
Repost Sleuth has an extra layer of processing for memes that weeds out most false positives. It does result in some false negatives, but it's generally pretty accurate.
Using the False Positive / False Negative report links in the bot's signature helps me track it.
At the moment only ~3.5 percent of comments the bot leaves are reported as false negatives.
This is called a False Positive. Repost Sleuth is good at avoiding most false positives by erring on the side of being too strict. But they happen. That's life. Don't take it personally. We constantly monitor reports and tune the bot the best we can.
An image may look exactly the same to your eye, but the bot sees each individual pixel. Things like JPEG compression can result in a big change to pixels and as a result, a big change to the hashes the bot uses for comparison. So 2 images that look identical may have hashes that are only 80% similar.
Depending on the specific subreddit, this difference may or may not meet the similarity threshold.
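
To make the 80% figure concrete, here is a small sketch of how a perceptual hash comparison can work. It uses a difference hash (one bit per pair of horizontally adjacent pixels) on a tiny grayscale grid. The README does not say which hash the bot uses, so the hash choice, the function names, and the pixel values are all assumptions for illustration.

```python
def dhash(pixels):
    """Difference hash: 1 bit per adjacent pixel pair, set when left > right."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def similarity_percent(hash_a, hash_b, n_bits):
    """Share of hash bits that agree, as a percentage."""
    differing = bin(hash_a ^ hash_b).count("1")  # Hamming distance
    return 100.0 * (n_bits - differing) / n_bits


# Two 6x2 grayscale grids: the second stands in for a recompressed copy
# in which two pixel values drifted slightly.
original = [[10, 20, 15, 30, 25, 40],
            [50, 45, 60, 55, 70, 65]]
recompressed = [[10, 14, 15, 30, 25, 40],
                [50, 51, 60, 55, 70, 65]]

n_bits = 10  # 5 comparisons per row, 2 rows
score = similarity_percent(dhash(original), dhash(recompressed), n_bits)
print(score)  # 80.0
```

With a subreddit threshold of 90% this pair would not count as a match, while a threshold of 75% would accept it. A real implementation would first resize each image to a small fixed grayscale grid before hashing.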
There's nothing I can do about this with the current implementation of the bot. It's not perfect but it works pretty well. If you find that horrific, don't use it.
While the bot correctly identifies memes with the same template and different text most of the time, it's not 100% accurate, especially with newer meme templates.
The bot continually learns meme templates. The more it sees a template the more accurate it gets. However, as new templates are used, it may trigger a false positive until that template has more circulation.
Tag the bot in a comment on an image post: u/repostsleuthbot
The bot is still 'Beta'. I'm continually working on stuff and it might crash from time to time. It will be a couple weeks before it's completely stable.
If you properly crosspost something, the bot will ignore it. If you take an image and upload it to a new sub, you're getting flagged.
We're working on indexing older posts. We are currently back to March 2018. Depending on storage space we may go back another year or 2.
It uses a binary tree search for similar image hashes. This allows it to perform fast, accurate searches without checking each individual image.
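
The README does not name the exact tree structure, so the following is only a sketch of one binary tree that supports this kind of search: a vantage-point tree over integer hashes with Hamming distance. All names here (`VPNode`, `build`, `find_similar`) are hypothetical. The idea is that each node splits the remaining hashes into "near" and "far" halves, and the triangle inequality lets a query skip whole subtrees.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


class VPNode:
    def __init__(self, vantage, radius, inside, outside):
        self.vantage = vantage  # hash stored at this node
        self.radius = radius    # split distance from the vantage hash
        self.inside = inside    # subtree with distance <= radius
        self.outside = outside  # subtree with distance > radius


def build(hashes):
    """Recursively build a vantage-point tree from a list of hashes."""
    if not hashes:
        return None
    vantage, rest = hashes[0], hashes[1:]
    if not rest:
        return VPNode(vantage, 0, None, None)
    distances = sorted(hamming(vantage, h) for h in rest)
    radius = distances[len(distances) // 2]  # median distance as the split
    inside = [h for h in rest if hamming(vantage, h) <= radius]
    outside = [h for h in rest if hamming(vantage, h) > radius]
    return VPNode(vantage, radius, build(inside), build(outside))


def find_similar(node, query, max_distance, found=None):
    """Collect every stored hash within max_distance bits of query."""
    if found is None:
        found = []
    if node is None:
        return found
    d = hamming(node.vantage, query)
    if d <= max_distance:
        found.append(node.vantage)
    # Triangle inequality: a match lies between d - max_distance and
    # d + max_distance from the vantage hash, so a subtree is visited
    # only if that range overlaps it.
    if d - max_distance <= node.radius:
        find_similar(node.inside, query, max_distance, found)
    if d + max_distance > node.radius:
        find_similar(node.outside, query, max_distance, found)
    return found


tree = build([0b0000, 0b0001, 0b0011, 0b0111, 0b1111, 0b1000, 0b1100])
print(sorted(find_similar(tree, 0b0000, 1)))  # [0, 1, 8]
```

The results are the same as a linear scan would give; the tree only reduces how many stored hashes have to be compared against the query.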
Yes! We're currently looking for communities to Beta test this feature. Enabled communities will have realtime checking of all new posts with configurable options. Send a PM to u/barrycarey.
Not yet. However, we will support all post types in the future. We want to focus on images first and get it right.
Currently the bot is running on 3 machines: a Dell r710 server with 2x Xeon X5670 12 core CPUs w/ 96gb RAM, a Ryzen 2700x w/ 32gb RAM, and an i7 3770k w/ 32gb of RAM. All of these systems are running Docker containers to deal with the different pieces of the bot.