[rabbitmq-discuss] Our fight against scammers

Mon Aug 11 15:24:02 BST 2008

Guillaume

What can I say .. other than Thanks Very Much :-)

I don't think any use case is 'overkill' if it has a business benefit.
Let us all know how you get on or if you take this further.

Cheers

alexis

On Fri, Aug 8, 2008 at 7:52 PM, Guillaume Theoret <smokinn at gmail.com> wrote:
> First off, I'd like to thank you for this wonderful tool, it's being
> put to good use!
>
> I thought you might be interested in knowing how we're using it at my
> company as part of a system to identify and rid ourselves of scammers.
>
> It used to be that scammers were easy to spot. An account is created
> and immediately mass messages a huge number of others. Suspicious.
> Those accounts were flagged and banned.
>
> Also, before I even started working here they'd taken a sort of
> scorched-earth approach to scam prevention. Scammers usually came from
> IPs in places like Nigeria, South Africa, etc., so thanks to the
> wonders of GeoIP location entire countries were banned. So the
> scammers got smarter. Instead they created multiple accounts that
> waited a bit and sent a small number of messages. Harder to spot but
> not impossible. Multiple accounts created from the same IP aren't
> always scammers (could simply be a NAT) but they often are, so they
> were flagged for review. They also used proxies to create the
> accounts, which finally put them under the radar entirely.
>
> Here's where RabbitMQ comes in. I've just finished a new filtering
> system where, each time a message is sent, the web server queues up
> the sender's message history into RabbitMQ. At the other end I have a
> consumer that takes the message, feeds it into crm114 and gets a
> result. If it looks scammy, I write a log entry in the database for a
> moderator to review later. If the moderator finds it to be a false
> positive, the message history is put back into the queue, the consumer
> gets it back and updates the "good" file. Whenever a human report or
> another algorithmic match (I also check message similarity for
> possible template spam) is confirmed by a moderator, the message
> history is put into the queue and the consumer takes it out and
> updates the "bad" file.
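
The consumer side of this flow can be sketched roughly as follows. This
is only an illustration under assumptions: the XML element names
(message, action, member_id, history) are invented, classify() is a
hypothetical keyword check standing in for crm114, and an in-memory
queue.Queue stands in for the RabbitMQ queue. The three action names are
the ones mentioned later in the message.

```python
import queue
import xml.etree.ElementTree as ET


def make_envelope(action, history, member_id=None):
    """Build an XML envelope; the element names are hypothetical."""
    root = ET.Element("message")
    ET.SubElement(root, "action").text = action
    if member_id is not None:
        ET.SubElement(root, "member_id").text = str(member_id)
    ET.SubElement(root, "history").text = history
    return ET.tostring(root, encoding="unicode")


def classify(history):
    """Hypothetical stand-in for crm114: flag one suspicious phrase."""
    return "wire money" in history.lower()


def handle(envelope_xml, review_log, good, bad):
    """Parse one queued envelope and act on its action field."""
    root = ET.fromstring(envelope_xml)
    action = root.findtext("action")
    member_id = root.findtext("member_id")  # optional, None if absent
    history = root.findtext("history", default="")
    if action == "pick":
        # Classify, and log for moderator review only if it looks scammy.
        if classify(history):
            review_log.append((member_id, history))
    elif action == "learngood":
        good.append(history)  # stands in for updating the "good" file
    elif action == "learnbad":
        bad.append(history)   # stands in for updating the "bad" file
    else:
        raise ValueError(f"unknown action: {action!r}")


def drain(q, review_log, good, bad):
    """Process everything currently waiting in the queue."""
    while True:
        try:
            envelope = q.get_nowait()
        except queue.Empty:
            return
        handle(envelope, review_log, good, bad)
```

A real consumer would block on the broker instead of draining a local
queue, and would call out to crm114 where classify() and the two lists
are used here.
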
>
> I tried out both the Python py-amqplib and RabbitMQ Java clients but
> I'm currently using the Java client because it's a whole lot faster.
> Normally I would have much preferred to write my producer and consumer
> in Python but the speed increase was worth writing a little Java code.
> With the Java client I'm getting incredibly good throughput, so much
> so that I imagine we'll be able to stay with a single consumer for a
> while. This system isn't live yet so the only throughput numbers I
> have are from running everything (the server, producer and consumer)
> on my average-horsepower laptop, but even if it were deployed to my
> laptop it would probably be enough for now. (It won't be, though;
> we're getting it a beefy server, of course, and I'll have a better
> idea of the real numbers once the machine arrives.) On my laptop I can
> send ~8800 messages/sec and I can consume & process ~300 messages/sec.
>
> The way I'm running everything right now is basically like this:
>
> The producer is installed on the web servers and listens on a local
> socket. It blindly forwards everything it gets to a RabbitMQ queue.
> Basically I did it this way just because there aren't any PHP client
> libraries available.
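
A forwarding producer of this kind might look something like the sketch
below. It rests on several assumptions that are not in the original
message: the local socket is taken to be TCP on the loopback interface
(it could equally be a Unix domain socket), payloads are framed one per
line, and the outbox queue is an in-memory stand-in for the publish call
of a real RabbitMQ client.

```python
import queue
import socket
import socketserver
import threading

# In-memory stand-in for "publish to a RabbitMQ queue".
outbox = queue.Queue()


class ForwardHandler(socketserver.StreamRequestHandler):
    """Forward every non-empty line received without inspecting it."""

    def handle(self):
        for line in self.rfile:
            payload = line.rstrip(b"\n")
            if payload:
                outbox.put(payload)


def start_forwarder(host="127.0.0.1", port=0):
    """Start the forwarder in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), ForwardHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_local(address, payload):
    """What the web application side would do: connect, send, close."""
    with socket.create_connection(address) as conn:
        conn.sendall(payload + b"\n")
```

The web application only needs to be able to open a socket, so it does
not need a client library for the broker at all.
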
>
> When the PHP messaging module is hit, it delivers the message as
> usual, but it also forwards the message history to the producer on a
> local socket. (This step is most probably going to be changed to a
> cron job before deployment but for now on the dev platform it's
> running like this.)
>
> The consumer has its own box with crm114 installed. (The RabbitMQ
> server will probably be installed on this box as well.) It waits on
> the queue and processes the histories based on crm114 output.
>
> Of course, using RabbitMQ for such a simple scenario is probably
> incredible overkill, since guaranteed delivery isn't all that
> important here and I could've easily used something like beanstalkd
> (though not really overkill from an effort perspective, since it was
> actually reasonably easy to learn, set up and get going). But this was
> really just a proof of concept for a later (probably sooner rather
> than later) project of writing our own financial transaction
> processing, which of course needs to be much more rigorous.
>
> Also, this many-producer (we have a bunch of web servers) one-listener
> scenario is fine for now but later on we're most probably going to
> need more than one box for message analysis. This is where I run into
> a distribution problem and where crm114 will shine. For now what I
> feed into the queue is an xml message with an action to be performed
> (pick, learnbad, learngood), an optional member id and a message
> history. This won't scale to many consumers because crm114 keeps
> its "good" and "bad" database as a statistics file on disk. Later,
> when I need more consumer boxes I can easily refactor this to one box
> that listens on a learn queue and, whenever it updates a file (these
> updates are rare since it's human moderators that generate them),
> drops the modified file into the queue that all pick consumers are
> listening on, so they can all replace the right statistics file with
> the new one.
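
The planned refactoring could be sketched like this. Everything here is
an in-memory illustration with hypothetical class names: Fanout mimics
an AMQP fanout exchange (each pick consumer binds its own queue and so
receives its own copy of every update), and appending bytes stands in
for crm114 rewriting a statistics file.

```python
import queue


class Fanout:
    """In-memory stand-in for an AMQP fanout exchange."""

    def __init__(self):
        self._queues = []

    def bind(self):
        """Give one consumer its own queue of updates."""
        q = queue.Queue()
        self._queues.append(q)
        return q

    def publish(self, item):
        """Deliver a copy of the item to every bound queue."""
        for q in self._queues:
            q.put(item)


class Learner:
    """The single box that owns the authoritative statistics files."""

    def __init__(self, fanout):
        self.fanout = fanout
        self.stats = {"good": b"", "bad": b""}

    def learn(self, name, history):
        # Appending bytes stands in for crm114 updating its file on disk.
        self.stats[name] += history.encode("utf-8") + b"\n"
        # Ship the whole modified file to every pick consumer.
        self.fanout.publish((name, self.stats[name]))


class PickConsumer:
    """A classifier box that only reads the statistics files."""

    def __init__(self, updates):
        self.updates = updates
        self.stats = {"good": b"", "bad": b""}

    def apply_updates(self):
        """Replace local copies with any newer files that have arrived."""
        while True:
            try:
                name, content = self.updates.get_nowait()
            except queue.Empty:
                return
            self.stats[name] = content
```

With a real broker, a single shared queue would hand each update to only
one consumer, which is why the sketch gives every pick consumer its own
queue.
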
>
> So all this to say that I am pleased to report that RabbitMQ is what
> I'm going to highly recommend we build the financial processing
> project around. Thanks a lot for the amazing work.