Bad Words Filter
blakefrance
16-Jul-2007, 12:31 AM
Hey guys,
I'm using FormMail and love it. But I'm wondering if there is any support for a bad words filter. I have my own filter made, but I don't know how to tie the two together. Or is there one built in?
Thanks
Blake France
russellr
22-Jul-2007, 08:55 PM
Hi,
Yes, this feature is planned for a future version.
CHMOD000
21-Jun-2008, 10:42 PM
Recently I've had a bot automatically posting links within what seems to be Latin text. The links are generated within the Latin text, but the Latin text itself is always the same. If a bad words filter were part of Tectite, I could specify that one of these Latin words was a "bad word".
In my case, I wouldn't want the bad words changed to other chars. I would want the user to be sent to my bad_url page, or perhaps a clean_up_your_filth page.
Russell, do you have any plans to make this happen? I could probably put something together that everyone could use and send it to you, but I'm sure you are a master of PHP and know your program much better than I do, so it would be easier for you to just do it yourself.
My time is somewhat limited now, having just got married, and a new kid to take care of full time. I will try to work on a hook, and see what you think...
It won't be that hard to check the incoming post vars and redirect to bad_url.... so tell me if you have made any progress with this, and if I am wasting my time.
Thanks,
Brian
CHMOD000
22-Jun-2008, 12:06 AM
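OK, so I made up this little hook. I'm not sure if it would be a major hit to performance, but perhaps Russell could suggest something better?
fmhookpreinit.inc
<?php
// Stop processing if any submitted field contains a listed word.
$badWord = array(
    'smelly'
);
foreach ($_POST as $dirtyString) {
    // Skip array-valued fields (e.g. checkbox groups); strpos() needs a string.
    if (!is_string($dirtyString)) {
        continue;
    }
    foreach ($badWord as $unwanted) {
        // strpos() returns 0 for a match at the very start of the string,
        // so compare against false instead of testing truthiness.
        if (strpos($dirtyString, $unwanted) !== false) {
            die();
        }
    }
}
?>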
Obviously the badWord array could be added to, and even more obviously "smelly" is not really a bad word, but I wanted to keep things clean in here.
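For anyone who would rather send the visitor to a page than call die(), here is a minimal sketch of the same hook doing a redirect. It assumes a plain HTTP redirect from the hook is acceptable; it does not use FormMail's own bad_url handling. The function name findBadWord and the path /bad_url.html are placeholders, not anything from FormMail.

```php
<?php
// Hypothetical helper: returns the first listed word found in any submitted
// string field, or null when nothing matches.
function findBadWord(array $fields, array $badWords)
{
    foreach ($fields as $value) {
        if (!is_string($value)) {
            continue; // skip array-valued fields such as checkbox groups
        }
        foreach ($badWords as $word) {
            // stripos() is case-insensitive; compare with false because a
            // match at position 0 is still a hit.
            if (stripos($value, $word) !== false) {
                return $word;
            }
        }
    }
    return null;
}

$badWord = array('smelly');
if (findBadWord($_POST, $badWord) !== null) {
    // '/bad_url.html' is a placeholder for your own page.
    header('Location: /bad_url.html');
    exit;
}
```

Keeping the matching in a function makes it easy to try against sample text before going live.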
russellr
22-Jun-2008, 09:28 PM
Hi,
Thanks for the suggestion. You'll have to be careful about the words you enter because it will match anywhere in the string.
For example, if you want to check for "sex", then you would catch "essex", and "sussex" and "sextant", etc.
These sorts of checks are difficult to do very well, but we'll look into providing it as a standard part of FormMail.
We haven't done it in the past because of the weakness of the process. Spammers now tend to replace letters with numbers that look like the letters, e.g. "v1agra".
So, it becomes an endless battle against the spammers (which you have to perform manually), and there is a big danger of silently and unhelpfully blocking real submissions from real users.
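To illustrate the two problems above, here is a rough sketch that matches whole words only (so "essex" does not trigger on "sex") and maps a few look-alike digits back to letters (so "v1agra" is caught). The function name containsBadWord and the substitution table are assumptions for illustration, and the table is nowhere near complete.

```php
<?php
// Hypothetical helper, not part of FormMail.
function containsBadWord($text, array $badWords)
{
    // Assumed, incomplete map of digits that spammers swap in for letters.
    $lookAlikes = array('0' => 'o', '1' => 'i', '3' => 'e', '4' => 'a', '5' => 's');
    $normalised = strtr(strtolower($text), $lookAlikes);

    foreach ($badWords as $word) {
        // \b anchors the match to word boundaries, so the word must stand alone.
        $pattern = '/\b' . preg_quote(strtolower($word), '/') . '\b/';
        if (preg_match($pattern, $normalised) === 1) {
            return true;
        }
    }
    return false;
}
```

Note that the digit mapping can itself create false matches (a phone number turns into letters), so it does not remove the risk of blocking real submissions. It only moves it.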
CHMOD000
23-Jun-2008, 03:20 AM
Yes Russell, I already knew that my hook would search the post vars as you have stated. In my case, there was a certain long Latin word appearing in each email, so that is what I used to stop that particular spam from being sent to me. As you have said, getting rid of spam altogether is tricky. I have been trying to think of creative ways to defeat spam, but so far I have not come up with anything all that great. I will send you code if I come up with something I think you might be interested in.
CHMOD000
19-Jul-2008, 05:26 AM
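I've been using my bad words filter for a while now, and it is working great. I wanted to post again and tell you that if you have any problems with Russian spammers, you might want to try adding Russian characters to the bad words filter. I have added some, and it stops the Russian spam. Here is an example:
<?php
// Stop processing if any submitted field contains one of these characters.
$badWord = array(
    'ж',
    'д',
    'и',
    'ю',
    'Ð'
);
foreach ($_POST as $dirtyString) {
    // Skip array-valued fields (e.g. checkbox groups); strpos() needs a string.
    if (!is_string($dirtyString)) {
        continue;
    }
    foreach ($badWord as $unwanted) {
        // strpos() returns 0 for a match at the very start of the string,
        // so compare against false instead of testing truthiness.
        if (strpos($dirtyString, $unwanted) !== false) {
            die();
        }
    }
}
?>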
Of course, these aren't bad words, but PHP will search the input for these Russian characters and, in my case, die()!
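Instead of listing individual letters, a single regular expression can test for any Cyrillic character. This is only a sketch: it assumes the form is submitted as UTF-8, and containsCyrillic is a made-up name. It would also block legitimate messages written in Russian, Ukrainian, Bulgarian and so on.

```php
<?php
// Hypothetical helper: true when the text contains at least one character
// from the Cyrillic script. Assumes $text is UTF-8; preg_match() returns
// false (not 1) on invalid UTF-8, which this treats as "no match".
function containsCyrillic($text)
{
    return preg_match('/\p{Cyrillic}/u', $text) === 1;
}
```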
russellr
21-Jul-2008, 03:25 AM
Hi,
Thanks for the info.
I'm 100% against spam filters on email for one very good reason: they don't work.
By "they don't work" I mean they block legitimate email.
So, we use a challenge-response system, which guarantees a valid human user will always be able to send us email. A little inconvenient, but nothing is perfect. And it works - blocks 99.99% of spam, and doesn't block legitimate email.
In this context, I've always tried to use the same philosophy with FormMail's anti-spam features.
We sometimes get complaints from people whose submissions get blocked on a site because the site owner has used the ATTACK_DETECTION_MANY_URLS feature and set it very low.
But, that's a choice for the site owner. The default FormMail doesn't stop submissions with URLs.
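To make the idea of a URL limit concrete, here is a rough sketch of counting links in a submission. It is not FormMail's actual ATTACK_DETECTION_MANY_URLS code; countUrls and hasTooManyUrls are made-up names, and only http/https links are counted.

```php
<?php
// Hypothetical helper: counts http:// and https:// occurrences in a string.
function countUrls($text)
{
    return (int) preg_match_all('~\bhttps?://~i', $text, $matches);
}

// Hypothetical helper: true when all string fields together contain more
// than $maxUrls links.
function hasTooManyUrls(array $fields, $maxUrls)
{
    $total = 0;
    foreach ($fields as $value) {
        if (is_string($value)) {
            $total += countUrls($value);
        }
    }
    return $total > $maxUrls;
}
```

A very low limit is exactly the setting that produces the complaints described above, since a genuine message may include a link or two.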
What worries me about your solution is this: how do you know it's not blocking legitimate submissions?
I hope you see my concern about how to implement this sort of spam protection reliably - meaning, it doesn't block legitimate submissions.
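One way to find out whether a filter is blocking legitimate submissions is to record what it blocks rather than dropping it silently. A minimal sketch, assuming the hook can write to a file outside the web root; logBlockedSubmission and the log format are invented for illustration.

```php
<?php
// Hypothetical helper: appends one JSON line per blocked submission so the
// site owner can review what the filter rejected.
function logBlockedSubmission($logFile, array $fields, $reason)
{
    $entry = array(
        'time'   => gmdate('c'),
        'reason' => $reason,
        'fields' => $fields,
    );
    $line = json_encode($entry) . "\n";
    // LOCK_EX avoids interleaved writes when two submissions arrive together.
    return file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX) !== false;
}
```

Calling this just before die() gives a record to check from time to time. The log holds whatever visitors typed, so it should be protected like any other submitted data.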
CHMOD000
21-Jul-2008, 03:59 AM
I suppose if a site were somehow expecting the occasional email with Russian characters in it, then my solution would be the wrong thing to implement on that particular site.
If you or others are interested, the following script has proven semi-beneficial in reducing my spam:
<?php
/*
$Id: VerifyEmailAddress.php 8 2008-01-13 22:51:10Z visser $
Email address verification with SMTP probes
Dick Visser <dick@tienhuis.nl>
INTRODUCTION
This function tries to verify an email address using several techniques,
depending on the configuration.
Arguments that are needed:
$email (string)
The address you are trying to verify
$domainCheck (boolean)
Check if any DNS MX records exist for domain part
$verify (boolean)
Use SMTP verify probes to see if the address is deliverable.
$probe_address (string)
This is the email address that is used as FROM address in outgoing
probes. Make sure this address exists so that in the event that the
other side does probing too this will work.
$helo_address (string)
This should be the hostname of the machine that runs this site.
$return_errors (boolean)
By default, no errors are returned. This means that the function will evaluate
to TRUE if no errors are found, and false in case of errors. It is not possible
to return those errors, because returning something would be a TRUE.
When $return_errors is set, the function will return FALSE if the address
passes the tests. If it does not validate, an array with errors is returned.
A global variable $debug can be set to display all the steps.
EXAMPLES
Use more options to get better checking.
Check only by syntax: validateEmail('dick@tienhuis.nl')
Check syntax + DNS MX records: validateEmail('dick@tienhuis.nl', true);
Check syntax + DNS records + SMTP probe:
validateEmail('dick@tienhuis.nl', true, true, 'postmaster@tienhuis.nl', 'outkast.tienhuis.nl');
WARNING
This function works for now, but it may well break in the future.
*/
function validateEmail($email, $domainCheck = false, $verify = false, $probe_address='', $helo_address='', $return_errors=false) {
global $debug;
$server_timeout = 180; # timeout in seconds. Some servers deliberately wait a while (tarpitting)
if($debug) {echo "<pre>";}
# Check email syntax with regex
if (preg_match('/^([a-zA-Z0-9\._\+-]+)\@((\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,7}|[0-9]{1,3})(\]?))$/', $email, $matches)) {
$user = $matches[1];
$domain = $matches[2];
# Check availability of DNS MX records
if ($domainCheck && function_exists('checkdnsrr')) {
# Construct array of available mailservers
if(getmxrr($domain, $mxhosts, $mxweight)) {
for($i=0;$i<count($mxhosts);$i++){
$mxs[$mxhosts[$i]] = $mxweight[$i];
}
asort($mxs);
$mailers = array_keys($mxs);
} elseif(checkdnsrr($domain, 'A')) {
$mailers[0] = gethostbyname($domain);
} else {
$mailers=array();
}
$total = count($mailers);
# Query each mailserver
if($total > 0 && $verify) {
# Check if mailers accept mail
for($n=0; $n < $total; $n++) {
# Check if socket can be opened
if($debug) { echo "Checking server $mailers[$n]...\n";}
$connect_timeout = $server_timeout;
$errno = 0;
$errstr = 0;
# Try to open up socket
if($sock = @fsockopen($mailers[$n], 25, $errno , $errstr, $connect_timeout)) {
$response = fgets($sock);
if($debug) {echo "Opening up socket to $mailers[$n]... Success!\n";}
stream_set_timeout($sock, 30);
$meta = stream_get_meta_data($sock);
if($debug) { echo "$mailers[$n] replied: $response\n";}
$cmds = array(
"HELO $helo_address",
"MAIL FROM: <$probe_address>",
"RCPT TO: <$email>",
"QUIT",
);
# Hard error on connect -> break out
# Error means 'any reply that does not start with 2xx '
if(!$meta['timed_out'] && !preg_match('/^2\d\d[ -]/', $response)) {
$error = "Error: $mailers[$n] said: $response\n";
break;
}
foreach($cmds as $cmd) {
$before = microtime(true);
fputs($sock, "$cmd\r\n");
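$response = fgets($sock, 4096);
# Refresh the stream metadata so the timeout check below reflects this read
$meta = stream_get_meta_data($sock);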
$t = 1000*(microtime(true)-$before);
if($debug) {echo htmlentities("$cmd\n$response") . "(" . sprintf('%.2f', $t) . " ms)\n";}
if(!$meta['timed_out'] && preg_match('/^5\d\d[ -]/', $response)) {
$error = "Unverified address: $mailers[$n] said: $response";
break 2;
}
}
fclose($sock);
if($debug) { echo "Successful communication with $mailers[$n], no hard errors, assuming OK";}
break;
} elseif($n == $total-1) {
$error = "None of the mailservers listed for $domain could be contacted";
}
}
} elseif($total <= 0) {
$error = "No usable DNS records found for domain '$domain'";
}
}
} else {
$error = 'Address syntax not correct';
}
if($debug) { echo "</pre>";}
if($return_errors) {
# Give back details about the error(s).
# Return FALSE if there are no errors.
if(isset($error)) return htmlentities($error); else return false;
} else {
# 'Old' behaviour, simple to understand
if(isset($error)) return false; else return true;
}
}
?>
If there is something better to use, I would be willing to test it. Also, since I have implemented the Russian character filter, I have not received a single spam.
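For sites where an SMTP probe is too slow or unwelcome, a lighter check is possible with PHP's built-in validation plus an optional DNS lookup. This is a sketch with a made-up function name, isPlausibleEmail. It only shows that the address is well formed and, optionally, that the domain has DNS records. It does not show that the mailbox exists.

```php
<?php
// Hypothetical helper: syntax check with filter_var(), optionally followed
// by a DNS lookup for MX (or fallback A) records on the domain part.
function isPlausibleEmail($email, $checkDns = false)
{
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }
    if (!$checkDns) {
        return true;
    }
    // Everything after the last '@' is the domain.
    $domain = substr(strrchr($email, '@'), 1);
    return checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A');
}
```

Usage: isPlausibleEmail($_POST['email'], true) before accepting the submission. The DNS step needs a working resolver on the server.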
nadworks
26-Mar-2010, 09:23 AM
Unfortunately I feel that a word blocking option has become a serious requirement now. I'm already using all other available spam protection features in FormMail, incl. Reverse Captcha and limiting the number of URLs used.
But the amount of "increase your page ranking / traffic etc." messages my clients and I have been receiving in the last 6 months is bordering on the ridiculous.
The words to be blocked should not be centralised at your end, but rather manually edited by the website owner. It would be great if this would be part of an upcoming release, because I am determined to avoid having to add a proper captcha field.
I've already started shopping around for alternative form scripts...
Thanks.
russellr
27-Mar-2010, 07:36 AM
Hi,
"I'm already using all other available Spam protection features in FormMail, incl. reverse capture"
Really? So, Reverse Captcha is not catching these bots, or do you think a human is entering the spam manually?
If you think it's a bot getting around your Reverse Captcha, I'd really like to see how you've set it up.
I might be able to make a suggestion about how to improve it.
I know this is an "arms race", but bots haven't beaten Captcha yet and I'd be very surprised if they're getting around Reverse Captcha in any generic way.
nadworks
27-Mar-2010, 09:37 AM
Thanks Russell. I'm pretty sure this is not a bot. Reverse Captcha works fine.
Nevertheless this has been happening on almost all domains/websites I am running for clients - messages promoting an SEO service, offering to get you "on the first page of Google" etc.
Feel free to check out the form:
http://www.nadworks.com/contact.htm
Thanks!!
russellr
27-Mar-2010, 12:14 PM
Hi,
Yes, your Reverse Captcha seems to be OK.
So, I agree that it's unlikely to be a bot that's hitting the site(s).
I wonder if we can prove it?
Does the spam fill in all the fields including "firma" and "tel"?
I wonder if the fact that you have your Reverse Captcha fields set in Exclude in mail_options is a factor.
A well written bot would see that and ignore those fields.
Also, a field named "the-message" would likely be filled in automatically, but "firma" is probably not going to be.
I'm just looking for evidence that completely rules out a bot.
In any case, you can use CHMOD000's code to implement a "bad words" filter.
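For readers unfamiliar with the idea, the general technique behind a reverse-captcha-style check is a field that real visitors never fill in, so any value in it suggests a bot. The following is a generic sketch, not FormMail's implementation; honeypotTripped and the field name website_url are placeholders.

```php
<?php
// Hypothetical helper: true when any of the trap fields carries a value.
// A missing trap field counts as empty, since a human's browser may simply
// not send it.
function honeypotTripped(array $fields, array $honeypotNames)
{
    foreach ($honeypotNames as $name) {
        $value = isset($fields[$name]) ? $fields[$name] : '';
        // A non-string (array) value is unexpected here, so treat it as tripped.
        if (!is_string($value) || trim($value) !== '') {
            return true;
        }
    }
    return false;
}
```

As noted above, a bot that leaves unfamiliar fields alone passes this check, and so does a person typing spam by hand.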
nadworks
29-Mar-2010, 05:56 AM
Thanks, Russell.
The only mandatory fields are "realname" and "email" (not even the message field).
I'll try re-naming the message field as you suggested.
nadworks
21-May-2010, 12:34 PM
Just a quick update:
We have found the culprit. It's a company called "National Positions" (nationalpositions.com). A quick web search shows a massive number of spam complaints and discussion comments about known form and forum spam coming from this company/person/domain.
http://spedr.com/16hl5
russellr
21-May-2010, 08:01 PM
Hi,
OK, so you think they hire people to manually enter spam?
We get posts on these forums that very much look like someone manually entering them.
I generally have to delete several per day.
As you know, it's not trivial to register for our forums (Captcha has to be solved and an email received and actioned).
I also receive a very small amount of spam each week despite our Challenge-Response system. There's almost no way this could be automated - a human is involved in the process somewhere.
I'm open minded on solutions to these problems.
We're definitely going to implement an email address checker (thanks for the working code CHMOD000).
And, if we can figure out a way to make a "bad words" filter safe to use, we'll do that too.
nadworks
21-May-2010, 08:08 PM
I didn't think (or even suggest) anything. I was simply filling in a few gaps from my earlier post and letting you know where my problem originates from. That's all.
russellr
21-May-2010, 08:23 PM
Hi,
That's OK, I was just responding to your additional information with more.
It's a journey we're all on. :)
Dingbaticus
22-May-2010, 12:37 AM
Russell,
I don't want to seem pushy, but has there been any development on this issue since it was last posted? :confused:
It would be a great add-on!
russellr
22-May-2010, 08:05 PM
Hi,
Last post was 4 hours before yours! :eek:
No, we haven't begun designing the code for this feature.
The problem is not getting it to block submissions - the problem is making sure it doesn't block legitimate submissions.
We always assume web site owners care about their customers and prospects as much as we do. :)
Rella
07-May-2011, 05:06 AM
Just wanted to give this one a bump to keep it alive.
I get very little spam through the contact form, but what there is is almost all under the topic of search rankings/traffic/SEO. They only come through one at a time, at fairly long intervals, so I think this is most probably filled in by humans.
It would be very nice to have an option that points to a list of trigger words that we could customize for ourselves, e.g. maybe a line in the ini file?
In my case, if I could just block messages containing words like search, rankings, engine, and traffic, I think it would pretty much remove almost all the spam that gets through now.
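A rough sketch of what that might look like, assuming a hypothetical ini format with a bad_words line and a min_hits threshold (neither is an existing FormMail setting). Requiring two or more trigger words before blocking is one way to avoid rejecting a genuine message that merely mentions "search".

```php
<?php
// Hypothetical helper: reads "bad_words" (comma-separated) and "min_hits"
// from ini text. Both setting names are invented for this example.
function loadTriggerConfig($iniText)
{
    $ini = parse_ini_string($iniText);
    $words = array();
    if (is_array($ini) && isset($ini['bad_words'])) {
        foreach (explode(',', $ini['bad_words']) as $word) {
            $word = trim($word);
            if ($word !== '') {
                $words[] = $word;
            }
        }
    }
    $minHits = (is_array($ini) && isset($ini['min_hits'])) ? max(1, (int) $ini['min_hits']) : 1;
    return array($words, $minHits);
}

// Hypothetical helper: counts how many distinct trigger words appear in the
// text as whole words, ignoring case.
function countTriggerWords($text, array $words)
{
    $hits = 0;
    foreach ($words as $word) {
        if (preg_match('/\b' . preg_quote($word, '/') . '\b/i', $text) === 1) {
            $hits++;
        }
    }
    return $hits;
}
```

A hook would then block only when countTriggerWords() returns at least the configured min_hits for a message.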
Thanks as always for a fantastic script :)