<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
	<title><![CDATA[ParchmentHill Search - Message Board]]></title>
	<link>http://parchmenthill.websitetoolbox.com</link>
	<description><![CDATA[ParchmentHill Search - Message Board]]></description>
	<ttl>60</ttl>
	<pubDate>Wed, 16 May 2012 21:18:41 GMT</pubDate>
	<item>
		<title><![CDATA[Does not handle no-www Class-B correctly]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3975855</link>
		<description><![CDATA[For the last hour, I have been getting this:<br><br>76.73.37.234 <br>search.parchmenthill.com <a href="http://www.animu.info" target="_blank">http://www.animu.info</a>   "GET <a href="http://animu.info/" target="_blank">http://animu.info/</a> HTTP/1.1" 301 0 "-" "Mozilla/5.0 (compatible;+ParchBot/1.0;++http://www.parchmenthill.com/search.htm)"<br><br><br>76.73.37.234 <br>search.parchmenthill.com <a href="http://www.animu.info" target="_blank">http://www.animu.info</a>   "GET <a href="http://animu.info/" target="_blank">http://animu.info/</a> HTTP/1.1" 301 0 "-" "Mozilla/5.0 (compatible;+ParchBot/1.0;++http://www.parchmenthill.com/search.htm)"<br><br><br>76.73.37.234 <br>search.parchmenthill.com <a href="http://www.animu.info" target="_blank">http://www.animu.info</a>   "GET <a href="http://animu.info/" target="_blank">http://animu.info/</a> HTTP/1.1" 301 0 "-" "Mozilla/5.0 (compatible;+ParchBot/1.0;++http://www.parchmenthill.com/search.htm)"<br><br><br>76.73.37.234 <br>search.parchmenthill.com <a href="http://www.animu.info" target="_blank">http://www.animu.info</a>   "GET <a href="http://animu.info/" target="_blank">http://animu.info/</a> HTTP/1.1" 301 0 "-" "Mozilla/5.0 (compatible;+ParchBot/1.0;++http://www.parchmenthill.com/search.htm)"<br><br><br>76.73.37.234 <br>search.parchmenthill.com <a href="http://www.animu.info" target="_blank">http://www.animu.info</a>   "GET <a href="http://animu.info/" target="_blank">http://animu.info/</a> HTTP/1.1" 301 0 "-" "Mozilla/5.0 (compatible;+ParchBot/1.0;++http://www.parchmenthill.com/search.htm)"<br><br><br>76.73.37.234 <br>search.parchmenthill.com <a href="http://www.animu.info" target="_blank">http://www.animu.info</a>   "GET <a href="http://animu.info/" target="_blank">http://animu.info/</a> HTTP/1.1" 301 0 "-" "Mozilla/5.0 (compatible;+ParchBot/1.0;++http://www.parchmenthill.com/search.htm)"<br><br><br>76.73.37.234 <br>search.parchmenthill.com <a href="http://www.animu.info" 
target="_blank">http://www.animu.info</a>   "GET <a href="http://animu.info/" target="_blank">http://animu.info/</a> HTTP/1.1" 301 0 "-" "Mozilla/5.0 (compatible;+ParchBot/1.0;++http://www.parchmenthill.com/search.htm)"<br><br><br>76.73.37.234 <br>search.parchmenthill.com <a href="http://www.animu.info" target="_blank">http://www.animu.info</a>   "GET <a href="http://animu.info/" target="_blank">http://animu.info/</a> HTTP/1.1" 301 0 "-" "Mozilla/5.0 (compatible;+ParchBot/1.0;++http://www.parchmenthill.com/search.htm)"<br><br><br><br><br><br><br><br><br><b>EVERY SECOND, </b>it connects to my domain and tries to get it. I think it's because of a rule I have in my lighttpd config.<br><br>## no-www Class-B compliant<br>$HTTP["host"] =~ "^www\.(.*)$" {<br>&nbsp; url.redirect = ( "^/(.*)" =&gt; "http://%1/$1" )<br>}<br><br>It forces www to be turned off; your bot does not seem to realize that my server answers with a 301 HTTP redirect, and hence it tries to connect again. Infinite loop DoSing my server. Please fix your bot.<br><br> <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=149512">Behavior Issues</a>
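<p>The redirect handling this post asks for can be sketched roughly as follows. This is a minimal sketch, not ParchBot's actual code: `resolve_redirects`, the `fetch` callable, and the `max_hops` limit are all hypothetical names chosen for illustration. The idea is that a crawler follows the `Location` header of a 301 instead of re-requesting the same URL, and gives up when it sees a loop or too many hops.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_redirects(start_url, fetch, max_hops=5):
    """Follow redirects from start_url and return the final URL.

    `fetch` is a hypothetical callable: fetch(url) -> (status, location),
    where location is the Location header value or None.
    Returns None if a redirect loop is detected or max_hops is exceeded.
    """
    seen = {start_url}
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url  # not a redirect: this is the page to index
        url = urljoin(url, location)  # Location may be a relative URL
        if url in seen:
            return None  # redirect loop: stop instead of retrying
        seen.add(url)
    return None  # too many hops


# Usage with a fake fetcher that mimics the no-www rule in this post.
pages = {
    "http://www.animu.info/": (301, "http://animu.info/"),
    "http://animu.info/": (200, None),
}
print(resolve_redirects("http://www.animu.info/", lambda u: pages[u]))
# prints http://animu.info/
```

Passing the fetcher in as a parameter keeps the sketch free of network access, so the loop and hop-limit behaviour can be checked offline.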
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3975855</guid>
		<pubDate>Fri, 04 Dec 2009 16:47:30 GMT</pubDate>
	</item>

	<item>
		<title><![CDATA[item in image search.]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3705553</link>
		<description><![CDATA[<div><font face="Arial" size="2">I own <a href="http://www.shapesearch.com" target="_blank">http://www.shapesearch.com</a></font></div> <div>&nbsp;</div> <div><font face="Arial" size="2">Who would I talk with in your organization about adding true item-in-image searching to your crawling?</font></div> <div>&nbsp;</div> <div><font face="Arial" size="2">Check around, you won't find any other image search software like ShapeSearch.&nbsp; It's simple, fast, and best of all it works better than anything else out there with 1/100 the hardware, 1,000% better results, and 0 to 3% false positives.<br>&nbsp;<br>It's true image item search software.</font></div> <div>&nbsp;</div> <div><font face="Arial" size="2">I am looking to license the technology.</font></div> <div>&nbsp;</div> <div><font face="Arial" size="2">415 987 9414</font></div> <div>&nbsp;</div> <div><font face="Arial" size="2">Ken</font></div> <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=149510">General</a>
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3705553</guid>
		<pubDate>Fri, 09 Oct 2009 02:54:24 GMT</pubDate>
	</item>

	<item>
		<title><![CDATA[Hey from EssayRunner]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3703588</link>
		<description><![CDATA[Hey guys,<br /><br>
<br /><br>
I saw you in my log by chance.  I run a spider that crawls the Gnutella (Limewire) network.  Quite a bit different than what you're doing.  What language is your crawler written in?<br /><br>
<br /><br>
-Adam <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=149510">General</a>
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3703588</guid>
		<pubDate>Thu, 08 Oct 2009 05:49:31 GMT</pubDate>
	</item>

	<item>
		<title><![CDATA[About this message board]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3699687</link>
		<description><![CDATA[Hello (=<br><br>I really loved this Message Board style. Where can I download this PHP script to use it on my own website?<br><br>Thank you,<br><br>Alexandre<br><br> <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=149510">General</a>
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3699687</guid>
		<pubDate>Tue, 06 Oct 2009 12:55:47 GMT</pubDate>
	</item>

	<item>
		<title><![CDATA[What makes a website spammy?]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3689237</link>
		<description><![CDATA[We have a different idea of what constitutes spam than a standard search engine, since we look primarily at the main home page of each domain.<br><br>In general, we consider a website not to be legitimate if an organization has multiple websites that would best be handled as one website.&nbsp; In other words, where an organization is taking advantage of the low cost of a domain to help improve their search engine rankings.&nbsp; The worst offenders are those that have identical wording on multiple domains as well.<br><br>Why is this so bad?&nbsp; Let's say that there are two domains in the world that mention both 'widgets' and 'Acme'.&nbsp; We typically have an easy time determining which website is run by the Acme Corporation and sells widgets, and which is the blog of a happy customer of theirs that happens to mention Acme widgets.&nbsp; Then, someone comes along and creates 1,000 domains that all have the words 'widgets' and 'Acme' (perhaps as random words, perhaps mentioning Acme widgets for a seemingly valid reason).&nbsp; We now have 1,002 domains to sort through (rather than 2), which takes up a lot more resources.<br><br>Plus, as we are analyzing words on the entire web and comparing them to individual domains, if we aren't careful, we might think that there is something 'wrong' with the real Acme widgets website, since we see that 99% of all websites that mention 'Acme' and 'widgets' also mention 'gizmo' (but the real Acme widgets website does not).&nbsp; So we get a false view of what words belong where, and truly legitimate websites get hurt.<br><br>So to produce decent results, we have to hunt down lots of the duplicated domains.&nbsp; Ideally, we'll have them turn up in search results when appropriate (e.g. 
someone searching for 'Acme', 'widgets', and 'gizmo', where no other domains match), but also mark them for exclusion when doing certain analyses of the web.<br><br>For those that have registered multiple domains for a single website (e.g. variations of spelling, an abbreviation of the company name, different TLDs, etc.), you really need to redirect to the 'proper' website.&nbsp; In other words, you want browsers (and search engines) to know that you'd prefer for people to use one specific domain name.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -Scott<br><br> <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=150298">Technical Details</a>
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3689237</guid>
		<pubDate>Tue, 29 Sep 2009 16:55:56 GMT</pubDate>
	</item>

	<item>
		<title><![CDATA[Welcome]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3689211</link>
		<description><![CDATA[Hello,<br><br>This is the forum for discussing any technical details about the search engine.&nbsp; This would be the place for SEO (Search Engine Optimization) questions and the like.&nbsp; To start a new thread, just click the 'New Topic' button.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -Scott<br><br> <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=150298">Technical Details</a>
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3689211</guid>
		<pubDate>Tue, 29 Sep 2009 16:34:41 GMT</pubDate>
	</item>

	<item>
		<title><![CDATA[Hello from Sentientsplace.net]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3674029</link>
		<description><![CDATA[Howdy! How are you all? I trust I've found your team in high spirits.<br /><br>
<br /><br>
Just letting you know that your bot trawled my site yesterday. I was somewhat surprised to see it there, but that being said, it is most welcome.<br /><br>
<br /><br>
Feel free to hit my domain as many times as you require. <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=149510">General</a>
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3674029</guid>
		<pubDate>Sat, 19 Sep 2009 14:53:50 GMT</pubDate>
	</item>

	<item>
		<title><![CDATA[Icon]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3671254</link>
		<description><![CDATA[<font face="Arial" size="2">Hello,<br><br>do you have an icon for your </font><font color="#000000" face="verdana" size="2"><font face="Arial" size="2">ParchBot? Then we can use it in our stats ;-)<br><br>thx,<br>françois.</font><br><br></font> <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=149510">General</a>
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3671254</guid>
		<pubDate>Thu, 17 Sep 2009 18:15:41 GMT</pubDate>
	</item>

	<item>
		<title><![CDATA[Welcome]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3669175</link>
		<description><![CDATA[Hello,<br><br>This forum is for giving us suggestions for our project.&nbsp; Do you have a neat idea for how we could improve our searching?&nbsp; An idea for how the searching could be used?&nbsp; This is the place!<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -Scott<br><br> <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=149590">Suggestions</a>
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3669175</guid>
		<pubDate>Wed, 16 Sep 2009 13:55:21 GMT</pubDate>
	</item>

	<item>
		<title><![CDATA[About our search engine]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3667775</link>
		<description><![CDATA[Again, welcome.<br><br>Our search engine is designed to get around a problem with most search engines -- it can sometimes be very difficult to find a website (as opposed to a webpage).&nbsp; If you're looking for the website of a large company, it's usually easy to find.&nbsp; But for smaller organizations (especially those with few incoming links, like restaurants), and those with common names, it can be next to impossible.<br><br>This involves using some similar techniques that are used by other search engines (e.g. having an index of words that appear in webpages).&nbsp; One of the main ways that search engines rank websites is by their popularity -- the number of other websites linking to them.&nbsp; But we can't do that, since a small website (that you are trying to find!) may have very few, if any, links pointing to it.<br><br>If you're looking for <i>information</i> (such as the year something occurred, or how to fix something), our search engine isn't likely to be very helpful.&nbsp; But if you are looking for (or have) a <i>website</i>, <i>domain</i>, <i>business</i>, or <i>organization</i>, our search engine should come in handy.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -Scott<br><br> <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=149510">General</a>
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3667775</guid>
		<pubDate>Tue, 15 Sep 2009 17:15:16 GMT</pubDate>
	</item>

	<item>
		<title><![CDATA[Welcome]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3667747</link>
		<description><![CDATA[Hello,<br><br>This forum is used for discussions regarding how our search engine behaves.<br><br>We value your resources, so we want to make sure that we aren't causing any problems with your site (e.g. slowing things down for other visitors, costing you money, etc.).&nbsp; Due to the methodology of our search engine, it is not likely to access many pages at your site, so it is unlikely to cause any problems.&nbsp; But if it does, please do let us know.<br><br>Just click on 'New Topic' to start a new thread.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -Scott<br><br> <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=149512">Behavior Issues</a>
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3667747</guid>
		<pubDate>Tue, 15 Sep 2009 17:01:51 GMT</pubDate>
	</item>

	<item>
		<title><![CDATA[Welcome]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3667743</link>
		<description><![CDATA[Hello,<br><br>This is the forum for discussing the robots.txt file and how we interact with it.&nbsp; Robots.txt is a de-facto standard for letting search engines (and other bots, crawlers, etc.) know whether or not they are welcome at your site, and if so, what pages are off-limits.<br><br>Before downloading pages from a website, we check for a robots.txt file in the root directory of your domain (e.g. 'http://www.example.com/robots.txt').&nbsp; If the file is not there, we assume the search engine is welcome.&nbsp; If the file is there, we check to see what it says about what pages we may and may not download from your site.<br><br>Just click on 'New Topic' to start a new thread.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -Scott<br><br><br> <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=149511">Robots.txt</a>
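<p>The robots.txt check described above can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration under stated assumptions, not the search engine's own implementation: the `may_fetch` helper is a hypothetical name, and the user agent string "ParchBot" is taken from the crawler's log signature. A missing file is represented as `None` and treated as "the crawler is welcome".

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt, user_agent, url):
    """Return True if `user_agent` may download `url`.

    `robots_txt` is the text of the site's robots.txt, or None when the
    file does not exist (treated as "the crawler is welcome").
    """
    if robots_txt is None:
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules without any network access
    return parser.can_fetch(user_agent, url)


# Usage with an example rule set that puts one directory off-limits.
rules = "User-agent: *\nDisallow: /private/\n"
print(may_fetch(rules, "ParchBot", "http://www.example.com/index.html"))
print(may_fetch(rules, "ParchBot", "http://www.example.com/private/a.html"))
print(may_fetch(None, "ParchBot", "http://www.example.com/anything.html"))
```

In a real crawler the file text would first be downloaded from the root of the domain (for example 'http://www.example.com/robots.txt'); that step is left out here so the sketch runs offline.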
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3667743</guid>
		<pubDate>Tue, 15 Sep 2009 16:58:41 GMT</pubDate>
	</item>

	<item>
		<title><![CDATA[Welcome]]></title>
		<link>http://parchmenthill.websitetoolbox.com/post?id=3667734</link>
		<description><![CDATA[Welcome!<br><br>These message boards are for anyone who wants to discuss the search engine that we are developing.&nbsp; At this point, it is likely only going to be webmasters that are here, as the search engine isn't available for testing as of this writing.&nbsp; But it is crawling!<br><br>Feel free to post any general questions or comments in the 'General' category.<br>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -Scott<br><br> <p>Forum: <a href="http://parchmenthill.websitetoolbox.com/?forum=149510">General</a>
]]></description>
		<guid isPermaLink="false">http://parchmenthill.websitetoolbox.com/post?id=3667734</guid>
		<pubDate>Tue, 15 Sep 2009 16:54:51 GMT</pubDate>
	</item>

</channel>
</rss>
