Non-compliant site coordination: Difference between revisions

From Meta, a Wikimedia project coordination wiki
* http://www.biography.ms/ [http://www.biography.ms/] seems to rip off content directly without a GFDL notice [[User:Fawcett5|Fawcett5]] 04:25, 25 May 2005 (UTC)

Revision as of 02:52, 14 July 2005

This page is intended to coordinate the efforts to deal with non-compliant sites. Also note the Wikipedia:Wikipedia:Mirrors and forks/GFDL Compliance page, which lists mirrors and how they rate.

Action needed

I'm not sure how they are doing it, but http://language.school-explorer.com/ has embedded Wikipedia articles on languages into its search keywords without the articles showing on the page under "info about the language". If you take a phrase from any of Wikipedia's language articles and search Google, that phrase is displayed verbatim in the description of the resulting page from that website. For example, searching for 'Zuni world view' returns http://language.school-explorer.com/info/Zuni_language with a verbatim phrase from Wikipedia's article, although the article is not displayed anywhere on the page. I don't know whether they are using Wikipedia or a mirror site. This would seem to be a violation of fair use and I think action needs to be taken. They have not answered my inquiries. User:amerindianarts

  • Language school User:62.131.248.45 answered by deleting the page twice and claiming I am accusing them of copyright infringement. That is not the issue. Copyleft violation is the issue. Language school is using the articles for the purposes of unwarranted success in the search engines. The text of the articles does not show on their pages. They do not link back to Wikipedia or mirror sites, or cite the GFDL or gnu.org. In some cases they offer no instruction for the language they advertise. They claim that they have a right to use the articles as 'hidden text' and that this is in accordance with the GFDL and gnu.org. User:Amerindianarts
    • I think I have figured out how they are doing this. If you search Google as described above, get a hit on one of these pages from language.school-explorer.com and click the cached link, you will see a very different page from the one that is actually there. It seems that they serve a different page whenever Google indexes their site. I did a little research (through Google, of course) and found that this practice is referred to as "cloaking" in the webmaster community. The web server can modify a page based on the user agent identification sent in the HTTP request. I'm probably explaining this too much, so see http://www.webmasterworld.com/forum3/22408.htm and http://www.google.com/webmasters/guidelines.html#quality if you want to know more. It seems that http://www.google.com/search?hl=en&lr=&safe=off&q=wikipedia+site%3Alanguage.school-explorer.com indicates some 17,100 pages have been indexed by Google in this way. Since this site seems to be "abusing Google's quality guidelines" by cloaking, I've reported it at http://www.google.com/contact/spamreport.html and I also reported it to MSN. (Yahoo does not have these pages in its index.) I hope this will fix the problem in this instance, but I don't know how long that could take.
  • That sounds reasonable and I think the complaints have succeeded. 64.136.26.235 02:52, 14 July 2005 (UTC)
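
The cloaking check described in the thread above (compare what a crawler is served with what a browser is served) can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `fetch_as`, `similarity` and `looks_cloaked`, the browser user-agent string and the 0.5 threshold are all hypothetical choices, not anything the site or Google is known to use. A site that cloaks by IP address rather than by user agent would not be caught this way.

```python
import difflib
import urllib.request
from typing import Callable

# Illustrative user-agent strings; the browser one is made up for this sketch.
BROWSER_UA = "Mozilla/5.0 (compatible; ExampleBrowser/1.0)"
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_as(url: str, user_agent: str) -> str:
    """Download a page while announcing the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two page bodies."""
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def looks_cloaked(
    url: str,
    fetch: Callable[[str, str], str] = fetch_as,
    threshold: float = 0.5,  # assumed cut-off; tune against known-good sites
) -> bool:
    """Flag a URL whose crawler-facing page differs strongly from the browser one."""
    browser_page = fetch(url, BROWSER_UA)
    crawler_page = fetch(url, CRAWLER_UA)
    return similarity(browser_page, crawler_page) < threshold
```

The `fetch` parameter is injectable so the comparison can be tried on saved copies of the cached and live pages without contacting the site.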


I'm copying this from the 'Mirrors and forks - low compliance' Wikipedia page, since I believe rapid action is needed - Mark Dingemanse 22:32, 31 Oct 2004 (UTC)

  • Objectssearch Encyclopedia - 'the free encyclopedia'
  • Only links back to en.wikipedia.org, does not link back to original article.
  • Mentions GFDL, linking to GNU.org.
  • This site is stealing Wikipedia bandwidth. When looking up an article, the script get.jsp queries the en.wikipedia.org server in real time, strips the result down to the text alone and places it on its own page. Evidence: I tested various pages that I knew had changed very recently. They all returned the most recent version.
  • The best example is of course looking up the main page: Main Page!
  • Needs action. Probably standard letter is not enough. Someone who is more fluent and eloquent than I am in English, please help. - Mark Dingemanse (talk) 21:32, 30 Oct 2004 (UTC)
I have sent a request to Wikitech-l to block those ***holes Walter 17:34, 2 Dec 2004 (UTC)
Those *§69!%# are blocked. Hail Brion! Walter 08:04, 16 Dec 2004 (UTC)
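
The test Mark describes above (look up pages known to have changed very recently and see whether the mirror already shows the change) could be automated along these lines. This is a minimal sketch, assuming you already have the mirror's HTML and a snippet of text that was added to the Wikipedia article in a recent edit; the helper names `visible_text` and `serves_recent_edit` are made up for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects character data from a page (includes script/style text, if any)."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def visible_text(html: str) -> str:
    """Reduce HTML to whitespace-normalised text, roughly as a stripping mirror would."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def serves_recent_edit(mirror_html: str, recent_snippet: str) -> bool:
    """True if text added in a very recent edit already appears on the mirror page."""
    needle = " ".join(recent_snippet.split()).lower()
    return needle in visible_text(mirror_html).lower()
```

A positive result on several freshly edited articles suggests the site is querying Wikipedia live rather than serving a stored copy; a single hit proves little on its own.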

It seems there are myriads of encyclopedias that fit your description. Look at Lithuanian encyclopedia, then go to the language links: it seems that clones in many languages are made and placed under different domains. lt:User:dirgela


  • getgourmetrecipes are stealing our bandwidth by displaying a non-local copy of our text and images on their site. Angela 03:28, 6 Feb 2005 (UTC)
    • Update: I spoke to Jamesday about this on IRC and he claims bandwidth theft is a relatively minor cost, so sites are no longer being blocked for this. Angela 21:28, 9 Feb 2005 (UTC)

  • www.gerla.cc is stealing Wikipedia bandwidth, downloading articles in realtime from the Italian wikipedia. See for example [2] (section "Collegamenti esterni" added 20:26 UTC, Feb 9, 2005 with this diff). Several other pages tested. It also says at the bottom that all content is public domain (!). Please block this site. Alfio 20:59, 9 Feb 2005 (UTC)
    Looks to me more like they call it GNU FDL. Google translation: "All the contained information in this page are usable liberations riproducibi and in any context. It is not necessary to ask no permission in order to capture them and to use them in whichever way, to exception of the reproducing images marks or regstrati symbols, like marked close to they." Below that it has a yellow box saying: "Contenuto disponibile sotto GNU Free Documentation License." with the copyleft logo. (r3m0t not logged in)

encyclopedia.laborlawtalk.com. "Copyright © 2004 LaborLawTalk.com All rights reserved." No mention of the GFDL or Wikipedia on their pages or their legal notices page. w:en:user:119 06:53, 21 Feb 2005 (UTC)

now shows GFDL notice, links to GNU.org/copyleft/fdl , links to wikipedia individual articles.
This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Calendar".

--Yonghokim 02:12, 10 May 2005 (UTC)

See also http://70.84.126.148/wiki/index.php?title=Main_Page If you edit here then laborlawtalk is modified also. Pablo.cl 18:38, 10 Mar 2005 (UTC)

Site dead, May 2005.

Just noticed http://www.biography.ms/ is not attributing Wikipedia content. - BanyanTree 17:47, 24 Apr 2005 (UTC)

Likewise; they're copying articles without attribution. Also, this page needs to be easier to find; it took me quite a while on google to get here. -Lunkwill

And they actually use a wildcard domain, like http://lajos-kossuth.biography.ms/ and linking w/o any backref. --grin


http://www.villa.spain-property-costa-blanca.com/archives/2004/02/ Contains a verbatim copy of a version of Spanish Inquisition. I certainly didn't expect that. 217.17.112.204 20:14, 17 May 2005 (UTC)



The open encyclopedia at http://www.baghdadmuseum.org/ includes pages from Wikipedia without any reference made to Wikipedia, and with their own copyright notice at the bottoms of the pages. Angr 17:19, 23 May 2005 (UTC)

I've just sent them an e-mail. Angr 17:30, 23 May 2005 (UTC)
They've responded and added a blurb at the bottom of each page; see http://www.baghdadmuseum.org/ref/ for an example. It looks OK to me, but could someone who knows more about it than me double-check and make sure that's sufficient? Angr 08:34, 24 May 2005 (UTC)

  • http://wikix.ipupdater.com/ has an outdated copy of our content, and no mention at all of Wikipedia or GFDL, let alone backlinks to our articles. Sent a standard letter to the listed contact address, but it bounced - the address is probably fake. --Fibonacci 01:30, 3 Jun 2005 (UTC)

In progress

The non-compliant site has been sent a warning that access (by referrer or IP) will be blocked if there is no change.


To block

No reaction; please block these. Include the IP (in case of proxies) or the referrer (in case of deep linking/framing).

homoeopathieklinik.de and psychiatrie-klinik.de

  • Not GFDL compliant; the webmaster knows this. [3]
  • Please note: there are several mirror websites of the one mentioned above; they are listed on the German WP above.--Nerd 15:24, 5 Nov 2004 (UTC)

www.freeglossary.com

[4] (for example)

  • Uses old copies of Wikipedia
  • No direct back link to Wikipedia; there is an indirect back link ("about this page"), but the link points to a 404.
  • Appends its own ads to the bottom of the page
  • Asserts the right to modify the terms and conditions of use
  • No GFDL acknowledgement
  • However, the About Us page contains the following: "freeglossary.com is powered by PHP, mySQL and Wikipedia", each linked to local articles, and the local Wikipedia article links to http://www.wikipedia.org
  • Violation letter sent October 21 2004 Sjc 09:01, 21 Oct 2004 (UTC)
  • No change, still in violation. Sjc 19:55, 25 Nov 2004 (UTC)
  • No change so far. But this doesn't seem to have been reported yet: at the bottom of each article there is an "About this article" link which in turn directs you to the Wikipedia.org links for editing the article, its talk page, etc. I think the owner of that site has a serious business mentality issue, as if everything needs to be very formal, but then had to deal with the GFDL. My guess. --Yonghokim 02:07, 10 May 2005 (UTC)
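
The points checked by hand in the reports on this page (a GFDL mention, a link to gnu.org, a back link to Wikipedia, and a back link to the original article rather than just the home page) can be turned into a rough first-pass script. This is a sketch only: the function name `check_compliance` and the result keys are invented for illustration, the checks are simple string and link heuristics, and a pass here is not a statement that a site actually complies with the GFDL.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _on_domain(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def check_compliance(page_html: str) -> dict[str, bool]:
    """Run the manual checklist as heuristics over one mirrored page."""
    collector = _LinkCollector()
    collector.feed(page_html)
    collector.close()
    links = [urlparse(href) for href in collector.hrefs]
    hosts = [(link.hostname or "").lower() for link in links]
    text = page_html.lower()  # raw HTML, so a mention inside markup also counts
    return {
        "mentions_gfdl": "gnu free documentation license" in text or "gfdl" in text,
        "links_to_gnu_org": any(_on_domain(h, "gnu.org") for h in hosts),
        "links_to_wikipedia": any(_on_domain(h, "wikipedia.org") for h in hosts),
        # A link into /wiki/<title> or ?title=<title>, not just the front page.
        "links_to_article": any(
            _on_domain(h, "wikipedia.org")
            and (len(link.path) > len("/wiki/") and link.path.startswith("/wiki/")
                 or "title=" in link.query)
            for h, link in zip(hosts, links)
        ),
    }
```

Any key that comes back `False` is a prompt to look at the page by hand before sending the standard letter.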

See also: en:Wikipedia:Mirrors and forks, de:Wikipedia:Projekte, die Wikipedia als Quelle benutzen