Wikipedia talk:Requests for mediation/Senkaku Islands: Difference between revisions

From Wikipedia, the free encyclopedia
Bobthefish2 (talk | contribs)
moved
Line 315: Line 315:
* Difficult task to check contents - almost impossible to check site by site to include neutral sites and eliminate biased sites with large results.
[[User:STSC|STSC]] ([[User talk:STSC|talk]]) 01:36, 5 June 2011 (UTC)
:I would guess that this is a non-issue.<p>As stated, this proposed poll topic is unhelpful because settled wiki-[[WP:Consensus|consensus]] rejects any scheme which provides a [http://en.wiktionary.org/wiki/blanket#Adjective wikt:blanket] exclusion of data from [[WP:RS|reliable source]]s, including conventional Internet sources such as [[Google]].<p><b>[[Apples and oranges]] <big> ≠</big> [[Conflation]]</b><p>The development of [[argument]]s/[[counterargument]]s in our mediation context has thus far [[conflate]]d
:I would guess that this is a non-issue.<s>As stated, this proposed poll topic is unhelpful</s> because settled wiki-[[WP:Consensus|consensus]] rejects any scheme which provides a [http://en.wiktionary.org/wiki/blanket#Adjective wikt:blanket] exclusion of data from [[WP:RS|reliable source]]s, including conventional Internet sources such as [[Google]]. --[[User:Tenmei|Tenmei]] ([[User talk:Tenmei|talk]]) 23:26, 5 June 2011 (UTC)<p><s>[[Apples and oranges]] <big> ≠</big> [[Conflation]]. The development of [[argument]]s/[[counterargument]]s in our mediation context has thus far [[conflate]]d (a) disputing what is the English-language common name for our articles about disputed islands in the [[East China Sea]]; and (b) disputing what is a "neutral" name. These are conceptually distinct subjects which must be parsed accordingly.<p>In other words, [http://en.wikipedia.org/w/index.php?title=Wikipedia:Search_engine_test&oldid=429175905#Neutrality "Google (and other search systems) do not aim for a neutral point of view ... <nowiki>[and]</nowiki> Google is specifically not a source of neutral titles"] according to [[Wikipedia:Search engine test#Neutrality]].</s> <small>--[[User:Tenmei|Tenmei]] ([[User talk:Tenmei|talk]]) 20:43, 5 June 2011</small>
:: (a) disputing what is the English-language common name for our articles about disputed islands in the [[East China Sea]]; and
:: (b) disputing what is a "neutral" name.
:These are conceptually distinct subjects which must be parsed accordingly.<p>In other words, [http://en.wikipedia.org/w/index.php?title=Wikipedia:Search_engine_test&oldid=429175905#Neutrality "Google (and other search systems) do not aim for a neutral point of view ... <nowiki>[and]</nowiki> Google is specifically not a source of neutral titles"] according to [[Wikipedia:Search engine test#Neutrality]]. --[[User:Tenmei|Tenmei]] ([[User talk:Tenmei|talk]]) 20:43, 5 June 2011 (UTC)
::'''Moved discussion to Tenmei's talk page; feel free to continue there. <span style="font-family: Palatino Linotype, Book Antiqua, Palatino, serif;" color="#BBAED0">[[User:Feezo|Feezo]] <font size="-2">[[User_talk:Feezo|(send a signal]] | [[Special:Contributions/Feezo|watch the sky]])</font></span> 21:24, 5 June 2011 (UTC)'''
:::No. This is contribution to our talk page is not off-topic. --22:41, 5 June 2011 (UTC)
:::No. This single '''bold edit by [[User:Feezo|Feezo]] is insufficiently explained.''' This one example of a mediator's discretionary judgment is inadequately justified. IMO, this contribution is not off-topic. --[[User:Tenmei|Tenmei]] ([[User talk:Tenmei|talk]]) 23:26, 5 June 2011 (UTC)


====Support using encyclopedias and almanacs as a metric====

Revision as of 23:26, 5 June 2011

After consulting the Committee, I am choosing to archive all material not directly related to the primary issue under mediation. As of this moment, the "safe zone" will be considered to encompass this entire page, and off topic discussion will be speedily archived or removed.

Note that this specifically includes discussion of this new format, as well as discussion of which material has been removed. Comments or complaints must go through email or user talk pages.

Finally, let me state that I do not undertake this lightly. This is a considered response that we feel will be beneficial to the continued success of this mediation. If sufficient time passes without a serious violation, these new restrictions will be lifted. Feezo (send a signal | watch the sky) 13:24, 4 June 2011 (UTC)[reply]

A link to archived diffs is needed -- see Wikipedia talk:Requests for mediation/Senkaku Islands/Archive here. --Tenmei (talk) 21:02, 5 June 2011 (UTC)[reply]

Mediation safe zone

Feezo, would it be most beneficial to re-explain all of the detailed arguments and data presented before, along with relevant policies/guidelines, or is it better to simply provide links to discussion archives that contain the information? I'm happy to do either. Qwyrxian (talk) 21:57, 31 May 2011 (UTC)[reply]

I've read the discussions, so I'm familiar with the arguments. However, I think it would be helpful if participants could select the portions of the arguments they most agree with, or feel are most relevant to the discussion, and either summarize or quote them here. There's no need to reproduce all the statistical data, a link or a summary would be fine. Feezo (send a signal | watch the sky) 22:18, 31 May 2011 (UTC)[reply]
Do you want to have subsections with a list of "pro" and "anti" arguments (signed by endorsers, if that's appropriate)? --Bobthefish2 (talk) 01:42, 1 June 2011 (UTC)[reply]

There is also a long-standing issue with reference usage and interpretation that may benefit from this mediation. All sides of the arguments are basically represented in [NPOV noticeboard entry]. --Bobthefish2 (talk) 01:42, 1 June 2011 (UTC)[reply]

Summary statement from Qwyrxian

For me, the starting point should be determining the guidance given to us by the relevant policies and guidelines. I went through a very detailed discussion of this at Talk:Senkaku Islands/Archive 5#What does policy say?, citing the four relevant rules: WP:Article titles, paying particular attention to Considering title changes; WP:NPOV, paying particular attention to the discussion of neutrality in article titles which is found in WP:NPOV#Achieving neutrality; and finally Wikipedia:Naming conventions (geographic names), looking specifically at Wikipedia:Naming conventions (geographic names)#Widely accepted name. To me, these policies, as a whole, explain that neutrality matters, but even WP:NPOV points out that, for article titles, it is not the sole determining factor. Rather, our primary question is always “What is the name used for this group of islands, in general, in high quality, English language sources”, where “high quality” focuses on academic work, really good news work, encyclopedias, almanacs, etc. We can consider other works (i.e., GHITS), but only secondarily. Additionally, the guidelines state that, while it is not strictly forbidden, dual names (i.e., Senkaku/Diaoyu Islands) are a bad idea because they don’t actually solve anything, since the discussion just shifts from “what should the name be” to “which name should come first”.
Finally, the guidelines do point out that, in extraordinary cases, it may be necessary to find some other method for determining the names of articles when there isn't a single, clear main "English" name and there are multiple local names; the two examples most frequently cited are Liancourt Rocks (choosing the basically never used name because it was impossible to figure out from the data which other name was more common and the fighting was never-ending), and the Derry/Londonderry compromise (which I don’t understand and am not even sure why it occurred, other than to know it’s associated with general UK/Ireland problems).

Thus, our job is to try to interpret the data and determine whether or not there is actually a common English name. My interpretation of the data is that, in fact, there is a common English name, and that that name is “Senkaku Islands”, although the treatment given in news sources is such that there may be a shift in the future (maybe 20, 30, 50 years from now) to always use a dual name. This was based on Google News, Scholar, and book searches, along with an attempt at some very close looking at news articles (that is, an article that says “Senkaku/Diaoyu” throughout is clearly choosing the dual name, while one that says “Senkaku Islands (called Diaoyu in China)” and then using Senkaku through the rest of the article is clearly favoring Senkaku, even though on a simple search it shows up in both categories). It was based on the limited online encyclopedias seeming to prefer Senkaku, although I have asked many many times for someone with access to a university library to take a closer look at what other, contemporary print encyclopedias use. But the key deciding factor for me was when I had a short time to spend at a university library in the US, and looked in the almanac section, I found that, of the 5 atlases I found that mentioned the islands at all (3 US, 1 UK, 1 Italian), all 5 used only the name “Senkaku” or “Senkaku-shoto”, and only one of them even listed “Diaoyu” in the index. See Talk:Senkaku Islands/Archive 7#Almanacs for the full details. To me, this seemed like the key piece of evidence that showed that, even though it may change in the future, the current name for the islands in English is Senkaku Islands, and thus that is the name we should use on Wikipedia. I still hold this position today: I believe that it is, in fact "neutral" (based on WP:NPOV itself) to use the name "Senkaku Islands", and that this is the best name to use for English Wikipedia. 
If this is all still too long, or without enough links, let me know and I'll revise it. Qwyrxian (talk) 02:34, 1 June 2011 (UTC)

One metric which I haven't seen brought up (although it may have been) is measuring the search results for particular top level domains. For instance, site:*.gov "senkaku islands" -diaoyu gives 57 results, while site:*.gov "diaoyu islands" -senkaku gives only 2. For *.edu, the numbers are 175 to 71 in favor of Senkaku. Has this disparity been addressed? Feezo (send a signal | watch the sky) 18:38, 1 June 2011 (UTC)[reply]
I'd recommend that you read my block of text if you haven't already. It directly addresses why your recommendation is not good. For example, the objective of this research project is to assess which of the names is more commonly used in the English language. In your case, the goal is instead to assess the opinions of governments and educational institutions. I can already quite easily shoot down your search results on ".gov" because the vast majority of them are U.S. government pages (with an enrichment of Alaskan government pages). Since government documents tend to be coherent in their choice of geographic names, these data points are highly correlated with one another and not even remotely close to being independent (which is a core requirement/preference for good sampling practices).
Since you are a computer scientist yourself, I'd assume I don't need to explain to you the basic principles of statistics. In addition to your choice of sampling, I'd also comment that your choice of query is quite suboptimal because there are a number of Chinese and Japanese name variants that are quite commonly used.
Again, I'd encourage you to read through my section and not ignore it. Thanks. --Bobthefish2 (talk) 02:10, 2 June 2011 (UTC)
I'm following your line of reasoning below, and will respond shortly. I'd like to get the claims made by STSC settled first if that's alright with you. This is just an observation that could be useful in coming up with a more fine-grained picture of the distribution of the names. Feezo (send a signal | watch the sky) 07:17, 2 June 2011 (UTC)
No problem. To clarify, information retrieval (IR) is mostly about the mining of documents based on a set of queries. So its importance in our problem lies in the assessment of search engine results, search parameters, and search heuristics. The content I wrote is mostly focused on statistical linguistics, which has a more generic emphasis. --Bobthefish2 (talk) 07:26, 2 June 2011 (UTC)
Feezo, it is because the United States defined "Senkaku Islands" as an official term. See Senkaku Islands at Library of Congress Subject Headings. It is also one of the methods for determining the widely accepted name in Wikipedia. See no. 3 in WP:NCGN#Widely accepted name. ―― Phoenix7777 (talk) 08:16, 2 June 2011 (UTC)
This is an important enough point that I think it deserves its own subsection; see below.

Other criteria

We've been focusing most of our efforts so far on interpreting search engine results (#2 on NCGN). However, there has been relatively little work done on the other criteria. Although Bobthefish has criticized the use of encyclopedias, they are, after all, part of the NCGN guideline. Different editors here have expressed different preferences for how closely we should follow the guidelines; this may be an "occasional exception". For now, looking at some of the other criteria—

  • For #3, the LCSH uses Senkaku, as does the Library of Congress Country Studies. The Oxford Reference Online doesn't have an entry for any of the four names we're considering. I don't have access to the Cambridge Histories at the moment, although their online search gives five results for Senkaku vs. one for Diaoyu. In any case, we can't completely fill the letter of #3.
  • For #1, I can go to my library and check the suggested encyclopedias. As noted, Bobthefish has objected to using encyclopedias. I think it may have been slightly hasty to rule them out altogether (even if we end up doing this) so I'd like to see some discussion on whether this is justified. Feezo (send a signal | watch the sky) 18:39, 2 June 2011 (UTC)[reply]
For Encyclopedia Britannica, there is no entry for Senkaku or Diaoyu. For The Columbia Encyclopedia, there is an entry for Senkaku Islands, and Diaoyu leads to Senkaku Islands as a redirect. For Encarta, there is no entry for Senkaku or Diaoyu. ―― Phoenix7777 (talk) 20:28, 2 June 2011 (UTC)

Going back to the other criteria, I'd like input from other editors if they feel that the almanacs results have any relevance. Do these fall under point 1 (are they types of encyclopedias?) under point 3 (are they historic or scientific studies?) or are they not relevant at all? My opinion is that they fall under point 3, but, of course, since the data from those almanacs supports my position, my argument is highly suspect, even to myself. I mean, yes, we all know that almanacs are as much a product of politics as they are of science, but, really, so are encyclopedias, news sources, and everything else. Qwyrxian (talk) 02:34, 3 June 2011 (UTC)[reply]

I don't think that's a very relevant concern. Our main focus is the English language and an important secondary task is to sample the web in a way that is reflective of the language. To respond to your question, I'd ask if there's any reason to think the library of congress or encyclopedias adopt relatively the same frequency of word usage as the entire English language itself? --Bobthefish2 (talk) 02:40, 3 June 2011 (UTC)[reply]
You're asking the wrong question. Our job is to use the tools provided to us in NCGN to determine the most common English term in high quality reliable sources. It's not to determine the most common word across the English language--for a location term, such a task is impossible. As has come up before, we actually don't care which word pops up more often in blogs. "Normal" writing is not the same as the style of language we aim for in Wikipedia. By that logic, we would call the country "America" instead of "The United States of America", because "America" is more frequent in everyday writing. We check encyclopedias, the library of congress, etc., because those sources match the level of writing we're shooting for, and because it's what the guideline tells us to do. As we've discussed before, it's fine to propose changes to guidelines; it's even fine to say the guideline doesn't apply here. But you (by you I mean any one single editor, or even just one single "side") can't by fiat overrule what the guideline tells us to do in this instance without first getting consensus that the guideline doesn't apply in this case. Qwyrxian (talk) 05:48, 3 June 2011 (UTC)[reply]
Since you have fallen so madly in love with the NCGN guidelines despite their lack of quality control, here's a couple of excerpts from WP:NCGN:
  • When a widely accepted English name, in a modern context, exists for a place, we should use it
  • These are advice, intended to guide, not force, consensus; but they are the consensus of actual experience in move discussions.
In reference to the first point, the concept of wide-acceptance is not restricted only to authoritative sources. This is especially true when we consider the problem linguistically. For example, even though "United States of America" is the canonical name of the country and is also extremely widely used, "United States" somehow prevailed as the country's name. Personally, I don't know the statistics. While it's possible that "United States" is more widely used in reliable sources, it still doesn't sound right don't you think?
In reference to the second point, it's simply my little friendly reminder to you that the guidelines you've fallen in love with are simply advice that worked in similar situations and not something we are obliged to follow (sorry to break it to you). Consequently, your self-defined objectives, defined as follows:
  • Our job is to use the tools provided to us in NCGN to determine the most common English term in high quality reliable sources.
  • It's not to determine the most common word across the English language--for a location term, such a task is impossible.
  • As has come up before, we actually don't care which word pops up more often in blogs.
... are really your own definition of our agenda and your personal characterization of the edicts of NCGN.
Now, with all of this said, I would like to clarify that I've never said it is ever possible to find the true usage statistics in the English language (nor is it possible to do that in the space of all the subjectively-defined reliable sources). However, this doesn't mean we can't try to come up with a way to do analyses on some good samplings. --Bobthefish2 (talk) 06:39, 3 June 2011 (UTC)

Google Book/Scholar hits

I updated my past discussion.

Please see further research below. These results prove the Google Book Search is stable.

  • "Senkaku Islands" -"Diaoyu" 6,500 "Senkaku Islands" without any mention of "Diaoyu"
  • "Senkaku Islands" "Diaoyu" 2,240 "Senkaku Islands" with any mention of "Diaoyu"
6,500 + 2,240 = 8,740 ≈ 8,550
  • "Senkaku Islands" -"Diaoyu/Senkaku Islands" 7,280 without any mention of "Diaoyu/Senkaku Islands"
  • "Senkaku Islands" "Diaoyu/Senkaku Islands" 1,450 with any mention of "Diaoyu/Senkaku Islands"
7,280 + 1,450 = 8,730 ≈ 8,550
  • "Senkaku Islands" -"Diaoyu Islands" 7,680 without any mention of "Diaoyu Islands"
  • "Senkaku Islands" "Diaoyu Islands" 930 with any mention of "Diaoyu Islands"
7,680 + 930 = 8,610 ≈ 8,550
  • "Diaoyu Islands" -"Senkaku" 1,350 "Diaoyu Islands" without any mention of "Senkaku"
  • "Diaoyu Islands" "Senkaku" 1,640 "Diaoyu Islands" with any mention of "Senkaku"
1,350 + 1,640 = 2,990 ≈ 3,020
  • "Diaoyu Islands" -"Senkaku/Diaoyu Islands" 2,390 without any mention of "Senkaku/Diaoyu Islands"
  • "Diaoyu Islands" "Senkaku/Diaoyu Islands" 616 with any mention of "Senkaku/Diaoyu Islands"
2,390 + 616 = 3,006 ≈ 3,020
  • "Diaoyu Islands" -"Senkaku Islands" 2,070 without any mention of "Senkaku Islands"
  • "Diaoyu Islands" "Senkaku Islands" 929 with any mention of "Senkaku Islands"
2,070 + 929 = 2,999 ≈ 3,020
―― Phoenix7777 (talk) 23:41, 3 June 2011 (UTC)[reply]
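The consistency check in the list above can be reproduced mechanically. The following is a minimal sketch, not part of the original post: it assumes that 8,550 and 3,020 are the overall Google Books totals for "Senkaku Islands" and "Diaoyu Islands" on their own (the list implies this but does not say so), and the names `rows` and `relative_gap` are hypothetical. It recomputes each sum from the two parts rather than copying the sums given in the list.

```python
# Google Books hit counts quoted in the list above.
# Each row: (hits excluding the second term, hits including it, assumed overall total).
SENKAKU_TOTAL = 8550  # assumption: total for "Senkaku Islands" alone
DIAOYU_TOTAL = 3020   # assumption: total for "Diaoyu Islands" alone

rows = {
    '"Senkaku Islands" split on "Diaoyu"': (6500, 2240, SENKAKU_TOTAL),
    '"Senkaku Islands" split on "Diaoyu/Senkaku Islands"': (7280, 1450, SENKAKU_TOTAL),
    '"Senkaku Islands" split on "Diaoyu Islands"': (7680, 930, SENKAKU_TOTAL),
    '"Diaoyu Islands" split on "Senkaku"': (1350, 1640, DIAOYU_TOTAL),
    '"Diaoyu Islands" split on "Senkaku/Diaoyu Islands"': (2390, 616, DIAOYU_TOTAL),
    '"Diaoyu Islands" split on "Senkaku Islands"': (2070, 929, DIAOYU_TOTAL),
}

def relative_gap(without, with_term, total):
    """Relative difference between the two-part sum and the overall total."""
    return abs(without + with_term - total) / total

gaps = {label: relative_gap(*row) for label, row in rows.items()}
worst_label = max(gaps, key=gaps.get)

for label, gap in gaps.items():
    print(f"{label}: {gap:.2%}")
print("largest gap:", worst_label)
```

Under that assumption every pair of counts adds up to within a few percent of its total. That supports the claim that the counts are internally consistent, but it says nothing about whether the underlying hits are relevant or independent.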

Other tests not described in WP:NCGN

―― Phoenix7777 (talk) 22:18, 5 June 2011 (UTC)[reply]
I've spent a little time just looking at the WorldCat results. What I observed is consistent with my little speculation that results can be cooked up to appear impressive when they truly are not. One of the most obvious issues I noticed is that a very large portion of these hits are written in Japanese. Another is that many of these books are outdated. In the interest of carefully examining Phoenix7777's (seemingly) very significant results, I did an advanced search with two criteria:
  • English language only
  • Published at or after 2005
Then lo and behold, we have these following results (verified to have no false positive hits):
  • "Senkaku": 47 hits (44 unique)
  • "Diaoyu": 42 hits (36 unique)
  • "Diaoyutai": 9 hits (8 unique); Total 43 unique non-overlapping Diaoyu/Diaoyutai hits vs. 44 unique Senkaku hits
Now, let's suppose I increase the stringency in time and detect only articles published at or after 2009, then I have even more interesting results:
  • "Senkaku": 19 hits (19 unique)
  • "Diaoyu": 17 hits (16 unique)
  • "Diaoyutai": 5 hits (5 unique); Total 21 unique non-overlapping Diaoyu/Diaoyutai hits vs. 19 unique Senkaku hits
In the future, I'd strongly suggest that people read my little cautionary note about unscientific research practices before engaging in some potentially phony analyses.
I am still busy IRL so I won't do a comprehensive analysis on the rest until later next week. --Bobthefish2 (talk) 23:17, 5 June 2011 (UTC)
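The "unique non-overlapping" totals above follow from simple inclusion-exclusion, which can be sketched as below. This is only an illustration using the counts quoted in the comment; the dictionary layout and the function names `implied_overlap` and `senkaku_margin` are hypothetical, and the WorldCat queries themselves are not re-run here.

```python
# Unique WorldCat hit counts quoted in the comment above (English language only).
periods = {
    "2005 or later": {"senkaku": 44, "diaoyu": 36, "diaoyutai": 8, "chinese_combined": 43},
    "2009 or later": {"senkaku": 19, "diaoyu": 16, "diaoyutai": 5, "chinese_combined": 21},
}

def implied_overlap(counts):
    """Titles counted under both 'Diaoyu' and 'Diaoyutai' (inclusion-exclusion)."""
    return counts["diaoyu"] + counts["diaoyutai"] - counts["chinese_combined"]

def senkaku_margin(counts):
    """Unique Senkaku hits minus combined unique Diaoyu/Diaoyutai hits."""
    return counts["senkaku"] - counts["chinese_combined"]

summary = {
    period: (implied_overlap(counts), senkaku_margin(counts))
    for period, counts in periods.items()
}

for period, (overlap, margin) in summary.items():
    print(f"{period}: overlap={overlap}, Senkaku margin={margin}")
```

With samples this small, a margin of one or two titles in either direction is probably best read as a tie rather than as a lead for either name.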

Using Japanese name would imply Wikipedia endorsing the Japanese claim

There are two local names for the islands: the Chinese name Diaoyutai and the Japanese name Senkaku. The PRC, ROC and Japan are the participants in the territorial dispute over the islands. Choosing the Japanese local name as the title would obviously give the impression that Wikipedia has endorsed the Japanese claim to the islands. Most English-language media nowadays use the dual name for the islands to maintain their neutrality. My proposed solution is to use the existing English alternative name for the islands, the Pinnacle Islands.
STSC (talk) 04:54, 1 June 2011 (UTC)[reply]

This would be a practical course of action if not for the fact that, as previously discussed, Pinnacle Islands is one of the least common names for them, in contravention of WP:POVTITLE. I won't rule it out as a possible solution, but this does need to be addressed. Feezo (send a signal | watch the sky) 08:22, 1 June 2011 (UTC)
Feezo, before you comment on the post, please learn the relevant policies and guidelines. STSC's proposal is explicitly rejected by WP:NCGN#Multiple local names. That section applies only to the following: "There are cases in which the local authority recognizes equally two or more names from different languages, but English discussion of the place is so limited that none of the above tests indicate which of them is widely used in English; so there is no single local name, and English usage is hard to determine." In the case of Liancourt Rocks, the Google Book hits were below a hundred at that time (May 2007). See Talk:Liancourt Rocks/Archive 10#Google Book search. Over 10,000 hits in this case are quite enough to judge which name is widely accepted, so the description is not applicable here. ―― Phoenix7777 (talk) 09:36, 1 June 2011 (UTC)
I am aware of NCGN. However, we are not obligated to use any name given by a local authority unless it is also the most common English name (see Qwyrxian's summary above). Although English discussion in this case is not "limited", the Japanese name enjoys a moderate but not overwhelming lead in terms of general web usage. The more serious issue is the potential for claims of nationalism if either the Japanese or Chinese names are used. It is my impression that the translation would avoid this issue in a way that transliteration does not. It therefore seems reasonable to consider this as an alternative solution even if it contravenes NCGN (which is simply a guideline, while POVTITLE is policy). Note that I am not specifically endorsing Pinnacle, but am deliberately leaving the door open for others to comment on it. Feezo (send a signal | watch the sky) 10:05, 1 June 2011 (UTC)

The Practice of Unscientific Research

This is largely a direct response to Qwyrxian's little (big) block.

I would like to stress (as I did before) that the relative usage of names falls into the domain of linguistic research. In the pursuit of the most common name of an entity, one has to consider a number of issues. First of all, there's the matter of deciding what collection of text (also known as a corpus) should be representative of the true distribution of English words in real life. In choosing an appropriate corpus, one needs to consider the following in regard to the constituent documents:

  • Publish date
  • Publish locations
  • Genres (fiction, newspaper, encyclopedia, blog posts, e-mails, government documents, etc)
  • Authors
  • Sources

While the use of encyclopedias may sound appropriate, it is in fact a rather bad choice because the number of authors involved is very small (for those who have a basic understanding of statistics, this type of small sampling is usually frowned upon). An additional concern is the time stamps of these encyclopedias. Since encyclopedias found in libraries are often quite dated, they may not be of much use for our situation (i.e. to examine the relative usage frequency of words in the present). Finally, it may sometimes escape an individual that encyclopedias are far from accurate representations of the English language. Despite their relative authority on some facts, the text within is no more "English" than the text found in other genres (such as blog posts, personal letters, CNN/BBC/Al-Jazeera articles and such) and thus should not be granted disproportionate emphasis.

Those who have some training in linguistics may think "Hey, let's use the American Brown corpus then! This solves all problems". Now, while the use of established corpora is standard in linguistic research, it is not applicable to this circumstance, the reason being the outdatedness of many documents in these established corpora.

This leaves the "web-corpus" (a.k.a. search engine results) as the only remaining easily-accessible corpus and the one endorsed by Wikipedia. The advantage of using it is that it contains many documents from a vast distribution of genres (thus many types of data points). On the other hand, it is still not fool-proof due to a number of factors:

  • Blackbox search engine heuristics
  • Different search parameters yield different results
  • Potential unpredictability of search engine hits
  • Duplicate documents (thus over-representation of some data points)
  • Genre bias (i.e. scarcity of personal letters)

With this said, I'd point out that there isn't a perfect way of finding an answer to our problem and that we should be mindful of these issues when making decisions.

Now, let's talk about something different: How do we actually decide which is the more common name? While an everyday person may say "Let's choose the term with the most hits!", the problem is that this doesn't work when a set of (relatively) appropriate analyses gives conflicting results. In such cases, the identity of the most common term can be ambiguous. A potential solution is to formulate some sort of statistical test to assess whether or not there is a statistically significant enrichment of one term over all others (Fisher's exact test comes to mind). However, a complicating issue with our data is that Diaoyu and Senkaku often have very close numbers of hits. As a result, most statistical tests would probably not deem the difference between their observed frequencies of usage to be statistically significant.
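To make the idea of a significance test concrete, here is a minimal sketch. It is not Fisher's exact test but a simpler exact binomial test, swapped in because it needs only two counts and the standard library: it asks how surprising a split between two names would be if each independent document were equally likely to use either. The function name `binomial_two_sided_p` is hypothetical, and the example counts are borrowed from elsewhere on this page (the WorldCat unique hits of 44 vs. 43 and the .gov hits of 57 vs. 2). Treating those hits as independent observations is an assumption, and for the .gov pages it is one this discussion explicitly disputes.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))

# Counts taken from this page; independence of the hits is assumed, not shown.
p_worldcat = binomial_two_sided_p(44, 44 + 43)  # Senkaku vs. Diaoyu/Diaoyutai
p_gov = binomial_two_sided_p(57, 57 + 2)        # .gov: Senkaku vs. Diaoyu

print(f"WorldCat 44 vs 43: p = {p_worldcat:.3g}")
print(f".gov 57 vs 2:      p = {p_gov:.3g}")
```

A near-even split such as 44 vs. 43 gives a p-value close to 1, so no test would call it a difference. A lopsided split gives a tiny p-value, but only if the independence assumption holds, which is exactly the sampling concern raised about government pages.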

Now, suppose we actually can't decide which name prevails over the other in terms of frequency of usage; there's always the option of using dual names. While this solution is vehemently opposed by User:Qwyrxian, the philosophy behind his disagreement can be reasonably disputed. He said "Dual names (i.e., Senkaku/Diaoyu Islands) are a bad idea because they don’t actually solve anything, since the discussion just shifts from “what should the name be” to “which name should come first”." This statement neglects one of the central dogmas of Wikipedia, which is to move content as close to NPOV as possible (although what counts as NPOV is really subject to our subjective interpretations). So while he can be right about the possibility of continual disputes over the ordering of names, it appears he has overlooked the net benefits of adopting a dual-name solution.

And of course, my personal inclination is that it's preferable to let two kids share the TV than to allow one to monopolize it. Sure, they can still fight over who uses it first, but at least it is better than holding an undeserved bias against either of them. --Bobthefish2 (talk) 08:08, 1 June 2011 (UTC)

In case people would start accusing me of calling User:Qwyrxian unscientific (by making one of infinite interpretations of the sub-title), I'd pre-emptively deny this seemingly potential message that exists in the realm of infinite possibilities. Thanks. --Bobthefish2 (talk) 08:18, 1 June 2011 (UTC)[reply]
If a dual name is practicable on Wikipedia, I would suggest that the local names be placed in alphabetical order of the languages (Chinese/Japanese), so the dual name would be "Diaoyutai Islands/Senkaku Islands". STSC (talk) 14:37, 1 June 2011 (UTC)
If we had to go this route (which I don't support, because of the reasons mentioned in the guideline), I would say we still have to do the hard work of figuring out which order is most common in sources. Just because we're switching to a non-preferred style doesn't mean we can ignore the overall requirement that we go with what the sources say, not with what we prefer/want. I can think of a dozen other reasons to choose one order or the other (Senkaku first since Japan currently controls the Islands, Diaoyu first because China is the first recorded country to have spotted them, Senkaku first because Japan was the first country to inhabit them, Diaoyu first because D comes before S, etc., etc.). This is not to say that these are all good reasons, but that choosing a dual name isn't a simple solution, and I also believe that it simply prolongs the conflict for when the next group of new editors shows up. Ugh, and I just realized something I need to bring up in a new section. (Diaoyu vs. Diaoyutai) Qwyrxian (talk) 02:17, 2 June 2011 (UTC)
I find it difficult to understand your perspective. Suppose the two names have a difference in usage that is not statistically significant; how, then, is it going to improve the situation to strictly favour one of the two names? Sure, you can cite technicalities from Wikipedia and attempt to suppress further dissent through that, but conceptually, it's worse and will no doubt still be a source of future conflicts and edit-wars. As for name-ordering, it's a much less contentious matter because in such a case both names are featured in an almost equivalent manner (personally, I don't think it matters at all to me).
If you still fail to understand the advantage of adopting a dual name, I'd encourage you to think of our problem as a line fitting problem. Suppose the relative frequencies of the Japanese names are ~0.5 and the relative frequencies of Chinese names are ~0.5, then what's the line of best fit (with the lowest least square error)? Will it be the vector (0,0) -> (1,1) a.k.a. Japanese name solution? Or will it be the vector (0,0.5) -> (1,0.5) a.k.a. dual-name solution? I doubt you'd have trouble understanding this because this is just high school algebra. --Bobthefish2 (talk) 02:30, 2 June 2011 (UTC)[reply]
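The line-fitting analogy above can be worked through in a few lines. This is only a sketch under stated assumptions: each name is treated as one data point with a relative frequency of 0.5, and the x positions 0 and 1 are arbitrary labels introduced here for illustration, not something taken from the search data.

```python
# Hypothetical data: x is an arbitrary label for each name,
# y is that name's assumed relative frequency (~0.5 each).
points = [(0.0, 0.5), (1.0, 0.5)]

def squared_error(line, data):
    """Sum of squared vertical distances between y = slope*x + intercept and the data."""
    slope, intercept = line
    return sum((slope * x + intercept - y) ** 2 for x, y in data)

single_name = (1.0, 0.0)  # the line through (0, 0) and (1, 1)
dual_name = (0.0, 0.5)    # the line through (0, 0.5) and (1, 0.5)

print("single-name line error:", squared_error(single_name, points))
print("dual-name line error:  ", squared_error(dual_name, points))
```

Under these assumptions the flat line has the smaller squared error, which is the point of the analogy; it says nothing about whether the real frequencies are in fact close to 0.5.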
If we could show that the difference in usage (and this includes all measurements of usage--scholar/news searches, encyclopedias, whatever else we deem relevant under NCGN) is not statistically significant, then yes, in that case, there is good sense in using either a dual name or "Pinnacle". The problem, of course, is that I'm not sure how we'd measure "statistically significant", given that we can't even figure out how to do the original measurements, the different criteria vary so widely, and some of the populations are so small (you can't show, for example, whether having, hypothetically, 2 out of 3 major encyclopedias use only one name is statistically significant), that I'm not sure we could determine that. In other words, what I'm saying is that we have to be careful when we suddenly start to apply specific statistical/scientific concepts (chi-squared tests, confidence intervals, etc.) to guidelines that are intentionally unscientific and vague. That is, I'm saying that, in my opinion, we can never have enough confidence that the difference is insignificant to override the (what seems to me to be very reasonable) suggestion to not have dual names. But, to clarify, this is not a line of thinking that I consider closed; however, I think it's one that we should turn to only after we've exhaustively shown that data and policies aren't getting us where we need to go. Qwyrxian (talk) 02:43, 3 June 2011 (UTC)[reply]
Statistical tests are pretty standard ways to show that something is very unlikely to arise due to noise, so you should not perceive them as some sort of "fringe" philosophy that is meant to be vague or something. And you are right about the need to use them properly, which is something I advocated. For example, a statistical test on your favourite encyclopedia results is useless. And of course, if there isn't enough confidence to rule that the difference is insignificant, chances are we don't have enough confidence to rule the converse either. In the face of such uncertainty, is it right to so casually allow only one of the two names? --Bobthefish2 (talk) 04:31, 3 June 2011 (UTC)[reply]
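The small-sample concern raised in this exchange (the hypothetical "2 out of 3 major encyclopedias") can be illustrated with an exact two-sided binomial test. This is a minimal sketch, assuming each source is an independent draw and that the null hypothesis is a 50/50 split between two names; the helper name and the 20-of-30 comparison are made up for illustration and are not data from this discussion.

```python
from math import comb

def binomial_two_sided_p(successes, trials, p=0.5):
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed one under Binomial(trials, p)."""
    probs = [comb(trials, k) * p**k * (1 - p) ** (trials - k)
             for k in range(trials + 1)]
    observed = probs[successes]
    # Small tolerance so ties are counted despite floating-point rounding.
    return sum(q for q in probs if q <= observed + 1e-12)

# The hypothetical case from the discussion: 2 of 3 encyclopedias use one name.
print("2 of 3:  ", binomial_two_sided_p(2, 3))
# A purely illustrative larger sample with the same two-thirds share.
print("20 of 30:", binomial_two_sided_p(20, 30))
```

With only three sources, every possible outcome is at least as extreme as some other, so the test cannot distinguish a preference from chance; the same proportion in a larger sample gives a smaller p-value. Whether web hits can be treated as independent draws at all is a separate question that this sketch does not address.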

"Senkaku Islands" is NOT a English name but a POV name/title

Sorry for my delayed input, as I am really very busy. Qwyrxian, in his/her summary statement, listed several relevant WP policies and guidelines, including WP:Article titles and its Considering title changes; WP:NPOV and its WP:NPOV#Achieving neutrality; and Wikipedia:Naming conventions (geographic names). I of course agree with these. The problem is not here. The problem is how to apply these policies and guidelines to the name you would like to choose. Here is my brief summary of Qwyrxian's points as his/her rationale for choosing "Senkaku Islands" as a so-called NPOV name/title:

  1. "Senkaku Islands" is an English name for these Islands;
  2. "Senkaku Islands" is the name mostly used in English;
  3. Based on 1 and 2 above, and following the listed WP policies and guidelines, "Senkaku Islands" should be the name/title chosen for those pages in en:Wikipedia.

My disagreement or dispute is:

  1. "Senkaku Islands" is NOT an English name for these Islands;
  2. "Senkaku Islands" is NOT the name mostly used in English;
  3. Therefore, "Senkaku Islands" is NOT an NPOV name/title suitable for these pages in en:Wikipedia.

I agree with Bobthefish2 on his main points in his "the Practice of Unscientific Research". Here I present the results I got from online search (done on June 1, 2011, 20:47 UTC, at a location where English is the official language): Table1

Original Language | Name/Title | Name Form in Original Language | Google Search | Google Scholar | Google Book | Google News | Year when name generated | Selected Reliable Sources
Chinese | Diaoyu Islands | 钓鱼岛群岛 | 344,000 | 1,501 | 5,370 | 25 | |
Chinese | Diaoyutai Islands | 釣魚台列嶼 | 64,100 | 779 | 2,420 | 7 | |
Chinese | (Subtotal) | (as above) | 408,100 | 2,289 | 7,790 | 32 | as early as 1403 | [1]
Japanese | Senkaku Islands | 尖閣諸島 | 221,000 | 2,660 | 12,100 | 34 | 1900 | [2]
English | Pinnacle Islands | Pinnacle Islands | 750,000,000 | 22,200 | 27,800 | 27 | 1843 | [2][3][4]
  1. ^ Shun Feng Xiang Song (順風相送)/Voyage with the Tail Wind, A Chinese navigation records, is now located in Bodleian Library, Oxford, UK 35 H.
  2. ^ a b Martin Lohmeyer (2008). The Diaoyu/Senkaku Islands Dispute
  3. ^ Han-yi Shaw (1999). The Diaoyutai/Senkaku Islands Dispute:Its history and an analysis of the ownership claims of the P.R.C., R.O.C. and Japan
  4. ^ Belcher, Edward and Arthur Adams (1848). Narrative of the Voyage of H.M.S. Samarang, During the Years 1843–46: Employed Surveying the Islands of the Eastern Archipelago. London : Reeve, Benham, and Reeve. OCLC 192154

The Chinese name or names and the English name arose independently in their own languages. The Japanese name was derived from the English name "Pinnacle Islands" by the Japanese explorer Tsune Kuroiwa in 1900, who translated the British English name by its meaning, not its pronunciation, into the Japanese “尖閣諸島”. “Senkaku Islands” is this Japanese name written in English (Roman) letters, changing only "Shoto" for "諸島" into "Islands", just as the Chinese name is given as "Diaoyu Islands" instead of "Diaoyu Qundao". Since then, “Senkaku Islands” has been a Japanese name expressed in an English way. It is absolutely NOT an English name at all. When you go to any serious and reliable source, the name “Senkaku Islands” is always noted as "the Japanese name" or introduced with "Japan calls these islands ...". This is for my point 1.
For my point 2, I said earlier in my input here that if this Japanese name fails any one of the challenges such as those listed in the table above, it does not deserve to be called "the name mostly used in English". Qwyrxian did not answer this in his/her summary.
Then my point 3: in agreement with STSC, Bobthefish2, and Phead128, and based on my two points above, it can be concluded that "Senkaku Islands" is absolutely a POV name/title. It should definitely NOT be used as the name/title for those pages in en:Wikipedia.
The next question is: what should we choose as an NPOV name for these pages? I originally considered only a dual name. But after participating in more discussions and doing more research, I now think the suggestion that Phead128 raised and STSC supported should not be ruled out hastily. (to be continued, sorry too busy) --Lvhis (talk) 00:07, 2 June 2011 (UTC)[reply]

I'd have to object to your analysis (just as I brought scrutiny to User:Feezo and User:Qwyrxian). You obviously have not used quotations in getting your Google search results. For example, the 7 million hits you got for Pinnacle Islands is inaccurate because the absence of quotations sensitizes Google to irrelevant pages (such as "Pinnacle Rock" ... [lots of words] ... "Fernandina Island"). When I redid your search with quotations, I instead received 17500 hits, which is fewer than for either of the other names.
I'd also have to add that having the highest count does not necessarily mean a name is most commonly used. Rather, it just implies that the name most commonly occurred in the small sample of documents chosen for analysis. With enough noise, the result can in fact be quite insignificant in a statistical sense. Hence, we should be careful about adopting a first pass in such cases. --Bobthefish2 (talk) 02:01, 2 June 2011 (UTC)[reply]
I have to concur with Bobthefish2 on this, and admit that this is the problem with all Google searches: unless they are done very carefully, the results are flawed. You need to search for "Senkaku Islands", "Diaoyu Islands", and "Pinnacle Islands", including the quotation marks, not Senkaku Islands, etc. You also need to use Advanced options to set "English" as your language, because lots of foreign language websites include English keywords, even though they have no actual English text. Then, you actually need to look at your results, especially for the third term; for instance, when I just did a straight Google Search for "Pinnacle Islands", the 7th result is [1], which is about the Pinnacle Islands in Canada, not the ones in the East China Sea. I'm sure I can find our most recent results somewhere in the archives, which were done with a fair amount of precision. Alternatively, I am willing to complete new careful searches as well. One issue we have to deal with on news sources, for example, is the fact that even though you do a Google News Search in English, many of the hits you get will be for English language newspapers in China and Japan, which, not surprisingly, tend to use the term preferred by the nation-state they represent. Final note on Google searches--one thing you need to not do with Google searches is to use the "NOT" search marker, because it completely alters the results (usually, you get more results by adding a "NOT" than without one, which, if Google were a pure, simple search of all online materials, wouldn't happen). And finally, I would call, one more time, for anyone with access to an English library to try looking at English print encyclopedias, something we are specifically told to do by the guidelines. Qwyrxian (talk) 02:11, 2 June 2011 (UTC)[reply]
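The difference between a quoted and an unquoted query can be shown with a toy matcher. This is a deliberately simplified sketch: the two documents are invented, and the matching rules (all words present versus exact phrase present) are an assumption made for illustration, not a description of how Google actually ranks or counts pages.

```python
def matches_all_words(text, query):
    """Unquoted-style match: every query word appears somewhere in the text."""
    words = text.lower().split()
    return all(word in words for word in query.lower().split())

def matches_phrase(text, query):
    """Quoted-style match: the query appears as one contiguous phrase."""
    return query.lower() in text.lower()

# Invented example documents; the second one is about a different place.
docs = [
    "The Pinnacle Islands lie in the East China Sea",
    "Pinnacle Rock is a landmark near the Galapagos islands",
]

query = "Pinnacle Islands"
unquoted_hits = [d for d in docs if matches_all_words(d, query)]
quoted_hits = [d for d in docs if matches_phrase(d, query)]

print("unquoted hits:", len(unquoted_hits))
print("quoted hits:  ", len(quoted_hits))
```

In this toy corpus the unquoted query also picks up the page where "Pinnacle" and "islands" occur far apart, which is the kind of irrelevant hit described above.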
Qwyrxian, you're not in a position to advise people not to use "NOT" just because you don't know how to use the search terms correctly on Google; you obviously have very limited knowledge of Information Retrieval, and therefore most of your findings in Google hits would be quite unreliable. STSC (talk) 04:44, 2 June 2011 (UTC)[reply]
I did use the terms correctly, although I didn't explain it as well here as I should have. Do a search for "Senkaku Islands". I currently get 206,000 hits. Then, go to advanced search, and add "Diaoyu" in the section that says, "But don't show pages that have...any of these unwanted words". That produces 305,000 hits. It cannot actually be the case that there are more pages that include "Senkaku Islands" but not "Diaoyu" than pages that simply include "Senkaku Islands", since the former is a subset of the latter. If Google actually produced "true" results, a search for "A" should always produce at least as many hits as "A -B". As an analogy, just to make this clear: let's say I look around the room I'm in, and count "the number of men". That number must always be equal to or greater than "the number of men older than 40" or "the number of men not older than 40." Any time you add more specific criteria, you must produce fewer results (or equal), not more. That shows that Google is not actually producing a complete/common sense set of results. Google Books and Google Scholar seem to work better in this regard (my guess is that it has to do with the way page ranking works on normal Google, which I don't think works in the same way on the specialized engines, but I'm not actually sure). Qwyrxian (talk) 04:58, 2 June 2011 (UTC)[reply]
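The subset argument can be checked mechanically on any fully enumerated collection. The sketch below uses three invented page texts; it only shows that an exact count of "A but not B" can never exceed an exact count of "A", so a reported count that does exceed it has to be an estimate rather than an exact tally.

```python
# Invented mini-corpus; page ids and texts are hypothetical.
pages = {
    "p1": "Senkaku Islands dispute overview",
    "p2": "Senkaku Islands also known as Diaoyu",
    "p3": "Diaoyu Islands history",
}

# Pages containing the phrase A.
with_a = {pid for pid, text in pages.items() if "Senkaku Islands" in text}

# Pages containing A but not the unwanted word B.
with_a_not_b = {pid for pid in with_a if "Diaoyu" not in pages[pid]}

print("A:      ", len(with_a))
print("A and not B:", len(with_a_not_b))
print("subset holds:", with_a_not_b <= with_a)
```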
IR is not like arithmetic; that's why I said you really have very limited knowledge of IR. This is off-topic now so I shall not continue. STSC (talk) 05:07, 2 June 2011 (UTC)[reply]
If it's not the same, you need to explain it to us, because a large portion of our discussion hinges on how we use search results. I suppose we could agree not to use search results, but, if so, then we need another way to determine the most common English name in high quality, encyclopedic sources. I am, of course, happy with using the results found from the almanacs, but I assume that won't work for y'all. The policies in question require us to determine the most common name; we cannot shirk that responsibility, nor can we simply assert "Modern sources now use X" without evidence. Qwyrxian (talk) 05:36, 2 June 2011 (UTC)[reply]
The policy in question can be written by anyone and with or without "consensus". Most likely, the few who wrote or last edited these relevant sections have absolutely no background in linguistics, natural language processing, or information retrieval. As for an explanation of how IR works, I guess STSC can offer you a few pointers. However, the block of text I wrote in my section is also relevant. --Bobthefish2 (talk) 06:08, 2 June 2011 (UTC)[reply]
My point is Qwyrxian should not advise other editors not to use "NOT" in Google search when he has limited knowledge in IR. STSC (talk) 06:37, 2 June 2011 (UTC)[reply]
I only have a very basic understanding of IR (well, mostly theoretical). What's wrong with using NOT? --Bobthefish2 (talk) 07:10, 2 June 2011 (UTC)[reply]
I agree, we're hitting a brick wall here. STSC, will you please explain what is wrong with Qwyrxian's results beyond saying "you obviously have very limited knowledge in Information Retrieval"? Feezo (send a signal | watch the sky) 07:12, 2 June 2011 (UTC)[reply]

If "Senkaku Islands" is NOT an English name for these Islands, then "Diaoyu Islands" and "Diaoyutai Islands" are also NOT English names. The same goes for most place names of China and Japan. Are they not NPOV names? I'd like to point out that this thread is based on a wrong premise, and I'm afraid it's worthless to me. Oda Mari (talk) 08:00, 2 June 2011 (UTC)[reply]

See WP: Article titles#Foreign names and anglicization. It says "Names not originally in a Latin alphabet, such as Greek, Chinese, or Russian names, must be transliterated. Established systematic transliterations, such as Hanyu Pinyin, are preferred. However, if there is a common English-language form of the name, then use it,..". It is irrelevant whether it is a pure English name or not.―― Phoenix7777 (talk) 08:32, 2 June 2011 (UTC)[reply]

--(Continuing from 00:07, 2 June 2011 UTC) There have been so many comments above already. From these comments, including Qwyrxian's, I think I can conclude that both sides have reached one consensus here: the name "Senkaku Islands" is not an English name. I have provided reliable sources to support this, as our guideline WP:RS requires. If anyone still wants to say that "Senkaku Islands" is the English name for these islands, that surely comes from his/her original research, which WP cannot accept. In response to Oda Mari, I'd like to repeat what I have said: neither the Japanese name "Senkaku Islands" nor the Chinese name "Diaoyu Islands" for these disputed islands is an NPOV name/title in en:Wikipedia. Now I move on to the next question: is "Senkaku Islands" the name mostly used in English for these islands? I presented my search results in the table above. The numbers there are raw data, not yet an analysis. I know the limitations of such searches, but I believe they are still meaningful. I also agree that the challenges Bobthefish2 and Qwyrxian raised against the numbers for "Pinnacle Islands" are quite reasonable. Let us put the raw data about "Pinnacle Islands" aside temporarily. Compared with the Chinese name, if you want to define the Japanese name as "the name mostly used in English", that name still has to pass all of the challenges, including such raw searches. Qwyrxian once mentioned to me an example or precedent in en:Wikipedia: the name/title for the Italian city Florence. Let us see the raw data for this Italian city (done on June 2, 2011, 22:17 UTC): Table2

Original Language | Name/Title | Google Search | Google Scholar | Google Book | Google News
Italian | Firenze | 96,500,000 | 646,000 | 2,430,000 | 9,380
English | Florence | 168,000,000 | 1,120,000 | 6,560,000 | 11,900

See, there is no doubt here that the name "Florence" is "the name mostly used in English"! No matter how raw the search methods are, this is a good example for us. Now back to our case, the Japanese name vs the Chinese name: if use of the Japanese name really outnumbered use of the Chinese name in English, you would not need to bother finding and using more complicated or sophisticated methods to pinpoint the so-called accurate difference between them. The real objective fact here is that the usage of these two single names in English is very, very close, or, roughly, I would say it is the same. We are working on Wikipedia, the Free Encyclopedia, which is not an official document of the USA government. So I agree with Bobthefish2: the usage of the name in .gov and Library of Congress Subject Headings is just one of many usages, though quite official, and it still cannot negate the popularity of either of these two names. Using the single name "Senkaku Islands" is the same as using the single name "Diaoyu Islands" here: it is a POV name/title. (to be continued) --Lvhis (talk) 00:52, 3 June 2011 (UTC)[reply]

Except for the fact that the guidelines specifically tell us not to look at sources in general, but to look at "neutral and reliable sources". This is the value in looking, not only at popularity, but also at reliability and neutrality (i.e., the quality of sources) as we consider which name is more common. Also, while I fully accept that Florence/Firenze is a good analogy, I believe it is not acceptable for you to define it as the standard by which we measure the current naming issue. If that were the case, then the guidelines would say that in all cases where it's even somewhat close, we should choose some random "English translation" or use a dual name; but, in fact, the guidelines tell us that we should do almost everything possible to find the most common English name. The guideline explicitly says, "We recommend choosing a single name, by some objective criterion, even a somewhat arbitrary one" when choosing between multiple local names. Qwyrxian (talk) 02:49, 3 June 2011 (UTC)[reply]
Also, I want to follow up on a point Oda Mari raised earlier, again, quoting from NCGN: "When a widely accepted English name, in a modern context, exists for a place, we should use it. This often will be a local name, or one of them; but not always." In other words, we cannot reject Senkaku as the name simply because it is a transliteration of the Japanese name; we must, as always, focus on whether that transliteration is the most common English name. Qwyrxian (talk) 02:52, 3 June 2011 (UTC)[reply]
As for the "Firenze/Florence" search, it's too rough. See Florence (disambiguation). They are all included in the search. As for the "Pinnacle Islands" search, it's also too rough. Here are other results. "Pinnacle Islands" China -Wikipedia gets 11,400 hits. "Pinnacle Islands" Japan -Wikipedia gets 15,500 hits. But they include Diaoyu and Senkaku. "Pinnacle Islands" China Japan -Wikipedia -Senkaku -Diaoyu -Diaoyutais -Senkakus gets only 487 hits. I think "Pinnacle Islands" is inappropriate as the article name of the islands per common names and Non-neutral but common names. I suggest you change the thread title to "Neither "Diaoyu Islands" nor "Senkaku Islands" is an English name but a POV name/title" if that is what you mean. Oda Mari (talk) 10:02, 3 June 2011 (UTC)[reply]

--(Continuing from 00:52, 3 June 2011 UTC) First, I am responding to Oda Mari's suggestion. The reason I use that title for this section is that Qwyrxian's "Summary statement ..." in fact implied that the Japanese name "Senkaku Islands" is an English name, and it may mislead the audience here into believing so. No one here has ever said that the Chinese name "Diaoyu Islands" is an English name or an NPOV one. And what has led us to come here for mediation is whether the current name/title for those pages is a POV one or an NPOV one. I am on the side arguing and proving that it is a POV one. Before I give my revised raw searches on these islands per the critiques from Bobthefish2, Qwyrxian, and you, Oda Mari, I would like to clarify that I neither use the results from this kind of search to finally determine the name/title, as STSC warned against, nor treat them as just a bunch of junk. I almost 100% agree with Bobthefish2's section "The Practice of Unscientific Research" after reading it completely. Now here is the revised table (done on June 3, 2011, 18:23 UTC, using Google Advanced Search by adding East China Sea in "one or more of these words" and choosing English in the "Language" option): Table3

Original Language | Name/Title | Name Form in Original Language | Google Search | Google Scholar | Google Book | Google News | Year when name generated | Selected Reliable Sources
Chinese | Diaoyu Islands | 钓鱼岛群岛 | 108,000 | 626 | 1,310 | 7 | |
Chinese | Diaoyutai Islands | 釣魚台列嶼 | 21,700 | 797 | 427 | 0 | |
Chinese | (Subtotal) | (as above) | 129,700 | 1,423 | 1,737 | 7 | as early as 1403 | [1]
Japanese | Senkaku Islands | 尖閣諸島 | 89,700 | 989 | 3,110 | 8 | 1900 | [2]
English | Pinnacle Islands | Pinnacle Islands | 2,160 | 23 | 10 | 0 | 1843 | [2][3][4]
  1. ^ Shun Feng Xiang Song (順風相送)/Voyage with the Tail Wind, A Chinese navigation records, is now located in Bodleian Library, Oxford, UK 35 H.
  2. ^ a b Martin Lohmeyer (2008). The Diaoyu/Senkaku Islands Dispute
  3. ^ Han-yi Shaw (1999). The Diaoyutai/Senkaku Islands Dispute:Its history and an analysis of the ownership claims of the P.R.C., R.O.C. and Japan
  4. ^ Belcher, Edward and Arthur Adams (1848). Narrative of the Voyage of H.M.S. Samarang, During the Years 1843–46: Employed Surveying the Islands of the Eastern Archipelago. London : Reeve, Benham, and Reeve. OCLC 192154

And also I have a table for another precedent many editors mentioned, the "Liancourt Rocks" (done on June 3, 2011, 17:26 UTC, adding Shimane after Takeshima Islands as disambiguation because there are five different groups of Islands called this same name in Japan): Table4

Original Language | Name/Title | Name Form in Original Language | Google Search | Google Scholar | Google Book | Google News | Year when name generated
Korean | Dokdo Islands | | 108,000 | 2,030 | 1,540 | 36 |
Korean | Tokdo Islands | | 108,000 | 308 | 1,320 | 1 |
Korean | (Subtotal) | 독도/獨島 | 216,000 | 2,338 | 2,860 | 37 | ?
Japanese | Takeshima Islands, Shimane | たけしま/竹島 | 131,000 | 290 | 929 | 1 | ?
Franco-English | Liancourt Rocks | Le Liancourt | 126,000 | 1,160 | 2,500 | 0 | 1849

Per the revised table regarding the Diaoyu/Senkaku Islands, it is even more unreasonable to call the Japanese name "the name mostly used in English" for these islands. The fact behind this is that the difference between these two names in English usage is very, very small. As Bobthefish2 said in his section, "As a result, most statistical tests would probably not deem the difference between their observed frequencies of usage to be statistically significant." In statistical terms, the observed difference is due to sampling error, and of course it cannot overcome search engine errors. If you really want to say which one is used more, you can only say "The Japanese name Senkaku may, probably, somewhat, and from a certain point of view, be a little bit more used in English" when compared with the use of the Chinese name. Therefore, the Japanese name has just the same credit and privilege as the Chinese name when you consider which to use as the name/title in en-WP pages.
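For readers who want to compare the raw figures side by side, the counts from Table2 and Table3 can be turned into per-engine shares. This is only a sketch of the arithmetic: the numbers are copied from the tables above, the helper name is invented, and the calculation inherits every limitation of raw hit counts already discussed on this page.

```python
ENGINES = ["Search", "Scholar", "Book", "News"]

# Raw counts copied from Table2 and Table3 above.
table2 = {
    "Firenze": [96_500_000, 646_000, 2_430_000, 9_380],
    "Florence": [168_000_000, 1_120_000, 6_560_000, 11_900],
}
table3 = {
    "Chinese (subtotal)": [129_700, 1_423, 1_737, 7],
    "Senkaku Islands": [89_700, 989, 3_110, 8],
}

def shares(counts_a, counts_b):
    """Fraction of the combined hits that go to the second name, per engine."""
    return [b / (a + b) for a, b in zip(counts_a, counts_b)]

florence_share = shares(table2["Firenze"], table2["Florence"])
senkaku_share = shares(table3["Chinese (subtotal)"], table3["Senkaku Islands"])

for engine, f, s in zip(ENGINES, florence_share, senkaku_share):
    print(f"{engine:8s} Florence share {f:.2f}   Senkaku share {s:.2f}")
```

The shares say nothing about statistical significance on their own; they only make it easier to see, engine by engine, which name has more raw hits and by how much.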

Lastly, what name should we choose as an NPOV one? My suggestion is that it depends on what kind of consensus we can reach. If the editors on one side love the Japanese name so much that they feel very uncomfortable if the name/title does not include it, then we have to go with a dual name, Diaoyu/Senkaku or Senkaku/Diaoyu; I like the D/S one, which STSC proposed very reasonably, but I can compromise, as Bobthefish2 said. If those on that side dislike the Chinese name so much that they cannot tolerate the Chinese name appearing in the title, and they like to emphasize that the guideline discourages using a dual name, then we have to go with the real English name, "Pinnacle Islands", as has been done for "Liancourt Rocks", shown in Table 4 above. In any case, my other suggestion is that we should avoid being "penny-wise and pound-foolish" when we deal with this name/title issue in respect of WP:NPOV. (done) --Lvhis (talk) 23:28, 3 June 2011 (UTC)[reply]

Lvhis, the figures on Liancourt rocks searches above are incorrect. If you want other users to talk more on this thread, use correct figures and the links. None of your posts based on the searches on this thread are convincing. BTW, I don't think the names D and S are not NPOV. Oda Mari (talk) 16:45, 4 June 2011 (UTC)[reply]

Diaoyu vs. Diaoyutai

I have consistently, when doing searches, looked for both Diaoyu and Diaoyutai, and often included other spellings as well (Xiaoyutai? I'd have to look at the archives to be sure). My understanding, and please correct me if I am wrong, is that the PRC prefers "Diaoyu" and ROC prefers "Diaoyutai". I had thought that Diaoyu is the more common term when considering Chinese variants, but I see STSC consistently using Diaoyutai. Is this a personal preference? Do we need to account for Diaoyutai? That is, is this a "three-way" consideration (or even more)? Or is it "Chinese name vs. Japanese name vs. dual name vs. Pinnacle", and then after we figure that out, then we figure out Diaoyu or Diaoyutai? Note that if others think that this is an issue that we would be better off putting off discussing until we make more headway on the main issue, that is fine by me; it just suddenly occurred to me when I responded to STSC above. Qwyrxian (talk) 02:21, 2 June 2011 (UTC)[reply]

The difference between Diaoyu and Diaoyutai is like the difference between U.S. and U.S.A. The other variants exist due to a non-conformity in phonetic translation of Chinese words. But in the end, they all point to the same Chinese words (+/- the "tai") --Bobthefish2 (talk) 02:44, 2 June 2011 (UTC)[reply]
This is absolutely a non-issue. I would use "Diaoyu" just to please Qwyrxian. STSC (talk) 04:10, 2 June 2011 (UTC)[reply]
That's fine, I was simply confused as to whether this was a distinct, relevant issue or, as Btf2 puts it, simply one of transliteration. Thanks. Qwyrxian (talk) 04:52, 2 June 2011 (UTC)[reply]
Then may we exclude "Diaoyutai" from the forthcoming discussion for simplicity? Does any of you believe the sum of "Diaoyu" and "Diaoyutai" makes sense? ―― Phoenix7777 (talk) 08:52, 2 June 2011 (UTC)[reply]
I haven't seen any arguments here for Diaoyutai, so I don't believe any party would be marginalized by excluding it. If this is wrong, please let me know. Otherwise, I agree with Qwyrxian that the debate encompasses "Chinese name vs. Japanese name vs. dual name vs. Pinnacle" with Diaoyu as the Chinese name. Feezo (send a signal | watch the sky) 09:21, 2 June 2011 (UTC)[reply]
As I've said, Diaoyutai is the same as Diaoyu just as United States is the same as United States of America (unless there is another United States of Something in this world). So, I think Diaoyutai or Diaoyu will not make a difference. --Bobthefish2 (talk) 17:38, 2 June 2011 (UTC)[reply]

Google search

I have set up this separate heading out of respect for Lvhis' section above.
First of all, I'm absolutely not happy with Qwyrxian giving silly advice like this: "one thing you need to not do with Google searches is to use the "NOT" search marker..." just because he has had difficulties in using Google search. I would presume good faith, but I hope it is not a tactic for dismissing other editors' search data that uses the NOT operation (as he has done in the past). So please, Qwyrxian, give us a search example so that we can work out why you're not getting a logical result by using the NOT operation. STSC (talk) 04:14, 3 June 2011 (UTC)[reply]

I don't think Qwyrxian has really said anything to deserve personal attacks, so you don't really need to go overboard with your descriptors. As for his advice about not using "NOT", there was an example he has made in the past that I've been able to reproduce where using the NOT descriptor yielded more hits than not using the NOT descriptor. Personally, I don't know enough about Google's search mechanism to understand the rationale behind this phenomenon. If you do, then perhaps you should indulge us on why this is the case. --Bobthefish2 (talk) 04:40, 3 June 2011 (UTC)[reply]
There's no "personal attack" at all. STSC (talk) 04:55, 3 June 2011 (UTC)[reply]
If someone plays with matches and gets his fingers burnt, he cannot go on to tell everyone stop using matches. STSC (talk) 05:21, 3 June 2011 (UTC)[reply]
That's not the point. While there are significant issues with the way he naively draws his conclusions, he did raise a legitimate point on this which is backed by evidence that shows this search parameter does not behave as expected. If you would like to show he is wrong, then you should demonstrate why (which I assume you have the expertise to). Otherwise, this is not going to convince anyone of your position. --Bobthefish2 (talk) 05:46, 3 June 2011 (UTC)[reply]
STSC, if you can enlighten us about how to properly use "unwanted words" on the Google advanced search in a way that produces proper results, I would be more than happy to amend my statement, and also to include such types of data in our analysis. I've given a bunch of examples so far, but I'll give one more totally unrelated to this subject. Search for "online game" (using quotation marks); I get 47.7 million hits. Search for "online game" -multiple (or go to the advanced options and put multiple into the unwanted words box); I get 124 million hits. It simply cannot be that there are more pages with "online game" but not "multiple" than there are pages with "online game". Something is amiss. Now, if there is a way to fix this, we should do so. One thing I note is that the same problem does not occur when the positive search string is only a single word (so, "love" gets, as it should, far more hits than "love" -hate). But that doesn't tell me how to fix it. Please understand, I am not using this as a tactic; in fact, I really wanted this to work in the first place because it would have saved me, literally, hours' worth of work when I first tried to separate articles that mentioned both names from those that mentioned only one. If we could get this search result to work reliably, I think it would definitely help us (not definitively, but it would be a step in the right direction). Qwyrxian (talk) 05:59, 3 June 2011 (UTC)[reply]

According to Wikipedia:Article titles#Common names: "Search engine results are subject to certain biases and technical limitations; for detailed advice in the use of search engines and the interpretation of their results, see Wikipedia:Search engine test" --Tenmei (talk) 06:25, 3 June 2011 (UTC)[reply]

Yes, we already know and even discussed that a bit. Thanks for pointing it out. --Bobthefish2 (talk) 06:57, 3 June 2011 (UTC)[reply]
The result count of a Google search is only a quick estimate and often very unreliable (and wrong) particularly for large results, so we should not read too much in it. The search engine's priority is to retrieve the relevant results quickly, and a normal user is likely only interested in the first few pages anyway. The hit counts in Qwyrxian's case are either underestimated or overestimated. That's why I myself have never supported the use of Google search to determine the title. STSC (talk) 09:12, 3 June 2011 (UTC)[reply]
I have no qualms with that analysis or that plan, at least for regular Google. Maybe it's just my feeling, but it seems like I/we get better (more accurate?) results with Google News, and even better Google Scholar, and Google Books. One possibility is simply that the smaller number of objects to be searched decreases the volatility. Another possibility (perhaps a bit more likely) is that Google uses a simpler search for Scholar and Books, since ranking and advertising aren't as relevant there. Qwyrxian (talk) 09:20, 3 June 2011 (UTC)[reply]
These are all search engines. The principles of IR applies to them just as well. --Bobthefish2 (talk) 15:05, 3 June 2011 (UTC)[reply]

I think a new format might help direct this discussion a bit. We all know that measuring search engine results is a flawed metric that relies on a black box; but at the same time we've seen valid criticism of using encyclopedias, almanacs, and similar references. Both are recognized by NCGN, so we have a degree of freedom here. We can discuss ways to perform and refine these two options, but I think it would be helpful to first sort out which are acceptable in general to the participants here. To that end, I'd like to have a mini RFC-style discussion on the merits of the major options and sub-options available. For anyone who's not familiar with the format, editors write statements under section headings, which are then endorsed by other editors, who may also elaborate on them. You can endorse as many positions as you like. Disagreement shouldn't be voiced under those headings, but should be placed under new headings which other editors may then endorse. Discussion and clarification may of course continue outside this format. Here are some headings which may be used "as is", or rewritten. If you don't want to repeat your position, just note that your reasons are given above. Again, the objective here is to decide which of the NCGN widely accepted name criteria we should use, and to determine the general scope of the one we select (e.g., which encyclopedias, or which search engines). Details like how to properly perform a NOT search should be avoided here, if possible. Feezo (send a signal | watch the sky) 19:00, 3 June 2011 (UTC)

Feezo, are you going to ignore the guideline and try to change the criteria among the limited participants here without discussing it on the guideline's talk page? The guideline reflects the wisdom gained from many dispute resolutions. If anyone objects to the guideline's current criteria, he/she should raise it on the guideline's talk page. Please withdraw the polling below. ―― Phoenix7777 (talk) 20:41, 3 June 2011 (UTC)
Not at all. Quoting from the linked guideline:

A name can be considered as widely accepted if a neutral and reliable source states: "X is the name most often used for this entity". Without such an assertion, the following methods (not listed in any particular order) may be helpful in establishing a widely accepted name (period will be the modern era for current names; the relevant historical period for historical names):

Do we currently have a neutral, reliable source that says "X is the name most often used for these islands"? If not, we are free to develop a consensus on how to use the six methods given in the guideline. Feezo (send a signal | watch the sky) 20:52, 3 June 2011 (UTC)
The quote doesn't say anything about changing the criteria. It just says "the following methods (not listed in any particular order) may be helpful in establishing a widely accepted name". Yes, we are free to develop a consensus; however, it is unlikely to build consensus among disputed participants. Otherwise, this mediation will never end. Please stop bringing up polling all of a sudden. ―― Phoenix7777 (talk) 21:18, 3 June 2011 (UTC)
Can you explain what you mean by saying that I'm trying to change the criteria? What is your reading of Wikipedia:NCGN#Widely accepted name? Also, can you please explain what you mean by "it is unlikely to build consensus among disputed participants"? Feezo (send a signal | watch the sky) 21:31, 3 June 2011 (UTC)
The polling below is in fact proposing a change to the current criteria described in the guideline. You can see how difficult it is to reach a consensus just by considering this long-standing dispute. ―― Phoenix7777 (talk) 21:39, 3 June 2011 (UTC)
I disagree with Phoenix7777; the polling is not an attempt to change the guideline. The guideline itself says that there are a variety of ways, that these are not the only ways, and that you needn't use all of them or any one in particular. In other words, the guidelines are (like many of our guidelines) intentionally vague. Feezo isn't trying to change the guideline, he's trying to figure out how to apply the guideline to our particular case. Now, it's perfectly fine for someone to say "Apply the guideline as written, in order, looking at all of the criteria", but it's also acceptable to find a balance in how we use the different criteria. I'm going to hold off on giving my opinion at the moment (I'm disappearing soon for a 2 day wikibreak, and want to think about it a bit). Qwyrxian (talk) 21:47, 3 June 2011 (UTC)
Disagree. "Do not support using Google as a metric" is clearly against the guideline. And I don't want to discuss here how unreliable Google web search is. This mediation is not the place to discuss such an issue. The issue was already discussed at the guideline, and the current search criteria (Books, Scholar) described there are the result of that consensus. However, I agree to discuss something like adding encyclopedias such as "Encyclopedia Americana", or almanacs, to the criteria. ―― Phoenix7777 (talk) 21:55, 3 June 2011 (UTC)

It is not against the guideline. I quote, "The following methods...may be helpful..." (emphasis added.) The key word is "may". It gives us a choice. Feezo (send a signal | watch the sky) 22:42, 3 June 2011 (UTC)

Yes, you are right. However, are you sure this option is a practical idea? ―― Phoenix7777 (talk) 22:50, 3 June 2011 (UTC)
We have not yet examined all the criteria listed in the guideline, especially the correct Google Book/Scholar test. It is too early to hold this kind of poll without discussing the result of such an important test. ―― Phoenix7777 (talk) 23:06, 3 June 2011 (UTC)
It's not really a poll; the section headings are just examples. You can take one and modify it, or write your own. I'm starting with the first three criteria, since they've already had the most discussion. You can write your own section, as long as it addresses Wikipedia:NCGN#Widely accepted name. The section statements should be fairly short; I'm thinking around one hundred words. The purpose isn't necessarily to convince people, but simply to establish the positions. If there isn't a clear consensus, we can then explore the arguments in depth. If there isn't any activity here in the next day or so, I'll write the statements myself, summarizing the arguments from all sides as I understand them. Feezo (send a signal | watch the sky) 23:42, 3 June 2011 (UTC)
I won't argue with you on this anymore. I will be watching this discussion. My only hope was that this discussion would converge rather than diverge, as I explained in my concern above. In fact, it has diverged, just as I feared. Please discuss the use of Google search fully here, from the beginning, without any consideration of the past discussion of the guidelines. If any result comes out of this discussion, I will not agree with it unless the relevant guideline is amended. In other words, there will not be a consensus. ―― Phoenix7777 (talk) 11:44, 5 June 2011 (UTC)
Feezo removed my edit above and I restored it. It is not an off-topic discussion; I contested the proposed poll because such a poll only repeats the discussion of the guideline. ―― Phoenix7777 (talk) 12:29, 5 June 2011 (UTC)
Discussion moved here with input from WJBscribe here. Feezo (send a signal | watch the sky) 15:55, 5 June 2011 (UTC)

Support using Google or other search engines as a metric, but not necessarily using any type of search

Search engines have the largest collection of English documents in the world. While they are not close to being a perfect source of information, they are probably the best option available. Aside: I will not be active in the next few days, since there are deadlines to meet IRL. --Bobthefish2 (talk) 03:23, 5 June 2011 (UTC)

I would guess that Wikipedia:Article titles#Common names + Wikipedia:Search engine test are relevant in this aspect of the poll proposed by Feezo's diff here, especially Wikipedia:Search engine test#Search engine limitations – technical notes. --Tenmei (talk) 20:31, 5 June 2011 (UTC)

Support using Google or other search engines as a metric, using any type of search

I would guess that Wikipedia:Article titles#Common names + Wikipedia:Search engine test are relevant in this aspect of the poll proposed by Feezo's diff here, especially Wikipedia:Search engine test#Google unique page count issues. --Tenmei (talk) 20:55, 5 June 2011 (UTC)

Support using Google as a metric, restricted to books, scholar, and/or news (specify)

Do not support using Google as a metric (to determine a widely acceptable name)

My reasons for not supporting the use of Google search to decide which local name to use as the article's title:

  • Unreliable hit counts - can be underestimated or overestimated.
  • Limitation of indexing - e.g. deep web, particularly with book and scholar searches due to copyright and personal issues (Google is blocked).
  • Erratic results - might include duplicate sites, clone sites, spam sites, redirections, etc.
  • Difficult task to check contents - with large result sets it is almost impossible to check site by site in order to include neutral sites and eliminate biased ones.

STSC (talk) 01:36, 5 June 2011 (UTC)

I would guess that this is a non-issue. As stated, this proposed poll topic is unhelpful because settled wiki-consensus rejects any scheme which provides a wikt:blanket exclusion of data from reliable sources, including conventional Internet sources such as Google. --Tenmei (talk) 23:26, 5 June 2011 (UTC)

Apples and oranges ≠ Conflation. The development of arguments/counterarguments in our mediation context has thus far conflated (a) disputing what is the English-language common name for our articles about disputed islands in the East China Sea; and (b) disputing what is a "neutral" name. These are conceptually distinct subjects which must be parsed accordingly.

In other words, "Google (and other search systems) do not aim for a neutral point of view ... [and] Google is specifically not a source of neutral titles" according to Wikipedia:Search engine test#Neutrality. --Tenmei (talk) 20:43, 5 June 2011

Moved discussion to Tenmei's talk page; feel free to continue there. Feezo (send a signal | watch the sky) 21:24, 5 June 2011 (UTC)
No. This single bold edit by Feezo is insufficiently explained. This one example of a mediator's discretionary judgment is inadequately justified. IMO, this contribution is not off-topic. --Tenmei (talk) 23:26, 5 June 2011 (UTC)

Support using encyclopedias and almanacs as a metric

I would guess that this is a non-issue. As stated, this proposed poll topic is unhelpful because settled wiki-consensus rejects any scheme which provides a wikt:blanket exclusion of data from reliable sources, especially conventional (non-Internet) sources such as these. --Tenmei (talk) 20:55, 5 June 2011 (UTC)

Support using library searches

The problem with normal search engine results is that they do not necessarily reflect usage in the real world. For example, there is a major incentive for people to try to manipulate them, and a strong counter-incentive for the search engines to thwart this through elaborate algorithms. The result is a complicated and impenetrable black box that is very useful for finding things, but not necessarily useful for any kind of objective measurement. I propose that we explore a new criterion for evaluating usage: library searches. I'm thinking of resources like WorldCat, catalog.loc.gov, and university libraries. We can determine the exact criteria later, but I believe this is a good starting point. Feezo (send a signal | watch the sky) 21:12, 5 June 2011 (UTC)

As I said above, I don't object to presenting other tests as a supplement to WP:NCGN. However, I don't agree to discussing a new criterion here. It should be discussed at the relevant guidelines. ―― Phoenix7777 (talk) 22:16, 5 June 2011 (UTC)

Conflated issues

The development of arguments and counterarguments at Talk:Senkaku Islands and Talk:Senkaku Islands dispute and here in our mediation venue has thus far conflated

(a) disputing what is the English-language common name for our articles about disputed islands in the East China Sea; and
(b) disputing what is a "neutral" name.

These are conceptually distinct subjects which must be parsed accordingly.

For example, "Google (and other search systems) do not aim for a neutral point of view ... [and] Google is specifically not a source of neutral titles" according to Wikipedia:Search engine test#Neutrality. --Tenmei (talk) 23:09, 5 June 2011 (UTC)