User talk:Dalba: Difference between revisions

From Meta, a Wikimedia project coordination wiki
Latest comment: 22 days ago by Epicgenius in topic Ref names
Content deleted Content added
Swood100 (talk | contribs)
→‎Ref names: reply to Dalba
 
(152 intermediate revisions by 13 users not shown)
Line 1: Line 1:
{{archive box auto}}
== Date issue ==
==Kew POWO citations format==
For Kew Plants of the World Online citations, can the format be changed from this:


<code><nowiki><ref name="Plants of the World Online k345">{{cite web | title=Melocactus estevesii P.J.Braun | website=Plants of the World Online | url=https://powo.science.kew.org/taxon/urn:lsid:ipni.org:names:938363-1 | access-date=2024-04-29}}</ref></nowiki></code>
Hi Dalba
I wrote about this before... Australian articles (and possibly New Zealand and anywhere where the sun rises earlier than a certain point?) seem to pick up the date a day earlier than the article is dated. I have also noticed recently that when I'm editing in the morning here downunder, the access-date comes up as the earlier date too. Examples on Desperate Measures (2013 Australian TV series). Editing this morning (9 Feb here), it was generating access-date as 8 Feb. It has now clocked over to the 9th. Not terribly serious but thought I may as well report it. [[User:Laterthanyouthink|Laterthanyouthink]] ([[User talk:Laterthanyouthink|talk]]) 00:11, 9 February 2023 (UTC)
:Hi, [[User:Laterthanyouthink|Laterthanyouthink]]. Citer uses [[:wikitech:Portal:Toolforge|Toolforge]]'s server time zone for access dates, which is set to [[:en:UTC+00:00|UTC+0]]. It is possible to change this for each user and adjust according to their computer's time zone, but computer clocks are sometimes wrong too, for various reasons. I don't think a UTC+0 date is wrong here, and as far as I know it is not against Wikipedia's manual of style. Also, hiding the user's computer time zone could be considered a feature from a privacy point of view. All of that being said, I might consider adjusting the access dates to the user's local time if more users request it. Thanks for letting me know. [[User:Dalba|Dalba]] 13:41, 10 February 2023 (UTC)
::Ah, I see. Thanks for the explanation. I'll leave it up to you to do as you think best, or leave as is. [[User:Laterthanyouthink|Laterthanyouthink]] ([[User talk:Laterthanyouthink|talk]]) 06:15, 11 February 2023 (UTC)
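:The date shift discussed in this thread can be reproduced with a short sketch. It is only an illustration of UTC versus local dates, not Citer's actual code; the function name <code>access_dates</code> and the fixed UTC+11 offset are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def access_dates(now_utc: datetime, local_offset_hours: int) -> tuple[str, str]:
    """Return (UTC date, local date) as ISO strings for the same instant."""
    # Hypothetical helper: a fixed offset stands in for the editor's time zone.
    local_tz = timezone(timedelta(hours=local_offset_hours))
    local_now = now_utc.astimezone(local_tz)
    return now_utc.date().isoformat(), local_now.date().isoformat()


# 22:00 UTC on 8 February is already the morning of 9 February at UTC+11.
utc_date, local_date = access_dates(
    datetime(2023, 2, 8, 22, 0, tzinfo=timezone.utc), 11
)
print(utc_date, local_date)
```

:A server that stamps access dates in UTC would therefore show the earlier date until UTC midnight passes, which matches the behaviour reported above.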


to this:
== JSTOR ==


<code><nowiki><ref name="Plants of the World Online k345">{{BioRef|powo | title=''Melocactus estevesii'' P.J.Braun | id=938363-1 | access-date=2024-04-29}}</ref></nowiki></code>
This tool is so brilliant it can convert this: ''<nowiki>http://www.jstor.org/stable/20557185</nowiki>'' into this:
* {{cite journal | last=Bitel | first=Lisa | title=Sex, Sin, and Celibacy in Early Christian Ireland | journal=Proceedings of the Harvard Celtic Colloquium | publisher=Department of Celtic Languages & Literatures, Harvard University | volume=7 | year=1987 | issn=15450155 | jstor=20557185 | pages=65–95 | url=http://www.jstor.org/stable/20557185 | access-date=2023-03-15}}


One of the users complained on my talk page about the cites -[[User:Cs california|Cs california]] ([[User talk:Cs california|talk]]) 05:34, 4 May 2024 (UTC)
It's annoying that JSTOR doesn't provide the DOI, but so be it.


:That's a great suggestion about italics in the title. Unfortunately, italicizing the scientific name within the title field is currently difficult. Plants of the World Online doesn't provide distinct metadata for the scientific name and author.
Thank you for your efforts. [[Special:Contributions/76.14.122.5|76.14.122.5]] 04:53, 15 March 2023 (UTC)
:The BioRef template offers a cleaner format, but it's not widely adopted across Wikipedias.
:Thanks for the heartwarming feedback! :) [[User:Dalba|Dalba]] 06:49, 16 March 2023 (UTC)
:Continuing with the 'cite web' template ensures compatibility with most other wikis.
:And, honestly, the main issue for me right now is that maintaining additional code for alternative citation formats can be challenging. However, I'll certainly keep this feedback in mind for future development if resources allow. [[User:Dalba|Dalba]] 15:02, 5 May 2024 (UTC)
:Can we get the ref name shortened? There's no need for it to be that long. "POWO" would be sufficient, or "POWO k345" if there needs to be the distinguisher, though it is cryptic and therefore no better than ":2". - [[User:UtherSRG|UtherSRG]] ([[User talk:UtherSRG|talk]]) 10:23, 7 May 2024 (UTC)
::Sure, but the algorithm needs to be general. Using website acronym does not work in general since many citations don't have a site name. One should also try to choose unique ref names ... I'm going to change it once again to just a random string. (the last time I changed it was nearly 8 months ago, see [https://meta.wikimedia.org/wiki/User_talk:Dalba/Archive_1#c-Dalba-20230824120200-Badgettrg-20230820033900] for the related discussion.) [[User:Dalba|Dalba]] 16:20, 7 May 2024 (UTC)


== HTTPError ==
== publication-place=[Place of publication not identified] ==


My assumption is that you would rather hear about issues than not. The changes you made to present PDF citations in partial form have been a terrific help. I just need to add the title and the author. However, the following URL
Would it be possible to automatically remove "publication-place=[Place of publication not identified]" from the results? That parameter value pops up fairly often. Here's an example:
:<syntaxhighlight lang="html">https://www.icj-cij.org/public/files/case-related/182/182-20220316-ORD-01-00-EN.pdf</syntaxhighlight>
produces: HTTPError


How popular is Citer? Do you keep track of how many uses per day it is getting? Best regards. [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 19:28, 10 December 2023 (UTC)
{{cite book | last=Wilde | first=Geoff | last2=Braham | first2=Michael | title=Sandgrounders : the complete league history of Southport Football Club. | publisher=Carnegie | publication-place=[Place of publication not identified] | date=1995 | isbn=1-874181-14-4 | oclc=650188009}} [[Special:Contributions/76.14.122.5|76.14.122.5]] 03:21, 24 March 2023 (UTC)
:{{done}} [[User:Dalba|Dalba]] 16:09, 31 March 2023 (UTC)
::Thanks! [[Special:Contributions/76.14.122.5|76.14.122.5]] 21:32, 8 April 2023 (UTC)


:I do, thank you. I wish I had more time to work on parsing PDF files. It might be possible to extract more information from them; I'm just concerned about the performance. Anyway, the problem with this particular URL is that it is behind some Cloudflare restriction mechanism. I'm not actually sure why, but I cannot download the file from the command line either:
== Citer stopped working ==
:<syntaxhighlight lang="bash">
$ wget https://www.icj-cij.org/public/files/case-related/182/182-20220316-ORD-01-00-EN.pdf
--2023-12-15 14:58:52-- https://www.icj-cij.org/public/files/case-related/182/182-20220316-ORD-01-00-EN.pdf
Resolving www.icj-cij.org (www.icj-cij.org)... 104.22.41.99, 172.67.26.159, 104.22.40.99, ...
Connecting to www.icj-cij.org (www.icj-cij.org)|104.22.41.99|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-12-15 14:58:52 ERROR 403: Forbidden.
</syntaxhighlight>
:Citer cannot access the URL through HTTP protocol and hence the HTTPError. I guess, the result can be improved by returning a partial cite web template instead, but it may take a while before I can get to it.
:Regarding popularity, I really don't know, and I regularly clear the limited logs that Toolforge provides. But since you asked, I just looked, and in the past 6 hours around 324 requests have been processed. I'm not sure how many of them are unique, though; the logs are anonymized.
:[[User:Dalba|Dalba]] 15:23, 15 December 2023 (UTC)


== HTTPStatusError ==
Hi Dalba, today I can't get your superb CITER to work; it gives the message "502 Bad Gateway". Is there a solution? [[User:Mcapdevila|Mcapdevila]] ([[User talk:Mcapdevila|talk]]) 14:40, 3 April 2023 (UTC)
:Hi [[User:Mcapdevila|Mcapdevila]]! Should be fixed now. Thanks for letting me know. [[User:Dalba|Dalba]] 15:12, 3 April 2023 (UTC)
::Yes, working fine now. Thanks a lot, in my name and others from ca.wiki, for the hundreds of references achieved. [[Special:Contributions/93.176.134.117|93.176.134.117]] 21:35, 3 April 2023 (UTC)


Hi again,
=== Same error for ISBNs only ===
I'm not able to use citer now for ISBNs. Are others having the same problem? Is this related to citoid temporarily not dropping ISBN support? &ndash;[[User:Sj|SJ]]<small>&nbsp;[[User Talk:Sj|<font style="color:#f90;">talk</font>]]&nbsp;</small> 17:49, 16 May 2023 (UTC)
:Hi [[User:Sj|SJ]], yes, it was related to citoid issue. I have now incorporated a fix to use Google books as an alternative source for ISBNs. It might not be as comprehensive as Citoid/worldcat, but is better than nothing. Also, it might be subject to rate limits of Google APIs and might start to fail if users send too many consecutive requests... It is working for now. Let me know if you see any issues. [[User:Dalba|Dalba]] 04:48, 17 May 2023 (UTC)


This link:
== Suggestion for improvement ==


:<syntaxhighlight lang="html">https://www.jpeds.com/article/S0022-3476(22)00185-8/fulltext</syntaxhighlight>
Awesome citer! I love it. I have been using it on the Destruction of the Kakhovka Dam article. The only glitch I’ve noticed is that when I cite a source such as this:
https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-june-13-2023
it comes back with a ref that results in a “DNS_PROBE_FINISHED_NXDOMAIN” error. I then just substitute the correct address in "url=" and the correct title: RUSSIAN OFFENSIVE CAMPAIGN ASSESSMENT, JUNE 13, 2023.
Suggestion for improvement: you add the year to the end of the ref name but I’ve found that this results in duplicate ref names when the same person authors more than one article in that year. I’ve begun to replace the year with the date of the article, so that
<nowiki><ref name="Melkozerova 2023"></nowiki> becomes <nowiki><ref name="Melkozerova 060923"></nowiki>
If you could do this when creating the ref that would eliminate this issue. Again, I just love the citer. It saves me so much work! [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 21:33, 14 June 2023 (UTC)


produces the above error, though supplying the DOI listed on that page works fine:
:Thank you!
:The reason for the wrong URL in https://www.understandingwar.org/backgrounder/russian-offensive-campaign-assessment-june-13-2023 was the following open graph meta tag in the source of the webpage:
:<syntaxhighlight lang="html"><meta property="og:url" content="http://dev-isw.bivings.com/" /></syntaxhighlight>
:Citer uses this because sometimes the given URL contains some private information. "og:url" is usually safer with regard to privacy.
:In this case, og:url is obviously wrong. I've updated citer to not use it when it points to a domain name.
:It's hard to come up with a meaningful ref name. Adding full date works in some cases, but for others lacking a date or having the same date it may fail. For now, I changed the name generation mechanism to a semi-random string. This is against the recommendation of [[:en:WP:REFNAME]] which states that "Names should have semantic value". I may revert this change if I get negative feedback from users. [[User:Dalba|Dalba]] 15:46, 16 June 2023 (UTC)


:<syntaxhighlight lang="html">doi.org/10.1016/j.jpeds.2022.03.005</syntaxhighlight>
I just noticed one other item. When using the CITER for this page:


Best regards, [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 15:20, 27 December 2023 (UTC)
https://www.wilsoncenter.org/blog-post/kakhovka-dam-disaster-responsibility-and-consequences


:Unfortunately the website has blocked toolforge's IP address. :( [[User:Dalba|Dalba]] 09:53, 28 December 2023 (UTC)
it came back with a reference containing "date=2023-06-07" whereas the page itself lists a date of "June 14, 2023" and Google in its search results says "6 hours ago". Best regards. [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 22:14, 14 June 2023 (UTC)
:Seems to be {{fixed}} using curl-impersonate. [[User:Dalba|Dalba]] 07:27, 25 February 2024 (UTC)
:This one is difficult to fix. The page does not provide any meta tags describing its date, so Citer falls back to the first date found in the source code, which in this case belongs to an image... I'll look deeper into it if there are more reports like this. [[User:Dalba|Dalba]] 15:46, 16 June 2023 (UTC)


== ConnectError ==
I have come across a number of foreign websites working on this Ukraine article. When putting this one through the CITER:


Hi again, when I ran this URL I got the above message:
https://zn.ua/ukr/war/ochilnik-khersonskoji-ova-serednij-riven-pidtoplennja-na-ranok-5-6-metra-evakujovano-majzhe-2-tisjachi-ljudej-.html


:<syntaxhighlight lang="html">https://web.archive.org/web/20161105162350/https:/thejungsoul.com/guidance-for-parents-of-teens-with-rapid-onset-gender-dysphoria/</syntaxhighlight>
it produced a ref that generated some errors: {{cite web}}: Text "Mirror Weekly" ignored (help); Text "Дзеркало тижня" ignored (help) When I just removed those two parameters from the ref everything was fine. [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 23:03, 14 June 2023 (UTC)
:{{fixed}} The website title contains one or more <code>|</code> characters which is also a parameter delimiter in wiki templates... Citer will now ignore everything after the first pipe character. [[User:Dalba|Dalba]] 15:46, 16 June 2023 (UTC)
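:The fix described above, dropping everything after the first pipe, can be sketched in a few lines. This is an illustration only; the helper name <code>strip_after_pipe</code> and the sample title are hypothetical.

```python
def strip_after_pipe(title: str) -> str:
    """Keep only the text before the first '|' so it cannot split template parameters."""
    return title.split("|", 1)[0].strip()


# The site name segments after the headline are discarded.
print(strip_after_pipe("Some headline | Mirror Weekly | Дзеркало тижня"))
```

:An alternative would be to replace each pipe with <code><nowiki>{{!}}</nowiki></code>, which keeps the full title at the cost of noisier wikitext.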


However, when I switched at random to this different saved version it worked fine:
::Terrific! Works like a charm! If people complain that "Names should have semantic value", you might just use the ref name that you originally generated and append the semi-random string to that. Thanks again for all your hard work. Much appreciated!! [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 16:03, 16 June 2023 (UTC)
::::I second this comment. It would be good to have the ref names use both the previous author/date combination as well as the random string, e.g. <syntaxhighlight lang="html"><ref name="Kausch 2023 i8iia">{{cite web | last=Kausch | first=Katie | title=El Toro reopening at Six Flags Great Adventure after 10-month closure | website=nj | date=June 16, 2023 | url=https://www.nj.com/ocean/2023/06/el-toro-reopening-at-six-flags-great-adventure-after-10-month-closure.html | access-date=June 17, 2023 | page=}}</syntaxhighlight>. This tool has been very useful to me over the years; thanks for keeping it running all this time. [[User:Epicgenius|Epicgenius]] ([[User talk:Epicgenius|talk]]) 23:29, 17 June 2023 (UTC)
:::::{{done}} [[User:Dalba|Dalba]] 20:26, 18 June 2023 (UTC)
::::::Outstanding!!! Many thanks! [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 20:34, 18 June 2023 (UTC)
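:The naming scheme requested in this thread (author and year, followed by a short random string) could look roughly like the sketch below. It is not Citer's algorithm; <code>make_ref_name</code>, the four-character suffix, and the alphabet are assumptions for illustration.

```python
import random
import string


def make_ref_name(last_name, year, rng=None, suffix_len=4):
    """Build a ref name such as 'Kausch 2023 x1y2' from whatever parts exist."""
    rng = rng or random.Random()
    alphabet = string.ascii_lowercase + string.digits
    # The random suffix lowers the chance of two refs sharing a name.
    suffix = "".join(rng.choice(alphabet) for _ in range(suffix_len))
    # Missing author or year is simply skipped, leaving at least the suffix.
    return " ".join(part for part in (last_name, year, suffix) if part)


print(make_ref_name("Kausch", "2023"))
print(make_ref_name(None, None))
```

:The readable prefix keeps the name meaningful to human editors, as [[:en:WP:REFNAME]] recommends, and the suffix handles the case of one author publishing several articles in the same year.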


:<syntaxhighlight lang="html">https://web.archive.org/web/20171106084816/http://thejungsoul.com/guidance-for-parents-of-teens-with-rapid-onset-gender-dysphoria</syntaxhighlight>
Hi again. I found another oddball website:


I see what the problem is. In the first one the second https: is only followed by a single '/' instead of two. Looks like a screwball error from the page I got this URL from, because I got another URL from that page:
:https://minagro.gov.ua/news/znishchennya-rosiyanami-kahovskoyi-ges-zavdalo-znachnih-zbitkiv-silskomu-gospodarstvu-ukrayini


:<syntaxhighlight lang="html">https://web.archive.org/web/20161209083621/http:/adflegal.org/detailspages/blog-details/allianceedge/2016/08/24/the-weekly-digest-8-24-16</syntaxhighlight>
This produces a <nowiki><ref></nowiki> that includes “language=ua”, which produces:


This one also has a single '/' after the http: but it results in a ref that retains the error in two locations:
:Script warning: One or more {{cite web}} templates have maintenance messages; messages may be hidden (help).


:<syntaxhighlight lang="html"><ref name="Arnold 2016 i520">{{cite web | last=Arnold | first=James | title=The Weekly Digest: 8-24-16 | website=web.archive.org | date=24 August 2016 | url=http:/adflegal.org/detailspages/blog-details/allianceedge/2016/08/24/the-weekly-digest-8-24-16 | archive-url=https://web.archive.org/web/20161209083621/http:/adflegal.org/detailspages/blog-details/allianceedge/2016/08/24/the-weekly-digest-8-24-16 | archive-date=9 December 2016 | url-status=dead | access-date=29 December 2023}}</ref></syntaxhighlight>
The problem is that it should be “language=uk” — “ua” is not one of the allowed codes on this page: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
See also: https://en.wikipedia.org/wiki/Template:Citation_Style_documentation/language/doc


This results in a <nowiki>"{{cite web}}: Check |url= value (help)"</nowiki> red error message, a reference to [https://en.wikipedia.org/wiki/Help:CS1_errors#bad_url this page], and a tooltip when I hover over the link:
“ua” does show up as the right code for Ukraine here: https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes


:<syntaxhighlight lang="html">Arnold, James (24 August 2016). "The Weekly Digest: 8-24-16". web.archive.org. Archived from [http:/adflegal.org/detailspages/blog-details/allianceedge/2016/08/24/the-weekly-digest-8-24-16 the original] on 9 December 2016. Retrieved 29 December 2023. {{cite web}}: Check |url= value (help)</syntaxhighlight>
Can’t explain it. No hurry. This is the only time I’ve come across this. Thanks again! [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 23:29, 20 June 2023 (UTC)
:Looking at the source for that page it has lang="ua" on the first line, so maybe they just used an old-fashioned set of codes. [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 23:38, 20 June 2023 (UTC)
::In my original message I misstated the allowed codes. Apparently it is https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes that is expected but this page uses https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 15:56, 21 June 2023 (UTC)
:Thanks for investigating the issue. {{fixed}} [[User:Dalba|Dalba]] 03:58, 22 June 2023 (UTC)
::Awesome tool! [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 20:40, 23 June 2023 (UTC)
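:For readers wondering what a fix for the <code>lang="ua"</code> case might involve, one possible approach is to map country codes that are often misused as language codes onto ISO 639-1. This is only a guess at a workable approach, not a description of the actual change; the table and the name <code>normalize_language</code> are hypothetical.

```python
# Country codes (ISO 3166) sometimes found in lang attributes,
# mapped to the ISO 639-1 language code that was probably intended.
COUNTRY_TO_LANGUAGE = {
    "ua": "uk",  # Ukraine -> Ukrainian
    "jp": "ja",  # Japan -> Japanese
    "cz": "cs",  # Czechia -> Czech
}


def normalize_language(code: str) -> str:
    """Return a lower-cased primary language code, correcting known mix-ups."""
    # Keep only the primary subtag: "uk-UA" -> "uk".
    primary = code.strip().lower().replace("_", "-").split("-", 1)[0]
    return COUNTRY_TO_LANGUAGE.get(primary, primary)


print(normalize_language("ua"))
```

:Only codes that are not themselves valid ISO 639-1 language codes belong in such a table; otherwise a legitimate language would be rewritten.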


When I add another '/' to the http: in the "url" param in the produced ref the error goes away. I suppose it is asking too much for Citer to correct errors in the URLs it is supplied.
Hello again! Just think of me as your tester, reporting the unexpected.

*This one produces a ref generating: Script warning: One or more {{cite web}} templates have errors; messages may be hidden (help).
[[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 04:28, 29 December 2023 (UTC)
:<syntaxhighlight lang="html">https://babel.ua/en/news/94854-russia-submitted-a-statement-against-ukraine-to-the-international-criminal-court-kyiv-is-accused-of-destroying-the-kakhovka-hpp </syntaxhighlight>

:The resulting ref has some unlabeled parameters:
:Hi there! For me, none of the URLs work. I believe this is another case of toolforge's IP address being blocked by a third party server. Unfortunately, there is not much I can do in these cases. There might be some workarounds, but it will take me a while to implement and test. [[User:Dalba|Dalba]] 04:07, 31 December 2023 (UTC)
:<syntaxhighlight lang="html">| website=Бабель | Розповідаємо про політику, культуру і суспільство в Україні. Останні новини детально і неупереджено | date=2023-06-08 |

</syntaxhighlight>
== HTTPStatusError ==
*Twitter refs always generate this error: {{cite web}}: Missing or empty |title= (help)
:Maybe everything after “twitter.com/” and before the next “/” should become the title, so:
:<syntaxhighlight lang="html">https://twitter.com/general_ben/status/1672243986038226946?cxt=HHwWhIC82fPCgLUuAAAA </syntaxhighlight>
:would generate: title=general_ben
:If that runs into difficulties how about just: title=twitter
:At least then it wouldn't generate an error.
*This one produces a url different from the one it was given:
:<syntaxhighlight lang="html">https://vk.com/wall-201841296_3412?lang=en</syntaxhighlight>
:generated a different url:
:<syntaxhighlight lang="html">{{cite web | title=Wall posts | website=VK | url=https://vk.com/video-201841296_456239220 | language=la | access-date=2023-06-24}}</syntaxhighlight>
:The url generated is the url of the video. Missing is the screen with some text shown by the original url. Also, both urls say lang=’en’ but the ref produced says language=la, which doesn't generate an error but rather a curious "(in Latin)" in the listing.
Of course, it's no problem for me to just manually fix the resulting refs but I'm reporting them as a conscientious and dutiful tester should. Best regards! [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 16:01, 24 June 2023 (UTC)


Hi again,
:Hi! Thanks for reporting the issues!
:* babel.ua: {{fixed}}
:* Twitter: Twitter does not provide any metadata or even a title for the URL in the HTML of the document. Getting the actual title of the page requires processing the JavaScript inside the page which Citer does not support. I could implement some module as you suggested just for twitter, but the result is still not satisfying. I don't think it is worth it.
:* vk.com: url: The website is misusing [https://ogp.me/ open graph meta tags]:
:*:<syntaxhighlight lang=html><meta property="og:url" content="https://vk.com/video-201841296_456239220"/></syntaxhighlight>
:*:I could just ignore og:url on all websites, but I think this will worsen the result for many other websites that are using the tag correctly.
:* vk.com: language: Citer's bug. {{fixed}}!
:[[User:Dalba|Dalba]] 13:17, 6 July 2023 (UTC)
::Thanks! I don't know how I got along without this! [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 00:41, 12 July 2023 (UTC)


This URL:
Great tool. Consider, when identifiers are available, using the identifier as the value of the name attribute in the ref tag. If a PMID is present, that would be the first choice; if a DOI is present, that would be the second choice. There may be other identifiers, but I do not know them. The rationale is that wiki contributors may use varying citation tools, but if everyone uses a similar naming convention, there is less chance of chaos from a source being cited in multiple ways on a wiki page. [[User:Badgettrg|Badgettrg]] ([[User talk:Badgettrg|talk]]) 03:39, 20 August 2023 (UTC)
:Thank you and thanks for the suggestion! I'm fine with that change, but some users find it less meaningful and against [[:en:WP:REFNAME]] which states "Names should have semantic value, so that they can be more easily distinguished from each other by human editors who are looking at the wikitext. This means that ref names like "Nguyen 2010" are preferred to names like ":31337".". I may change the algorithm in the future if more users request it, but I'm going to keep it as is for now. [[User:Dalba|Dalba]] 12:02, 24 August 2023 (UTC)


<syntaxhighlight lang="html">https://www.reuters.com/world/middle-east/iraq-pays-last-chunk-524-billion-gulf-war-reparations-un-2022-02-09/</syntaxhighlight>
== 502 Bad Gateway ==
*OK, here's an odd one. This is a link that takes the user to item #16 on the target page:
:<syntaxhighlight lang="html">https://zakon.rada.gov.ua/laws/show/600-2023-%D0%BF#n17:~:text=16.%20%D0%A0%D0%BE%D0%B7%D0%BC%D1%96%D1%80%20%D0%BA%D0%BE%D0%BC%D0%BF%D0%B5%D0%BD%D1%81%D0%B0%D1%86%D1%96%D1%97%20%D0%B7%D0%B0%20%D0%B7%D0%BD%D0%B8%D1%89%D0%B5%D0%BD%D0%B8%D0%B9%20%D0%BE%D0%B1%E2%80%99%D1%94%D0%BA%D1%82%20%D0%BD%D0%B5%D1%80%D1%83%D1%85%D0%BE%D0%BC%D0%BE%D0%B3%D0%BE%20%D0%BC%D0%B0%D0%B9%D0%BD%D0%B0%20%D1%80%D0%BE%D0%B7%D1%80%D0%B0%D1%85%D0%BE%D0%B2%D1%83%D1%94%D1%82%D1%8C%D1%81%D1%8F%20%D0%9A%D0%BE%D0%BC%D1%96%D1%81%D1%96%D1%94%D1%8E%20%D0%B7%D0%B0%20%D1%84%D0%BE%D1%80%D0%BC%D1%83%D0%BB%D0%BE%D1%8E%3A</syntaxhighlight>
When I enter that into the citer I get:
:<syntaxhighlight lang="html">502 Bad Gateway</syntaxhighlight>
However, when taken to #16 on the target page, the address bar shows:
:<syntaxhighlight lang="html">https://zakon.rada.gov.ua/laws/show/600-2023-%D0%BF#n17</syntaxhighlight>
If that is entered into the citer then it produces a normal <ref> to that page which, however, does not take the user to #16. At a minimum, I would guess that you would want to generate and display an error message containing more information than "502 Bad Gateway." Best regards. [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 20:31, 19 July 2023 (UTC)
:It was indeed an odd one. <!--I was getting this error on toolforge server: "[WARNING] unable to add HTTP_COOKIE=datefmt=bbdy to uwsgi packet, consider increasing buffer size". Migrating from cookies to localStorage fixed the issue. I did not fully understand what was wrong before.--> {{fixed}} [[User:Dalba|Dalba]] 07:46, 20 July 2023 (UTC)
::That string is still generating the "502 Bad Gateway" error on my system. [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 14:46, 20 July 2023 (UTC)
:::Hopefully it is fixed this time. Several fundamental changes have been made under the hood. Let me know if you see any issues. [[User:Dalba|Dalba]] 23:15, 20 July 2023 (UTC)


Results in the above error. Another website blocking toolforge's IP address? Why do they do that? Is it always rate-limiting? Best regards, [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 20:25, 6 January 2024 (UTC)
== Unclosed ref tag ==


:Hi. Yes, reuters.com has blocked the IP address of Toolforge. It's completely blocked as far as I can tell; there is no rate limiting here. I can only guess, but I believe that after the recent OpenAI and New York Times confrontation, websites have become more stringent about who can access their contents. Toolforge, being the host of several citation-generating tools, sends more requests than usual, and therefore websites have started blocking its IP address. [[User:Dalba|Dalba]] 08:17, 12 January 2024 (UTC)
Hello again. I tried to use Citer just now, but it seems like when I use it, the ref tag doesn't seem to actually be closed. For example, for [https://www.curbed.com/2021/05/nycs-riverside-park-is-falling-apart.html this url], I get the following:
:This seems to be {{fixed}} now that citer is using curl-impersonate. [[User:Dalba|Dalba]] 07:25, 25 February 2024 (UTC)


== Allowing citer requests from en.wikipedia.org ==
<nowiki>&lt;ref name="Davidson 2021 j149">{{cite web | last=Davidson | first=Justin | title=Riverside Park Is Falling Apart | website=Curbed | date=May 12, 2021 | url=https://www.curbed.com/2021/05/nycs-riverside-park-is-falling-apart.html | access-date=July 21, 2023}}&lt;ref></nowiki>


Hi Dalba, I'm writing a citation script for myself on en.wikipedia.org and encountered a CORS error when trying to use citer.toolforge.org. Would it be possible to enable CORS by setting the "Access-Control-Allow-Origin" header appropriately on the citer web server? [https://wikitech.wikimedia.org/wiki/News/Toolforge.org#Cross-Origin_Resource_Sharing_(CORS)_requests_broken This page] has more information. Your tool is awesome, by the way. Thanks. [[User:Daniel Quinlan|Daniel Quinlan]] ([[User talk:Daniel Quinlan|talk]]) 08:40, 6 February 2024 (UTC)
I think the last <nowiki>&lt;ref></nowiki> should actually be a <nowiki>&lt;/ref></nowiki>. Thanks again for your hard work. [[User:Epicgenius|Epicgenius]] ([[User talk:Epicgenius|talk]]) 00:06, 21 July 2023 (UTC)


:Hi there! {{done}}. Just note that since I'm not maintaining a stable API yet, the response format might change in the future without any deprecation period. (I have had some thoughts about using [[mw:Citoid|Citoid]] response format, but it's unlikely I'll be able to implement it anytime soon.) [[User:Dalba|Dalba]] 17:27, 6 February 2024 (UTC)
:Hi! Thank you! {{fixed}} [[User:Dalba|Dalba]] 00:46, 21 July 2023 (UTC)
::Thank you so much! One thing that might help scripts a bit would be adding a parameter to get a raw text response (if you have to choose, just the latter format). I haven't really used Citoid because it doesn't seem to extract enough information to make it worthwhile. [[User:Daniel Quinlan|Daniel Quinlan]] ([[User talk:Daniel Quinlan|talk]]) 13:43, 7 February 2024 (UTC)
:::Not sure how you are using it right now, but if you send a POST request instead of a GET request and send the <code>user_input</code> in the body of the request, then citer will return a json response which I guess might be more easily digestible by scripts. Something like <code>await (await fetch('https://citer.toolforge.org/', {'method': 'POST', 'body': 'https://example.com/somepath.html' })).json()</code> should work. [[User:Dalba|Dalba]] 07:39, 8 February 2024 (UTC)
::::I've barely started, but I was doing a GET request and parsing the document. JSON is so much better. For easier updates in the future, you might consider returning a JSON dictionary with named keys like "sfn", "cite", and "ref-name". Also, can the date format be included in the POST request? Thanks again. [[User:Daniel Quinlan|Daniel Quinlan]] ([[User talk:Daniel Quinlan|talk]]) 13:49, 8 February 2024 (UTC)
:::::All parameters of a GET request also work on a POST request if they remain in the URL. The only difference between GET and POST is that `user_input` value should be the body and not in the URL. My previous example with a <code>date_format </code>parameter would become: <code>await (await fetch('https://citer.toolforge.org/?date_format=%Y-%m-%d', {'method': 'POST', 'body': 'https://citer.toolforge.org/' })).json()</code>. You are right about returning a dictionary, it's more flexible and easier to understand. I will probably change it in the future. [[User:Dalba|Dalba]] 14:06, 8 February 2024 (UTC)
::::::Thanks! [[User:Daniel Quinlan|Daniel Quinlan]] ([[User talk:Daniel Quinlan|talk]]) 07:28, 9 February 2024 (UTC)
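:For script authors who prefer Python over <code>fetch</code>, the POST usage described in this thread might be sketched as follows. The endpoint, the <code>date_format</code> query parameter, and sending the input as the request body all come from the replies above; the helper name <code>build_citer_request</code> is hypothetical. The shape of the JSON response is not assumed here, since the thread notes that it may change without notice.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

CITER_URL = "https://citer.toolforge.org/"


def build_citer_request(user_input: str, date_format: Optional[str] = None) -> Request:
    """Prepare a POST request: user input in the body, options in the query string."""
    # urlencode percent-escapes the '%' signs of a strftime-style format.
    query = "?" + urlencode({"date_format": date_format}) if date_format else ""
    return Request(CITER_URL + query, data=user_input.encode("utf-8"), method="POST")


request = build_citer_request("https://example.com/somepath.html", "%Y-%m-%d")
print(request.get_method(), request.full_url)
# To send it: json.load(urllib.request.urlopen(request, timeout=30))
```

:Building the request separately from sending it makes the script easy to test without touching the network.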


== Citing via archive links ==
::That was fast, thanks! [[User:Epicgenius|Epicgenius]] ([[User talk:Epicgenius|talk]]) 01:05, 21 July 2023 (UTC)


Hello again Dalba. I've been having some issues trying to use citer with archive.org links. It is frequently returning a 500 code with "ConnectError" in the JSON almost immediately. archive.org can be exceptionally slow retrieving archives, it often takes 15 to 30 seconds and sometimes is probably even more than that. It's also possible citer is just being rate limited by archive.org and my limited testing might be enough to drive it from bad to worse. Any ideas?

I've also tried using archive.today links like https://archive.today/N3fQ (they also use archive.is and archive.ph, and probably a few more aliases) and that always seems to result in a ReadTimeout error from citer. Would it be possible to support archive.today archive links?

By the way, I did reach out to archive.org to request that they enable CORS for *.wikipedia.org. If they do that, it's possible that clients could make the request to archive.org and then POST the archive link and the entire web page result to citer for data extraction. That might help if rate limits are the issue. Anyhow, I'll let you know if my request goes anywhere. Regards. [[User:Daniel Quinlan|Daniel Quinlan]] ([[User talk:Daniel Quinlan|talk]]) 07:13, 13 February 2024 (UTC)

:The more I look at it, the more archive.today is starting to look like a good addition for dead links. They do comment out scripts including <code>application/ld+json</code>, but that's easy to work around. I'm not sure how aggressive the server is about blocking non-interactive clients, but the maintainer has been willing to whitelist IP addresses in the past. [[User:Daniel Quinlan|Daniel Quinlan]] ([[User talk:Daniel Quinlan|talk]]) 18:48, 13 February 2024 (UTC)
:Hi!
:* archive.org: I currently cannot reproduce. It's probably a rate limit. Citer is set to wait for 10 seconds before aborting the request, if you are getting the response immediately then it is not a timeout, perhaps the server has declined the request sooner or some other issue. There might be some clues in the logs, I might need to dig into them. Let me know if they enable CORS for wikipedia, I'll implement a way to submit HTML content to citer.
:* archive.today: I would love to add support, but apparently the server does not reply to toolforge requests, no matter the timeout. Here is the verbose output of a curl call:
:<syntaxhighlight>
:$ time curl -I https://archive.today/N3fQ --connect-timeout 300 -v
:* Trying 51.38.69.52...
:* TCP_NODELAY set
:* Connected to archive.today (51.38.69.52) port 443 (#0)
:* ALPN, offering h2
:* ALPN, offering http/1.1
:* successfully set certificate verify locations:
:* CAfile: none
: CApath: /etc/ssl/certs
:* TLSv1.3 (OUT), TLS handshake, Client hello (1):
:* TLSv1.3 (IN), TLS handshake, Server hello (2):
:* TLSv1.2 (IN), TLS handshake, Certificate (11):
:* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
:* TLSv1.2 (IN), TLS handshake, Server finished (14):
:* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
:* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
:* TLSv1.2 (OUT), TLS handshake, Finished (20):
:* TLSv1.2 (IN), TLS handshake, Finished (20):
:* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES256-GCM-SHA384
:* ALPN, server accepted to use h2
:* Server certificate:
:* subject: CN=archive.today
:* start date: Feb 4 02:20:57 2024 GMT
:* expire date: May 4 02:20:56 2024 GMT
:* subjectAltName: host "archive.today" matched cert's "archive.today"
:* issuer: C=US; O=Let's Encrypt; CN=R3
:* SSL certificate verify ok.
:* Using HTTP2, server supports multi-use
:* Connection state changed (HTTP/2 confirmed)
:* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
:* Using Stream ID: 1 (easy handle 0x5645340fd110)
:> HEAD /N3fQ HTTP/2
:> Host: archive.today
:> User-Agent: curl/7.64.0
:> Accept: */*
:>
:* TLSv1.2 (IN), TLS alert, close notify (256):
:* Empty reply from server
:* Connection #0 to host archive.today left intact
:curl: (52) Empty reply from server
:real 1m0.404s
:user 0m0.030s
:sys 0m0.009s
:</syntaxhighlight>
:Copying <code>User-Agent</code> and other headers from the browser did not help either. I suspect they have blacklisted toolforge. [[User:Dalba|Dalba]] 06:26, 22 February 2024 (UTC)
::I suspect archive.today has done something to block non-interactive requests. It might be necessary to use something like Selenium. As an alternative, would it be possible for Citer to support submitting the web page content in a POST request along with the original link and the archive link (if the content is from an archive server)? That would help with sites blocking tools like curl and it might help with rate limits and timeouts too.
::Also, archive.today responded positively to two of my requests: CORS requests now work ''and'' they also added back some <code><meta></code> tags as <code><old-meta></code>. The <code>application/ld+json</code> data is available as well (it's commented out, but easy to extract). [[User:Daniel Quinlan|Daniel Quinlan]] ([[User talk:Daniel Quinlan|talk]]) 07:00, 22 February 2024 (UTC)
:::They are using SSL handshake fingerprinting to detect non-browser requests. I was able to access the website using https://github.com/lwthiker/curl-impersonate . I might be able to embed that into citer, it just might take me some time.
:::The POST request idea is also possible and I do plan to implement it. [[User:Dalba|Dalba]] 13:11, 22 February 2024 (UTC)
::::OK, archive.today URLs are now expected to work (not tested thoroughly though).
::::Also, you can now submit HTML using a POST request. In order to implement this I had to change the POST submit format. Now all parameters should be submitted within the body of the request in JSON format. To submit HTML forms, "input_type" should be set to "html" and "user_input" should be an object containing two keys: <code>{"html": "<HTML string of the page>", "url": "<URL>"}</code>. [[User:Dalba|Dalba]] 17:00, 23 February 2024 (UTC)

== RequestsError ==
Hi again,

I copied a DOI address from a web page. It was split over two lines which resulted in a space being placed in the middle:

<syntaxhighlight lang="html">https://doi.org/10.1371/%20journal.pgph.0000245</syntaxhighlight>

This resulted in Citer returning the message: "RequestsError". When I removed the '%20' from the string I got the right result. If it is true that a space is never appropriate in the middle of a DOI string, then stripping any such spaces before running the query might result in more satisfied and less confused users (or in the alternative, substitute the message, "You did not enter a valid DOI. Please check your source."). [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 20:45, 15 March 2024 (UTC)

:Hi. Thanks for the suggestion. I had to refer to [https://www.doi.org/doi-handbook/DOI_Handbook_Final.pdf DOI handbook] to see if space is a valid character or not. According to section 3.2.1 GENERAL CHARACTERISTICS OF THE DOI SYNTAX: "The DOI name is case-insensitive and can incorporate any [[:en:printable characters|printable characters]] from the legal [[:en:graphic characters|graphic characters]] of Unicode." Apparently, space is considered both a graphic character and printable character. That being said, I have not seen any DOI containing the space character.
:Currently citer does not consider the space a valid DOI character, but https://doi.org/10.1371/%20journal.pgph.0000245 is still a valid URL and citer tries to connect to its server, but it fails with <code>RequestsError</code> because the server responds with 404 error code.
:It is possible to add a separate input type for DOIs. That way citer would not confuse a DOI for a URL. However I believe a separate input type would be a little less convenient for users. For now I'm going to leave citer as it is but might reconsider if other users report similar issues. [[User:Dalba|Dalba]] 08:33, 22 March 2024 (UTC)

== Twin ISSN generated by Citer in cite journal ==
In quite a few cases, Citer generates a twin ISSN in the form issn=<ISSN1>, <ISSN2> in the {{tl|Cite journal}}. The magazines now routinely declare twin ISSNs, one for Internet, one for print. Is it possible to channel the second ISSN into eissn= ? Thank you in advance! [[User:Викидим|Викидим]] ([[User talk:Викидим|talk]]) 19:29, 23 April 2024 (UTC)

:Could you provide an example input that has this issue? [[User:Dalba|Dalba]] 05:14, 26 April 2024 (UTC)
::For example, <nowiki>https://www.jstor.org/stable/1687467</nowiki> produces "issn=00368075, 10959203" that does not work with cite templates. The first ISSN is print, the second - online. [[User:Викидим|Викидим]] ([[User talk:Викидим|talk]]) 18:19, 27 April 2024 (UTC)
:::{{fixed}} AFAICT, JSTOR does not provide any info about which ISSN is the electronic one. I decided to ignore the second one and use the first as {{para|issn}}. [[User:Dalba|Dalba]] 18:03, 2 May 2024 (UTC)

== DOI 10.1109/5992.805138 ==
With input 10.1109/5992.805138 , the result is unexpected: the submit button stays grayed out, and I have to close the window to continue. There is no result either. While at it, this is a truly great tool! Thank you! [[User:Викидим|Викидим]] ([[User talk:Викидим|talk]]) 18:25, 27 April 2024 (UTC)
:Thank you! Should be fixed now. [[User:Dalba|Dalba]] 17:58, 2 May 2024 (UTC)

== Ref names ==
Hi Dalba, thanks again for this amazing tool. I had a question about the ref names that are generated by the tool. I noticed that, until a week ago, the tool would include the author's last name and the publication date in the reference name, e.g.:

<syntaxhighlight lang="wikitext"><ref name="Valenti 2024 n238">{{cite web | last=Valenti | first=John | title=60 years ago, the World's Fair showcased dazzling inventions and international cultures | website=Newsday | date=April 20, 2024 | url=https://www.newsday.com/news/new-york/worlds-fair-60th-anniversary-v7xgi3gr | access-date=May 12, 2024}}</ref></syntaxhighlight>

Recently, however, it appears the last name and publication date are not included in the reference name at all, so the references come out like this:
<syntaxhighlight lang="wikitext"><ref name="n238">{{cite web | last=Valenti | first=John | title=60 years ago, the World's Fair showcased dazzling inventions and international cultures | website=Newsday | date=April 20, 2024 | url=https://www.newsday.com/news/new-york/worlds-fair-60th-anniversary-v7xgi3gr | access-date=May 12, 2024}}</ref></syntaxhighlight>

Is this an intentional change? I am not sure about other projects, but on English Wikipedia, [[:en:Help:Footnotes#WP:NAMEDREFS|Help:Footnotes]] says that the reference names "should have semantic value, so that they can be more easily distinguished from each other by human editors who are looking at the wikitext". I am concerned that the current reference names might not be doing that. [[User:Epicgenius|Epicgenius]] ([[User talk:Epicgenius|talk]]) 18:00, 12 May 2024 (UTC)

:Hi! You're right, I did change it (again!) after another user complained that the generated names can sometimes be too long.[https://meta.wikimedia.org/wiki/User_talk:Dalba#c-Dalba-20240507162000-UtherSRG-20240507102300] I'm aware of the guideline, but in practice, how much do you rely on the semantic meaning of the reference name? Personally, I don't find the reference name that important; using the browser's "find in page" function or a page preview works fine for me. That being said, I'm happy to revert the change (again!!) if you think the older method was better. I'm undecided on this one. [[User:Dalba|Dalba]] 15:27, 14 May 2024 (UTC)
:: Thanks for the response. I mainly rely on the author's last name (or the name of the publication, if there's no author). I could see why someone may think "Plants of the World Online" is too long, but for that particular case, spelling out the whole name may also be useful to people who wouldn't know what "POWO" stands for.{{pb}}I personally am not too bothered if you leave the names as is, since I primarily use Citer in conjunction with VisualEditor, which allows editors to reuse references without actually knowing the ref name. However, for those who use the wikitext editor, the reference names might be more helpful to them. [[User:Epicgenius|Epicgenius]] ([[User talk:Epicgenius|talk]]) 23:49, 14 May 2024 (UTC)

== Intentional change? ==
Hi again. I have noticed a few changes recently, one of which is expanding the ref name, but also leaving the url behind in the input field after creating the citation. Was this intentional? I find it a bit of a nuisance having to clear it each time I want to create another one. [[User:Laterthanyouthink|Laterthanyouthink]] ([[User talk:Laterthanyouthink|talk]]) 01:49, 28 July 2023 (UTC)
:Hi! <!--Right, the main change is that the whole page does not get refreshed on each submit, only the output fields get updated. Some functions may have changed inadvertently in the process... -->I've made a change that brings back the old behavior by clearing the input field after each submit. Let me know if you notice any other/related issues. [[User:Dalba|Dalba]] 08:04, 28 July 2023 (UTC)
::Thanks! :-) [[User:Laterthanyouthink|Laterthanyouthink]] ([[User talk:Laterthanyouthink|talk]]) 09:20, 28 July 2023 (UTC)
== Sohu ==
Hello, citations for the Sohu website (https://www.sohu.com/) consistently lack the https:// prefix. Could it be fixed, please? Since the Sohu website is one of the larger platforms in China, it's frequently used when composing articles.--[[User:日期20220626|日期20220626]] ([[User talk:日期20220626|talk]]) 06:22, 19 August 2023 (UTC)
:Sohu is not using [https://ogp.me/ "og:url" meta] tag properly. I've configured citer to ignore its value when it lacks a URL scheme. ({{fixed}}) [[User:Dalba|Dalba]] 11:45, 24 August 2023 (UTC)
==IUCN pages==
I use IUCN pages for referencing the conservation status of plants. If I enter a specific page like [https://www.iucnredlist.org/species/152368/121533277 this] it references to https://www.iucnredlist.org/en , which is the generic front page that is not directly useful for someone cross referencing sources. --[[User:Cs california|Cs california]] ([[User talk:Cs california|talk]]) 07:18, 26 August 2023 (UTC)
:Thanks for reporting the issue. {{fixed}}<!--disabled og:url feature due to many incorrect usage by websites.--> [[User:Dalba|Dalba]] 04:44, 27 August 2023 (UTC)

== Highlighting words on the target page ==
Hi again. If I enter this address to the citer:
:<syntaxhighlight lang="html">https://books.google.com/books?id=GVIEAAAAMBAJ&dq=oppenheimer+%22if+the+radiance+of+a+thousand+suns+were+to+%22&pg=PA133</syntaxhighlight>
it results in a citation without the highlighting instructions:
:<syntaxhighlight lang="html">https://books.google.com/books?id=GVIEAAAAMBAJ&pg=PA133</syntaxhighlight>
Of course, I can add it back in, but would there be any way of having the citer retain the highlighting? [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 17:49, 29 August 2023 (UTC)

:Hi! Actually, retaining the search highlights was the way Citer originally used to work, but then some users asked for the highlights to be removed automatically, arguing that the highlights may be irrelevant/distracting. I can't remember if Google provided the "Clear search" link at the time, but I can see it is there now, so it should be easy for users to remove the highlights themselves. I'm OK with this change, just not sure how other editors will receive it. [[User:Dalba|Dalba]] 11:07, 1 September 2023 (UTC)

== An unknown error occurred ==
Hi again. The following address:
:<syntaxhighlight lang="html">https://www.isu.org/inside-isu/rules-regulations/isu-statutes-constitution-regulations-technical/29326-constitution-general-regulations-2022/file</syntaxhighlight>
results in the message: "An unknown error occurred." [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 19:03, 10 September 2023 (UTC)

Related to the above is this address:
:<syntaxhighlight lang="html">https://www.isu.org/inside-isu/rules-regulations/isu-statutes-constitution-regulations-technical</syntaxhighlight>
In the result, the website= parameter is "-", as is the first part of the ref name. Also, two words in the title are run together. Continuing to love the citer! [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 19:49, 10 September 2023 (UTC)

In case it helps you find the cause of the problem, I just had the same issue with this address:
:<syntaxhighlight lang="html">https://stillmed.olympics.com/media/Documents/News/2023/03/Participation-for-Individual-Neutral-Athletes-Personnel-with-a-Russian-or-Belarusian-Passport.pdf</syntaxhighlight>
Best regards [[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 00:34, 13 September 2023 (UTC)

And also with this address:
:<syntaxhighlight lang="html">https://s3.documentcloud.org/documents/23387965/bills-117hr7776eas-rcp117-70.pdf</syntaxhighlight>
[[User:Swood100|Swood100]] ([[User talk:Swood100|talk]]) 19:28, 13 September 2023 (UTC)

== Help: Citer has stopped working ==
Dear Dalba, I use Citer many times per day. Is it possible to fix it? Maybe it's just the server. Anyway, thanks for your work. It gives the message "504 Gateway Time-out".
[[User:Mcapdevila|Mcapdevila]] ([[User talk:Mcapdevila|talk]]) 09:52, 12 September 2023 (UTC)

:I have also encountered this situation. [[User:日期20220626|日期20220626]] ([[User talk:日期20220626|talk]]) 10:40, 12 September 2023 (UTC)
:I just restarted the web-service, it should be back up. Will investigate later. Thanks for reporting the issue. [[User:Dalba|Dalba]] 10:53, 12 September 2023 (UTC)
::Hi, it's working again, thanks. [[User:La-Rierada|La-Rierada]] ([[User talk:La-Rierada|talk]]) 13:58, 12 September 2023 (UTC)
::It's working again! Thanks!! [[User:Mcapdevila|Mcapdevila]] ([[User talk:Mcapdevila|talk]]) 14:00, 12 September 2023 (UTC)

Latest revision as of 23:49, 14 May 2024

Kew POWO citations format

For Kew Plants of the World citations, can the format be changed from this:

<ref name="Plants of the World Online k345">{{cite web | title=Melocactus estevesii P.J.Braun | website=Plants of the World Online | url=https://powo.science.kew.org/taxon/urn:lsid:ipni.org:names:938363-1 | access-date=2024-04-29}}</ref>

to this:

<ref name="Plants of the World Online k345">{{BioRef|powo | title=''Melocactus estevesii'' P.J.Braun | id=938363-1 | access-date=2024-04-29}}</ref>

One of the users complained on my talk page about the cites. -Cs california (talk) 05:34, 4 May 2024 (UTC)

That's a great suggestion about italics in the title. Unfortunately, italicizing the scientific name within the title field is currently difficult. Plants of the World Online doesn't provide distinct metadata for the scientific name and author.
The BioRef template offers a cleaner format, but it's not widely adopted across Wikipedias.
Continuing with the 'cite web' template ensures compatibility with most other wikis.
And, honestly, the main issue for me right now is that maintaining additional code for alternative citation formats can be challenging. However, I'll certainly keep this feedback in mind for future development if resources allow. Dalba 15:02, 5 May 2024 (UTC)
Can we get the ref name shortened? There's no need for it to be that long. "POWO" would be sufficient, or "POWO k345" if there needs to be the distinguisher, though it is cryptic and therefore no better than ":2". - UtherSRG (talk) 10:23, 7 May 2024 (UTC)
Sure, but the algorithm needs to be general. Using a website acronym does not work in general since many citations don't have a site name. One should also try to choose unique ref names ... I'm going to change it once again to just a random string. (The last time I changed it was nearly 8 months ago; see [1] for the related discussion.) Dalba 16:20, 7 May 2024 (UTC)

HTTPError

My assumption is that you would rather hear about issues than not. The changes you made to present PDF citations in partial form have been a terrific help. I just need to add the title and the author. However, the following URL

https://www.icj-cij.org/public/files/case-related/182/182-20220316-ORD-01-00-EN.pdf

produces: HTTPError

How popular is Citer? Do you keep track of how many uses per day it is getting? Best regards. Swood100 (talk) 19:28, 10 December 2023 (UTC)

I do, thank you. I wish I had more time to work on parsing pdf files, it might be possible to extract more information about PDF files, I'm just concerned about the performance. Anyway, the problem with this particular URL is that it is behind some CloudFlare restriction mechanism. Not actually sure why, but I cannot download the file from command line either:
 
$ wget https://www.icj-cij.org/public/files/case-related/182/182-20220316-ORD-01-00-EN.pdf
--2023-12-15 14:58:52--  https://www.icj-cij.org/public/files/case-related/182/182-20220316-ORD-01-00-EN.pdf
Resolving www.icj-cij.org (www.icj-cij.org)... 104.22.41.99, 172.67.26.159, 104.22.40.99, ...
Connecting to www.icj-cij.org (www.icj-cij.org)|104.22.41.99|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-12-15 14:58:52 ERROR 403: Forbidden.
Citer cannot access the URL through HTTP protocol and hence the HTTPError. I guess, the result can be improved by returning a partial cite web template instead, but it may take a while before I can get to it.
Regarding popularity, I really don't know, and I regularly clear the limited logs that toolforge provides. But since you asked, I just looked, and for the past 6 hours there have been around 324 requests processed. Not sure how many of them are unique though, the logs are anonymized.
Dalba 15:23, 15 December 2023 (UTC)

HTTPStatusError

Hi again,

This link:

https://www.jpeds.com/article/S0022-3476(22)00185-8/fulltext

produces the above error, though supplying the DOI listed on that page works fine:

doi.org/10.1016/j.jpeds.2022.03.005

Best regards, Swood100 (talk) 15:20, 27 December 2023 (UTC)

Unfortunately the website has blocked toolforge's IP address. :( Dalba 09:53, 28 December 2023 (UTC)
Seems to be fixed using curl-impersonate. Dalba 07:27, 25 February 2024 (UTC)

ConnectError

Hi again, when I ran this URL I got the above message:

https://web.archive.org/web/20161105162350/https:/thejungsoul.com/guidance-for-parents-of-teens-with-rapid-onset-gender-dysphoria/

However, when I switched at random to this different saved version it worked fine:

https://web.archive.org/web/20171106084816/http://thejungsoul.com/guidance-for-parents-of-teens-with-rapid-onset-gender-dysphoria

I see what the problem is. In the first one the second https: is only followed by a single '/' instead of two. Looks like a screwball error from the page I got this URL from, because I got another URL from that page:

https://web.archive.org/web/20161209083621/http:/adflegal.org/detailspages/blog-details/allianceedge/2016/08/24/the-weekly-digest-8-24-16

This one also has a single '/' after the http: but it results in a ref that retains the error in two locations:

<ref name="Arnold 2016 i520">{{cite web | last=Arnold | first=James | title=The Weekly Digest: 8-24-16 | website=web.archive.org | date=24 August 2016 | url=http:/adflegal.org/detailspages/blog-details/allianceedge/2016/08/24/the-weekly-digest-8-24-16 | archive-url=https://web.archive.org/web/20161209083621/http:/adflegal.org/detailspages/blog-details/allianceedge/2016/08/24/the-weekly-digest-8-24-16 | archive-date=9 December 2016 | url-status=dead | access-date=29 December 2023}}</ref>

This results in a "{{cite web}}: Check |url= value (help)" red error message, a reference to this page, and a tooltip when I hover over the link:

Arnold, James (24 August 2016). "The Weekly Digest: 8-24-16". web.archive.org. Archived from [http:/adflegal.org/detailspages/blog-details/allianceedge/2016/08/24/the-weekly-digest-8-24-16 the original] on 9 December 2016. Retrieved 29 December 2023. {{cite web}}: Check |url= value (help)

When I add another '/' to the http: in the "url" param in the produced ref the error goes away. I suppose it is asking too much for Citer to correct errors in the URLs it is supplied.

Swood100 (talk) 04:28, 29 December 2023 (UTC)

Hi there! For me, none of the URLs work. I believe this is another case of toolforge's IP address being blocked by a third party server. Unfortunately, there is not much I can do in these cases. There might be some workarounds, but it will take me a while to implement and test. Dalba 04:07, 31 December 2023 (UTC)
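
Separately from the blocking issue, the single-slash problem described earlier in this thread is easy to repair on the user's side before pasting a URL into Citer. The following is only a minimal sketch in Python; the helper name `repair_scheme_slashes` is hypothetical, and nothing here reflects how Citer itself handles such input.

```python
import re

# "http:/" or "https:/" that is NOT followed by a second slash.
_SINGLE_SLASH = re.compile(r"\b(https?):/(?!/)")


def repair_scheme_slashes(url: str) -> str:
    """Restore the missing slash after an embedded http: or https: scheme."""
    return _SINGLE_SLASH.sub(r"\1://", url)


broken = (
    "https://web.archive.org/web/20161209083621/"
    "http:/adflegal.org/detailspages/blog-details/allianceedge/"
    "2016/08/24/the-weekly-digest-8-24-16"
)
print(repair_scheme_slashes(broken))
```

Well-formed URLs pass through unchanged, so the helper can be applied unconditionally.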

HTTPStatusError

Hi again,

This URL:

https://www.reuters.com/world/middle-east/iraq-pays-last-chunk-524-billion-gulf-war-reparations-un-2022-02-09/

Results in the above error. Another website blocking toolforge's IP address? Why do they do that? Is it always rate-limiting? Best regards, Swood100 (talk) 20:25, 6 January 2024 (UTC)

Hi. Yes, reuters.com has blocked the IP address of toolforge. It's completely blocked as far as I can tell, no rate limiting here. I can only guess, but I believe after the recent OpenAI and New York Times confrontation, websites have become more stringent about who can access their contents. Toolforge, being the host of several citation generating tools, is sending more requests than usual and therefore websites have started blocking its IP address. Dalba 08:17, 12 January 2024 (UTC)
This seems to be fixed now that citer is using curl-impersonate. Dalba 07:25, 25 February 2024 (UTC)

Allowing citer requests from en.wikipedia.org

Hi Dalba, I'm writing a citation script for myself on en.wikipedia.org and encountered a CORS error when trying to use citer.toolforge.org. Would it be possible to enable CORS by setting the "Access-Control-Allow-Origin" header appropriately on the citer web server? This page has more information. Your tool is awesome, by the way. Thanks. Daniel Quinlan (talk) 08:40, 6 February 2024 (UTC)

Hi there! Done. Just note that since I'm not maintaining a stable API yet, the response format might change in the future without any deprecation period. (I have had some thoughts about using Citoid response format, but it's unlikely I'll be able to implement it anytime soon.) Dalba 17:27, 6 February 2024 (UTC)
Thank you so much! One thing that might help scripts a bit would be adding a parameter to get a raw text response (if you have to choose, just the latter format). I haven't really used Citoid because it doesn't seem to extract enough information to make it worthwhile. Daniel Quinlan (talk) 13:43, 7 February 2024 (UTC)
Not sure how you are using it right now, but if you send a POST request instead of a GET request and send the user_input in the body of the request, then citer will return a json response which I guess might be more easily digestible by scripts. Something like await (await fetch('https://citer.toolforge.org/', {'method': 'POST', 'body': 'https://example.com/somepath.html' })).json() should work. Dalba 07:39, 8 February 2024 (UTC)
I've barely started, but I was doing a GET request and parsing the document. JSON is so much better. For easier updates in the future, you might consider returning a JSON dictionary with named keys like "sfn", "cite", and "ref-name". Also, can the date format be included in the POST request? Thanks again. Daniel Quinlan (talk) 13:49, 8 February 2024 (UTC)
All parameters of a GET request also work on a POST request if they remain in the URL. The only difference between GET and POST is that `user_input` value should be the body and not in the URL. My previous example with a date_format parameter would become: await (await fetch('https://citer.toolforge.org/?date_format=%Y-%m-%d', {'method': 'POST', 'body': 'https://citer.toolforge.org/' })).json(). You are right about returning a dictionary, it's more flexible and easier to understand. I will probably change it in the future. Dalba 14:06, 8 February 2024 (UTC)
Thanks! Daniel Quinlan (talk) 07:28, 9 February 2024 (UTC)
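
For script authors working outside the browser, the request shape described in this thread (user input in the body, other parameters such as date_format in the query string) might be sketched in Python as follows. This is an illustration only: the helper name `build_citer_request` is hypothetical, the request is built but not sent, and a later thread on this page reports that the POST format was subsequently changed to a JSON body, so this shape may no longer be accepted.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

CITER_URL = "https://citer.toolforge.org/"  # endpoint named in this thread


def build_citer_request(user_input: str, date_format: Optional[str] = None) -> Request:
    """Build (but do not send) a POST request in the shape described above."""
    url = CITER_URL
    if date_format is not None:
        # urlencode percent-escapes the "%" signs of strftime-style formats.
        url += "?" + urlencode({"date_format": date_format})
    # The user input travels as the raw request body, not as a URL parameter.
    return Request(url, data=user_input.encode("utf-8"), method="POST")


req = build_citer_request("https://example.com/somepath.html", "%Y-%m-%d")
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; the response format was described above as unstable, so any parsing code should be written defensively.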

Citing via archive links

Hello again Dalba. I've been having some issues trying to use citer with archive.org links. It is frequently returning a 500 code with "ConnectError" in the JSON almost immediately. archive.org can be exceptionally slow retrieving archives, it often takes 15 to 30 seconds and sometimes is probably even more than that. It's also possible citer is just being rate limited by archive.org and my limited testing might be enough to drive it from bad to worse. Any ideas?

I've also tried using archive.today links like https://archive.today/N3fQ (they also use archive.is and archive.ph, and probably a few more aliases) and that always seems to result in a ReadTimeout error from citer. Would it be possible to support archive.today archive links?

By the way, I did reach out to archive.org to request that they enable CORS for *.wikipedia.org. If they do that, it's possible that clients could make the request to archive.org and then POST the archive link and the entire web page result to citer for data extraction. That might help if rate limits are the issue. Anyhow, I'll let you know if my request goes anywhere. Regards. Daniel Quinlan (talk) 07:13, 13 February 2024 (UTC)

The more I look at it, the more archive.today is starting to look like a good addition for dead links. They do comment out scripts including application/ld+json, but that's easy to work around. I'm not sure how aggressive the server is about blocking non-interactive clients, but the maintainer has been willing to whitelist IP addresses in the past. Daniel Quinlan (talk) 18:48, 13 February 2024 (UTC)
Hi!
  • archive.org: I currently cannot reproduce. It's probably a rate limit. Citer is set to wait for 10 seconds before aborting the request, if you are getting the response immediately then it is not a timeout, perhaps the server has declined the request sooner or some other issue. There might be some clues in the logs, I might need to dig into them. Let me know if they enable CORS for wikipedia, I'll implement a way to submit HTML content to citer.
  • archive.today: I would love to add support, but apparently the server does not reply to toolforge requests, no matter the timeout. Here is the verbose output of a curl call:
:$ time curl -I https://archive.today/N3fQ --connect-timeout 300 -v
:*   Trying 51.38.69.52...
:* TCP_NODELAY set
:* Connected to archive.today (51.38.69.52) port 443 (#0)
:* ALPN, offering h2
:* ALPN, offering http/1.1
:* successfully set certificate verify locations:
:*   CAfile: none
:  CApath: /etc/ssl/certs
:* TLSv1.3 (OUT), TLS handshake, Client hello (1):
:* TLSv1.3 (IN), TLS handshake, Server hello (2):
:* TLSv1.2 (IN), TLS handshake, Certificate (11):
:* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
:* TLSv1.2 (IN), TLS handshake, Server finished (14):
:* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
:* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
:* TLSv1.2 (OUT), TLS handshake, Finished (20):
:* TLSv1.2 (IN), TLS handshake, Finished (20):
:* SSL connection using TLSv1.2 / ECDHE-ECDSA-AES256-GCM-SHA384
:* ALPN, server accepted to use h2
:* Server certificate:
:*  subject: CN=archive.today
:*  start date: Feb  4 02:20:57 2024 GMT
:*  expire date: May  4 02:20:56 2024 GMT
:*  subjectAltName: host "archive.today" matched cert's "archive.today"
:*  issuer: C=US; O=Let's Encrypt; CN=R3
:*  SSL certificate verify ok.
:* Using HTTP2, server supports multi-use
:* Connection state changed (HTTP/2 confirmed)
:* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
:* Using Stream ID: 1 (easy handle 0x5645340fd110)
:> HEAD /N3fQ HTTP/2
:> Host: archive.today
:> User-Agent: curl/7.64.0
:> Accept: */*
:>
:* TLSv1.2 (IN), TLS alert, close notify (256):
:* Empty reply from server
:* Connection #0 to host archive.today left intact
:curl: (52) Empty reply from server
:real    1m0.404s
:user    0m0.030s
:sys     0m0.009s
:
Copying `User-Agent` and other headers from the browser did not help either. I suspect they have blacklisted toolforge. Dalba 06:26, 22 February 2024 (UTC)
I suspect archive.today has done something to block non-interactive requests. It might be necessary to use something like Selenium. As an alternative, would it be possible for Citer to support submitting the web page content in a POST request along with the original link and the archive link (if the content is from an archive server)? That would help with sites blocking tools like curl and it might help with rate limits and timeouts too.
Also, archive.today responded positively to two of my requests: CORS requests now work and they also added back some <meta> tags as <old-meta>. The application/ld+json data is available as well (it's commented out, but easy to extract). Daniel Quinlan (talk) 07:00, 22 February 2024 (UTC)
They are using SSL handshake fingerprinting to detect non-browser requests. I was able to access the website using https://github.com/lwthiker/curl-impersonate . I might be able to embed that into citer, it just might take me some time.
The POST request idea is also possible and I do plan to implement it. Dalba 13:11, 22 February 2024 (UTC)
OK, archive.today URLs are now expected to work (not tested thoroughly though).
Also, you can now submit HTML using a POST request. In order to implement this I had to change the POST submit format. Now all parameters should be submitted within the body of the request in JSON format. To submit HTML forms, "input_type" should be set to "html" and "user_input" should be an object containing two keys: {"html": "<HTML string of the page>", "url": "<URL>"}. Dalba 17:00, 23 February 2024 (UTC)
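
The JSON body described in the last reply can be assembled as in the sketch below. Only the "input_type" and "user_input" keys and the two nested keys come from the description above; the helper name `build_html_payload` and the sample page are made up for illustration, and the endpoint, headers, and response shape are deliberately left out because the thread does not specify them.

```python
import json


def build_html_payload(html: str, url: str) -> str:
    """Serialize the request body for an HTML submission as described above."""
    body = {
        "input_type": "html",
        # Object with exactly the two keys named in the thread.
        "user_input": {"html": html, "url": url},
    }
    return json.dumps(body)


payload = build_html_payload(
    "<html><head><title>Example</title></head></html>",  # hypothetical page
    "https://example.com/",
)
print(payload)
```

Using `json.dumps` rather than string concatenation keeps quotes and angle brackets in the page source correctly escaped.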

RequestsError

Hi again,

I copied a DOI address from a web page. It was split over two lines which resulted in a space being placed in the middle:

https://doi.org/10.1371/%20journal.pgph.0000245

This resulted in Citer returning the message: "RequestsError". When I removed the '%20' from the string I got the right result. If it is true that a space is never appropriate in the middle of a DOI string, then stripping any such spaces before running the query might result in more satisfied and less confused users (or in the alternative, substitute the message, "You did not enter a valid DOI. Please check your source."). Swood100 (talk) 20:45, 15 March 2024 (UTC)

Hi. Thanks for the suggestion. I had to refer to the DOI Handbook to see whether a space is a valid character. According to section 3.2.1, GENERAL CHARACTERISTICS OF THE DOI SYNTAX: "The DOI name is case-insensitive and can incorporate any printable characters from the legal graphic characters of Unicode." Apparently, the space is considered both a graphic character and a printable character. That being said, I have not seen any DOI containing a space.
Currently Citer does not consider the space a valid DOI character, so it does not recognize the input as a DOI. However, https://doi.org/10.1371/%20journal.pgph.0000245 is still a valid URL, so Citer tries to connect to its server, and that fails with RequestsError because the server responds with a 404 error code.
It is possible to add a separate input type for DOIs. That way Citer would not confuse a DOI for a URL. However, I believe a separate input type would be a little less convenient for users. For now I'm going to leave Citer as it is, but I might reconsider if other users report similar issues. Dalba 08:33, 22 March 2024 (UTC)
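For reference, the pre-processing suggested in this thread could look roughly like the sketch below. This is not Citer's behaviour (the tool was left unchanged), and the function name `strip_doi_whitespace` is hypothetical. It removes literal whitespace and percent-encoded spaces (`%20`) only, so it would also mangle any DOI that legitimately contains a space, which the DOI Handbook wording appears to permit.

```python
import re

# Matches runs of literal whitespace or percent-encoded spaces.
_SPACE_RUN = re.compile(r"(?:\s|%20)+")


def strip_doi_whitespace(user_input: str) -> str:
    """Remove whitespace and '%20' sequences from a pasted DOI or DOI URL."""
    return _SPACE_RUN.sub("", user_input)


if __name__ == "__main__":
    pasted = "https://doi.org/10.1371/%20journal.pgph.0000245"
    print(strip_doi_whitespace(pasted))
```

Other percent-encoded characters are intentionally left alone, since decoding the whole URL could change its meaning.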

Twin ISSN generated by Citer in cite journal

In quite a few cases, Citer generates a twin ISSN in the form issn=<ISSN1>, <ISSN2> in {{Cite journal}}. Journals now routinely declare twin ISSNs, one for the Internet edition and one for print. Is it possible to channel the second ISSN into eissn=? Thank you in advance! Викидим (talk) 19:29, 23 April 2024 (UTC)

Could you provide an example input that has this issue? Dalba 05:14, 26 April 2024 (UTC)
For example, https://www.jstor.org/stable/1687467 produces "issn=00368075, 10959203", which does not work with cite templates. The first ISSN is print, the second is online. Викидим (talk) 18:19, 27 April 2024 (UTC)
Fixed. AFAICT, JSTOR does not provide any info about which ISSN is the electronic one, so I decided to ignore the second one and use the first as |issn=. Dalba 18:03, 2 May 2024 (UTC)
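The fix described above (keep only the first ISSN) can be sketched as follows. This is an illustration under assumptions, not Citer's actual code: the function name `first_issn` is hypothetical, and the hyphenated `NNNN-NNNN` output format is an assumption about what the cite templates accept.

```python
import re
from typing import Optional


def first_issn(raw: str) -> Optional[str]:
    """Return the first well-formed ISSN in a comma-separated string.

    An ISSN is eight characters: seven digits plus a check character
    that is a digit or 'X'. The check digit itself is not verified here.
    """
    for part in raw.split(","):
        candidate = part.strip().replace("-", "").upper()
        if re.fullmatch(r"\d{7}[\dX]", candidate):
            return f"{candidate[:4]}-{candidate[4:]}"
    return None


if __name__ == "__main__":
    print(first_issn("00368075, 10959203"))
```

Dropping the second value avoids guessing which ISSN is electronic when the source does not say.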

DOI 10.1109/5992.805138

With the input 10.1109/5992.805138, the result is unexpected: the submit button stays grayed out and I have to close the window to continue. There is no result either. While I'm at it, this is a truly great tool! Thank you! Викидим (talk) 18:25, 27 April 2024 (UTC)

Thank you! Should be fixed now. Dalba 17:58, 2 May 2024 (UTC)

Ref names

Hi Dalba, thanks again for this amazing tool. I had a question about the ref names that are generated by the tool. I noticed that, until a week ago, the tool would include the author's last name and the publication date in the reference name, e.g.:

<ref name="Valenti 2024 n238">{{cite web | last=Valenti | first=John | title=60 years ago, the World's Fair showcased dazzling inventions and international cultures | website=Newsday | date=April 20, 2024 | url=https://www.newsday.com/news/new-york/worlds-fair-60th-anniversary-v7xgi3gr | access-date=May 12, 2024}}</ref>

Recently, however, it appears the last name and publication date are not included in the reference name at all, so the references come out like this:

<ref name="n238">{{cite web | last=Valenti | first=John | title=60 years ago, the World's Fair showcased dazzling inventions and international cultures | website=Newsday | date=April 20, 2024 | url=https://www.newsday.com/news/new-york/worlds-fair-60th-anniversary-v7xgi3gr | access-date=May 12, 2024}}</ref>

Is this an intentional change? I am not sure about other projects, but on English Wikipedia, Help:Footnotes says that the reference names "should have semantic value, so that they can be more easily distinguished from each other by human editors who are looking at the wikitext". I am concerned that the current reference names might not be doing that. Epicgenius (talk) 18:00, 12 May 2024 (UTC)

Hi! You're right, I did change it (again!) after another user complained that the generated names can sometimes be too long.[2] I'm aware of the guideline, but in practice, how much do you rely on the semantic meaning of the reference name? Personally, I don't find the reference name that important; using the browser's "find in page" function or a page preview works fine for me. That being said, I'm happy to revert the change (again!!) if you think the older method was better. I'm undecided on this one. Dalba 15:27, 14 May 2024 (UTC)
Thanks for the response. I mainly rely on the author's last name (or the name of the publication, if there's no author). I could see why someone may think "Plants of the World Online" is too long, but for that particular case, spelling out the whole name may also be useful to people who wouldn't know what "POWO" stands for.
I personally am not too bothered if you leave the names as they are, since I primarily use Citer in conjunction with VisualEditor, which allows editors to reuse references without actually knowing the ref name. However, for those who use the wikitext editor, the reference names might be more helpful. Epicgenius (talk) 23:49, 14 May 2024 (UTC)