Qiuwen Baike bot
Qiuwen Baike bot.
Features
Comprehensive Cleanup
Cleaning up illegal references
Cleaning up unsupported HTML tags
Cleaning up expired templates
Removing illegal flags of the Taiwan authorities
Removing illegal era names of the Taiwan authorities
Correcting terms related to Taiwan
Correcting terms related to Hong Kong
Correcting terms related to politics
Correcting terms related to the “Cultural Revolution”
Marking duplicate entries caused by differences between Simplified and Traditional Chinese
Disclaimer
Qiuwen Baike® is a trademark or registered trademark of the operator of the Qiuwen Baike website or its affiliated entities. This project uses the trademark legitimately under Article 59 of the “Trademark Law of the People’s Republic of China”. Anyone using the trademark in a modified version of this project should observe the “Trademark Law”.
Installation
To install qiuwenbot, simply use pip:
pip install qiuwenbot
Usage
First, prepare a JSON file that describes the job.
{
  "user": "Njzjzbot",
  "task": "filter",
  "pages": {
    "type": "all"
  }
}
The detailed parameters can be found in Input Parameters.
The password is set through the QIUWENBOT_PASSWORD environment variable. Using a bot password is recommended.
Then, one can submit the job using the following command:
qiuwenbot submit filter_all.json
See Command line interface for more information.
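The job file can also be generated from a script. The following is a minimal sketch that uses only the Python standard library; the helper name write_job is hypothetical and not part of qiuwenbot, and the command is only printed, not executed.

```python
import json
import os
import tempfile


def write_job(path, user, task, pages):
    """Write a job description to `path` (hypothetical helper, not part of qiuwenbot)."""
    job = {"user": user, "task": task, "pages": pages}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(job, f, indent=2, ensure_ascii=False)
    return job


# Same content as the example job file shown above.
job_path = os.path.join(tempfile.mkdtemp(), "filter_all.json")
job = write_job(job_path, "Njzjzbot", "filter", {"type": "all"})

# The password is not written to the job file; according to the text above
# it is supplied through the QIUWENBOT_PASSWORD environment variable.
password_is_set = "QIUWENBOT_PASSWORD" in os.environ

# The command one would then run in a shell.
command = ["qiuwenbot", "submit", job_path]
print("password set:", password_is_set)
print("command:", " ".join(command))
```

Keeping the password in the environment means the job file can be shared or committed without exposing credentials.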
Command line interface
usage: qiuwenbot [-h] {submit,gui} ...
Sub-commands
submit
Submit a task
qiuwenbot submit [-h] CONFIG
Positional Arguments
- CONFIG: Path to the config file
gui
Serve DP-GUI.
qiuwenbot gui [-h] [-p PORT] [--bind_all]
Named Arguments
- -p, --port: The port to serve DP-GUI on. Default: 6042
- --bind_all: Serve on all public interfaces. This will expose your DP-GUI instance to the network on both IPv4 and IPv6 (where available). Default: False
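To make the documented interface concrete, the sketch below rebuilds it with argparse from the standard library. It only mirrors the usage strings above and is not qiuwenbot's own parser; treating the port as an integer is an assumption.

```python
import argparse


def build_parser():
    """Rebuild the documented command line with argparse (illustrative only)."""
    parser = argparse.ArgumentParser(prog="qiuwenbot")
    sub = parser.add_subparsers(dest="command", required=True)

    submit = sub.add_parser("submit", help="Submit a task")
    submit.add_argument("CONFIG", help="Path to the config file")

    gui = sub.add_parser("gui", help="Serve DP-GUI.")
    # Assumption: the port is parsed as an integer.
    gui.add_argument("-p", "--port", type=int, default=6042)
    gui.add_argument("--bind_all", action="store_true")
    return parser


parser = build_parser()
args_submit = parser.parse_args(["submit", "filter_all.json"])
args_gui = parser.parse_args(["gui", "-p", "8080", "--bind_all"])
args_gui_default = parser.parse_args(["gui"])
print(args_submit.CONFIG, args_gui.port, args_gui.bind_all, args_gui_default.port)
```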
Input Parameters
Note
The input file can be loaded, modified, and exported with the web-based tool DP-GUI, which is served by the command qiuwenbot gui. All parameters below can be set in DP-GUI. Clicking “SAVE JSON” downloads the input file.
- user:
type: str
argument path: user
Username.
- pages:
type: dict
argument path: pages
Configurations of scanned pages.
Depending on the value of type, different sub args are accepted.
- type:
When type is set to all: Scan all pages in alphabetical order.
- namespace:
type: int, optional, default: 0
argument path: pages[all]/namespace
Namespace(s) of the pages.
- restart:
type: bool, optional, default: False
argument path: pages[all]/restart
Restart from the last page in the log.
When type is set to new: Scan new pages.
- namespace:
type: list | int, optional, default: 0
argument path: pages[new]/namespace
Namespace(s) of the pages.
- start:
type: str | NoneType, optional, default: None
argument path: pages[new]/start
Start time in ISO format.
- end:
type: str | NoneType, optional, default: None
argument path: pages[new]/end
End time in ISO format.
When type is set to link (or its alias template): Scan pages that link to a page or include a template.
- name:
type: str
argument path: pages[link]/name
Name of the page or template.
- namespace:
type: list | int | NoneType, optional, default: None
argument path: pages[link]/namespace
Namespace(s) of the pages.
When type is set to page: Scan a single page.
- name:
type: str
argument path: pages[page]/name
Name of the page or template.
- task:
type: str
argument path: task
Task to submit.
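The four documented values of pages/type accept different sub-arguments. The sketch below gives one example block per type and a small check for the required name key. The helper check_pages and all example values (namespaces, dates, page names) are illustrative assumptions and are not taken from qiuwenbot.

```python
# Required keys for each documented value of pages["type"].
REQUIRED_KEYS = {
    "all": set(),
    "new": set(),
    "link": {"name"},
    "template": {"name"},  # documented alias of "link"
    "page": {"name"},
}


def check_pages(pages):
    """Return a list of problems in a `pages` block (hypothetical helper)."""
    kind = pages.get("type")
    if kind not in REQUIRED_KEYS:
        return [f"unknown type: {kind!r}"]
    missing = REQUIRED_KEYS[kind] - set(pages)
    return [f"missing key: {key}" for key in sorted(missing)]


# One example per type; the values are made up for illustration.
examples = [
    {"type": "all", "namespace": 0, "restart": False},
    {"type": "new", "namespace": [0, 14],
     "start": "2023-01-01T00:00:00", "end": "2023-02-01T00:00:00"},
    {"type": "link", "name": "Template:Current", "namespace": [0]},
    {"type": "page", "name": "Sandbox"},
]
for pages in examples:
    print(pages["type"], check_pages(pages))
```

Optional keys are not validated here; the sketch only catches a missing name or an unknown type before a job is submitted.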
API documentation
qiuwenbot package
Qiuwen bot.
Subpackages
qiuwenbot.entrypoints package
Scripts.
Submodules
qiuwenbot.entrypoints.gui module
DP-GUI entrypoint.
qiuwenbot.entrypoints.submit module
qiuwenbot.filter package
Filter texts.
Submodules
qiuwenbot.filter.clean_refs module
qiuwenbot.filter.common module
qiuwenbot.filter.expired_templates module
- class qiuwenbot.filter.expired_templates.RemoveExpiredCurrentFilter
Bases: RemoveExpiredTemplateFilter
Filter to remove {{current}}.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.expired_templates.RemoveExpiredDeadFilter
Bases: RemoveExpiredTemplateFilter
Filter to remove {{近期逝世}}.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.expired_templates.RemoveExpiredDeadFilter2
Bases: RemoveExpiredTemplateFilter
Filter to remove {{最近逝世}}.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.expired_templates.RemoveExpiredDeadFilter3
Bases: RemoveExpiredTemplateFilter
Filter to remove {{recent death}}.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.expired_templates.RemoveExpiredTemplateFilter(template: str)
Bases: Filter
Filter to remove a certain template.
- Parameters:
- template : str
Name of the template to remove.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
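As an illustration of what removing a template from wikitext involves, here is a minimal regex-based sketch. It is not qiuwenbot's implementation: the function remove_template is hypothetical, it handles only templates without nested braces, and it ignores the expiry check that the class names imply.

```python
import re


def remove_template(text, template):
    """Remove simple {{template}} or {{template|...}} calls (no nesting)."""
    pattern = re.compile(
        r"\{\{\s*" + re.escape(template) + r"\s*(\|[^{}]*)?\}\}\n?",
        flags=re.IGNORECASE,
    )
    return pattern.sub("", text)


sample = "{{Current|date=2020}}\nThe article body.\n"
cleaned = remove_template(sample, "current")
print(repr(cleaned))
```

A production filter would more likely rely on a wikitext parser, since regular expressions cannot reliably handle nested templates.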
qiuwenbot.filter.filter module
- class qiuwenbot.filter.filter.Filter
Bases: object
Filter texts.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.filter.FilterChain(filters: List[Filter])
Bases: Filter
Filter chain.
- Parameters:
- filters : list of Filter
Filters to apply.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.filter.TextReplaceFilter(pattern: str, repl: str)
Bases: Filter
Filter to replace texts.
- Parameters:
- pattern : str
Pattern to replace.
- repl : str
Replacement.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- qiuwenbot.filter.filter.register_filter(cls: Filter) -> Filter
Decorator that registers a filter.
The filter should not have any parameters in its constructor.
- Parameters:
- cls : Filter
Filter to register.
- Returns:
- Filter
Registered filter.
Examples
>>> @register_filter
... class Filter1(Filter):
...     pass
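The relationship between Filter, TextReplaceFilter, and FilterChain can be sketched as follows. This is a self-contained reimplementation for illustration, not the package's source: it assumes that pattern is a regular expression and that log is a list of strings, neither of which is stated in the reference above.

```python
import re
from typing import List


class Filter:
    """Base class: transforms text and records what it changed."""

    def __init__(self) -> None:
        self.log: List[str] = []  # assumption: the log is a list of messages

    def filter(self, text: str) -> str:
        return text


class TextReplaceFilter(Filter):
    """Replace every match of a regular expression (assumed semantics)."""

    def __init__(self, pattern: str, repl: str) -> None:
        super().__init__()
        self.pattern = re.compile(pattern)
        self.repl = repl

    def filter(self, text: str) -> str:
        new_text, count = self.pattern.subn(self.repl, text)
        if count:
            self.log.append(f"{self.pattern.pattern!r}: {count} replacement(s)")
        return new_text


class FilterChain(Filter):
    """Apply several filters in order and collect their logs."""

    def __init__(self, filters: List[Filter]) -> None:
        super().__init__()
        self.filters = filters

    def filter(self, text: str) -> str:
        for f in self.filters:
            text = f.filter(text)
            self.log.extend(f.log)
            f.log.clear()
        return text


chain = FilterChain([
    TextReplaceFilter(r"colour", "color"),
    TextReplaceFilter(r"\s+$", ""),
])
result = chain.filter("colour and colour  ")
print(result)
print(chain.log)
```

Because a chain is itself a Filter, chains can be nested, and a caller only needs the single filter(text) method regardless of how many rules run underneath.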
qiuwenbot.filter.gov module
qiuwenbot.filter.history module
- class qiuwenbot.filter.history.FakeManchukuoFilter
Bases: TextReplaceFilter
Filter for Fake Manchukuo authorities.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
qiuwenbot.filter.hk module
qiuwenbot.filter.roc_flag module
- class qiuwenbot.filter.roc_flag.ReplaceROCyear
Bases: Filter
Filter to replace the ROC flag in a string.
- Parameters:
- pattern : str
Pattern to replace.
- repl : str
Replacement.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
qiuwenbot.filter.roc_year module
- class qiuwenbot.filter.roc_year.ReplaceROCyear
Bases: Filter
Filter to replace ROC years in a string.
- Parameters:
- pattern : str
Pattern to replace.
- repl : str
Replacement.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
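The arithmetic behind such a replacement is simple: ROC year N corresponds to Gregorian year N + 1911. The sketch below assumes that the goal is to rewrite such years in Gregorian form and handles only Arabic digits; the exact spellings recognized by ReplaceROCyear and its output format are not documented here, so the function roc_to_gregorian is purely illustrative.

```python
import re

# Assumption: only the spelling "民国<digits>年" is handled in this sketch.
ROC_YEAR = re.compile(r"民国(\d{1,3})年")


def roc_to_gregorian(text):
    """Rewrite '民国N年' as the Gregorian year N + 1911."""
    return ROC_YEAR.sub(lambda m: f"{int(m.group(1)) + 1911}年", text)


converted = roc_to_gregorian("民国38年")
print(converted)
```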
qiuwenbot.filter.tw module
- class qiuwenbot.filter.tw.TWJPFilter
Bases: TextReplaceFilter
Filter to fix the Japanese authorities.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.tw.TWLeaderFilter
Bases: TextReplaceFilter
Filter to fix the leader name in the Taiwan area.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.tw.TWNameFilter1
Bases: TextReplaceFilter
Filter to fix the name of the Taiwan area.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.tw.TWNameFilter2
Bases: TextReplaceFilter
Filter to fix the name of the Taiwan area.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.tw.TWQingFilter
Bases: TextReplaceFilter
Filter to fix the Qing authorities.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.tw.TWUnivFilter1
Bases: TextReplaceFilter
Filter to fix the names of universities in the Taiwan area.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.tw.TWUnivFilter2
Bases: TextReplaceFilter
Filter to fix the names of universities in the Taiwan area.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.tw.TWWithOthersFilter1
Bases: TextReplaceFilter
Filter to fix the Taiwan name when it is with other countries.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.tw.TWWithOthersFilter2
Bases: TextReplaceFilter
Filter to fix the Taiwan name when it is with other countries.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
- class qiuwenbot.filter.tw.TWWithOthersInTitleFilter
Bases: Filter
Filter to fix the Taiwan name in title when it is with other countries.
- Attributes:
log: Log of the filter.
Methods
filter(text): Filter text.
qiuwenbot.filter.wg module
qiuwenbot.task package
Tasks.
Submodules
qiuwenbot.task.duplicate module
Check for duplicated pages whose titles are different variants of Chinese, such as zh-cn and zh-hk.
- class qiuwenbot.task.duplicate.CheckDuplicatedPageTask(user: str, password: str, pages: dict)
Bases: Task
A task to check duplicated pages.
- Parameters:
- user : str
Username.
- password : str
Password.
- pages : dict
Pages to operate.
Methods
do(page): Do the task.
logging(title): Log the removing operator.
submit(): Submit the task.
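The idea of detecting duplicates across Chinese variants can be sketched by normalizing every title to one variant and grouping identical results. The conversion table below contains only three characters and is purely illustrative; how CheckDuplicatedPageTask actually converts titles is not documented here, and a real run would need a complete table or the wiki's own variant conversion.

```python
from collections import defaultdict

# Tiny illustrative Traditional-to-Simplified table (assumption, incomplete).
T2S = str.maketrans({"學": "学", "國": "国", "語": "语"})


def find_variant_duplicates(titles):
    """Group titles that become identical after variant normalization."""
    groups = defaultdict(list)
    for title in titles:
        groups[title.translate(T2S)].append(title)
    return [group for group in groups.values() if len(group) > 1]


titles = ["大学", "大學", "物理", "国家", "國家"]
duplicates = find_variant_duplicates(titles)
print(duplicates)
```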
qiuwenbot.task.filter module
qiuwenbot.task.task module
- class qiuwenbot.task.task.Task(user: str, password: str, pages: dict, logging_page: str = None, summary: str = '')
Bases: object
A task to be done.
- Parameters:
- user : str
Username.
- password : str
Password.
- pages : dict
Pages to operate.
- logging_page : str, optional
Page to log the task, by default None
- summary : str, optional
Summary of the task, by default an empty string
Methods
Submodules
qiuwenbot.argparse module
qiuwenbot.bot module
qiuwenbot.qwfamily module
- class qiuwenbot.qwfamily.QiuwenFamily
Bases: Family
Qiuwen family.
- Attributes:
- interwiki_forward
- obsolete: Old codes that are not part of the family.
- shared_urlshortner_wiki
Methods
apipath(code): Return path to api.php.
base_url(code, uri[, protocol]): Prefix uri with port and hostname.
category_redirects(code[, fallback]): Return list of category redirect templates.
dbName(code): Return the name of the MySQL database.
disambig(code[, fallback]): Return list of disambiguation templates.
encoding(code): Return the encoding for a specific language wiki.
encodings(code): Return list of historical encodings for a specific language wiki.
eventstreams_host(code): Hostname for EventStreams.
eventstreams_path(code): Return path for EventStreams.
from_url(url): Return whether this family matches the given url.
get_address(code, title): Return the path to title using index.php with redirects disabled.
get_archived_page_templates(code): Return tuple of archived page templates.
get_edit_restricted_templates(code): Return tuple of edit restricted templates.
hostname(code): The hostname to use for standard http connections.
interface(code): Return interface to use for code.
isPublic(): Check whether the wiki requires logging in before viewing it.
linktrail(code): Return regex for trailing chars displayed as part of a link.
load([fam]): Import the named family.
maximum_GET_length(code): Return the maximum URL length for GET instead of POST.
path(code): Return path to index.php.
post_get_convert(site, getText): Do a conversion on the retrieved text from the Wiki.
pre_put_convert(site, putText): Do a conversion on the text to insert on the Wiki.
protocol(code): The protocol to use to connect to the site.
querypath(code): Return path to query.php.
scriptpath(code): The prefix used to locate scripts on this wiki.
shared_image_repository(code): Return the shared image repository, if any.
ssl_hostname(code): The hostname to use for SSL connections.
ssl_pathprefix(code): The path prefix for secure HTTP access.
verify_SSL_certificate(code): Return whether a HTTPS certificate should be verified.
- instance = Family("qiuwen")
- protocol(code)
The protocol to use to connect to the site.
May be overridden to return ‘http’. Other protocols are not supported.
Changed in version 8.2: https is returned instead of http.
- Parameters:
code – language code
- Returns:
protocol that this family uses
- scriptpath(code)
The prefix used to locate scripts on this wiki.
This is the value displayed when you enter {{SCRIPTPATH}} on a wiki page (often displayed at [[Help:Variables]] if the wiki has copied the master help page correctly).
The default value is the one used on Wikimedia Foundation wikis, but needs to be overridden in the family file for any wiki that uses a different value.
- Parameters:
code – Site code
- Raises:
KeyError – code is not recognised
- Returns:
URL path without ending ‘/’
qiuwenbot.qwlogger module
qiuwenbot.user-config module
qiuwenbot.utils module
- qiuwenbot.utils.archieve_page(page: Page, site: Site) -> Page
Archive a page.
- Parameters:
- page : pywikibot.Page
page to archive
- site : pywikibot.Site
qiuwen site
- Returns:
- pywikibot.Page
page with old title
- qiuwenbot.utils.devide_parameters(params: str) -> Dict[str, str]
Divide parameters and remove any subtemplates in them.
- Parameters:
- params : str
parameter string
- Returns:
- Dict[str, str]
dict of params
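To show what dividing a template parameter string might look like, here is a minimal sketch. The function split_parameters is a hypothetical stand-in, not the source of devide_parameters: it assumes parameters are separated by | and named with =, keys unnamed parameters by position, and does not handle pipes inside [[wiki links]].

```python
import re
from typing import Dict

# Matches a template that contains no further braces, i.e. an innermost one.
INNERMOST_TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")


def split_parameters(params: str) -> Dict[str, str]:
    """Split 'a=1|b=2' into a dict after dropping nested {{...}} templates."""
    # Strip innermost templates repeatedly until nothing changes.
    previous = None
    while previous != params:
        previous = params
        params = INNERMOST_TEMPLATE.sub("", params)

    result: Dict[str, str] = {}
    position = 0
    for part in params.split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            result[key.strip()] = value.strip()
        else:
            # Unnamed parameters are keyed by their 1-based position.
            position += 1
            result[str(position)] = part.strip()
    return result


parsed = split_parameters("name=Example|date={{date|2020|{{x}}}}|plain")
print(parsed)
```

Removing subtemplates before splitting matters because the pipes inside a nested template would otherwise be mistaken for parameter separators.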