Qiuwen Baike bot / 求闻百科机器人

Qiuwen Baike bot. / 求闻百科机器人。

Features / 功能

  • Comprehensive Cleanup / 综合治理

    • Cleaning up illegal references / 清理非法参考文献

    • Cleaning up unsupported HTML tags / 清理不支持的HTML标签

    • Cleaning up expired templates / 清理过期模板

    • Removing illegal flags of the Taiwan authorities / 清理台湾当局非法旗帜

    • Removing illegal era names of the Taiwan authorities / 清理台湾当局非法年号

    • Correcting terms related to Taiwan / 修正涉台用语

    • Correcting terms related to Hong Kong / 修正涉港用语

    • Correcting terms related to politics / 修正涉政用语

    • Correcting terms related to “Cultural Revolution” / 修正“文革”用语

  • Marking redundant entries caused by differences between Simplified and Traditional Chinese / 标记简繁差异造成的重复条目

Disclaimer / 免责声明

Qiuwen Baike® is a trademark or registered trademark of the operator of Qiuwen Baike website or its affiliated entities. This project uses the trademark legitimately based on Article 59 of the “Trademark Law of the People’s Republic of China”. When using it in the modified version of this project, one should observe the “Trademark Law”. / 求闻百科®是求闻百科网站运营者或其关联实体的商标或注册商标,本项目基于《中华人民共和国商标法》第五十九条对该商标正当使用。在本项目的修改版本中使用时,应注意遵守《商标法》。

Installation

To install qiuwenbot, simply use pip:

pip install qiuwenbot

Usage

First, prepare a JSON file that describes the job.

{
    "user": "Njzjzbot",
    "task": "filter",
    "pages": {
        "type": "all"
    }
}

The parameters are described in detail in Input Parameters. The password is set through the QIUWENBOT_PASSWORD environment variable. Using a bot password is recommended.

Then, one can submit the job using the following command:

qiuwenbot submit filter_all.json

See Command line interface for more information.
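
As a minimal sketch, the job file shown above can also be generated from Python with the standard library. The helper name `write_job` is hypothetical and not part of qiuwenbot, and the password value is a placeholder.

```python
import json
import os
from pathlib import Path


def write_job(path: str, user: str, task: str, pages: dict) -> Path:
    """Write a job description to a JSON file (hypothetical helper)."""
    job = {"user": user, "task": task, "pages": pages}
    out = Path(path)
    out.write_text(json.dumps(job, indent=4), encoding="utf-8")
    return out


job_file = write_job("filter_all.json", "Njzjzbot", "filter", {"type": "all"})

# The password is not stored in the JSON file. It is read from the
# QIUWENBOT_PASSWORD environment variable; the value here is a placeholder.
os.environ.setdefault("QIUWENBOT_PASSWORD", "<bot password>")

print(job_file.read_text(encoding="utf-8"))
```

The resulting file can then be passed to `qiuwenbot submit filter_all.json`.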

Command line interface

usage: qiuwenbot [-h] {submit,gui} ...

Sub-commands

submit

Submit a task

qiuwenbot submit [-h] CONFIG
Positional Arguments
CONFIG

Path to the config file

gui

Serve DP-GUI.

qiuwenbot gui [-h] [-p PORT] [--bind_all]
Named Arguments
-p, --port

The port to serve DP-GUI on.

Default: 6042

--bind_all

Serve on all public interfaces. This will expose your DP-GUI instance to the network on both IPv4 and IPv6 (where available).

Default: False
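
The usage text above maps naturally onto `argparse` sub-commands. The following is only a sketch of a parser with the same shape, not the project's actual entry point; the function name `build_parser` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser shaped like the documented CLI (illustrative only)."""
    parser = argparse.ArgumentParser(prog="qiuwenbot")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # qiuwenbot submit CONFIG
    submit = subparsers.add_parser("submit", help="Submit a task")
    submit.add_argument("CONFIG", help="Path to the config file")

    # qiuwenbot gui [-p PORT] [--bind_all]
    gui = subparsers.add_parser("gui", help="Serve DP-GUI.")
    gui.add_argument("-p", "--port", type=int, default=6042,
                     help="The port to serve DP-GUI on.")
    gui.add_argument("--bind_all", action="store_true",
                     help="Serve on all public interfaces.")
    return parser


args = build_parser().parse_args(["gui", "--port", "8080"])
print(args.command, args.port, args.bind_all)  # gui 8080 False
```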

Input Parameters

Note

The input file can be loaded, modified, and exported with the web-based tool DP-GUI, which is served by the command qiuwenbot gui. All parameters below can be set in DP-GUI. Clicking “SAVE JSON” downloads the input file.

user:
type: str
argument path: user

Username.

pages:
type: dict
argument path: pages

Configurations of scanned pages.

Depending on the value of type, different sub args are accepted.

type:
type: str (flag key)
argument path: pages/type
possible choices: all, new, link, page

Method to scan pages.

When type is set to all:

Scan all pages in alphabetical order.

namespace:
type: int, optional, default: 0
argument path: pages[all]/namespace

Namespace(s) of the pages.

restart:
type: bool, optional, default: False
argument path: pages[all]/restart

Restart from the last page in the log.

When type is set to new:

Scan new pages.

namespace:
type: list | int, optional, default: 0
argument path: pages[new]/namespace

Namespace(s) of the pages.

start:
type: str | NoneType, optional, default: None
argument path: pages[new]/start

Start time in ISO format.

end:
type: str | NoneType, optional, default: None
argument path: pages[new]/end

End time in ISO format.

When type is set to link:

Scan pages that link to a page or include a template.

When type is set to page:

Scan a single page.

name:
type: str
argument path: pages[page]/name

Name of the page or template.

task:
type: str
argument path: task

Task to submit.
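
As an illustration of the `new` page type described above, the sketch below builds a job that scans pages created in a one-week window. The dates are arbitrary, and both the exact ISO 8601 variant the bot accepts (for example, whether a UTC offset is allowed) and the expected ordering of `start` and `end` are assumptions.

```python
import json
from datetime import datetime, timedelta, timezone

# Arbitrary example window: the seven days before 2024-01-08 (UTC).
end = datetime(2024, 1, 8, tzinfo=timezone.utc)
start = end - timedelta(days=7)

job = {
    "user": "Njzjzbot",
    "task": "filter",
    "pages": {
        "type": "new",
        "namespace": [0],              # list | int, default 0
        "start": start.isoformat(),    # ISO-format string
        "end": end.isoformat(),
    },
}
print(json.dumps(job, indent=4))
```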

API documentation

qiuwenbot package

Qiuwen bot.

Subpackages

qiuwenbot.entrypoints package

Scripts.

Submodules
qiuwenbot.entrypoints.gui module

DP-GUI entrypoint.

qiuwenbot.entrypoints.gui.start_dpgui(args: Namespace)

Host DP-GUI server.

Parameters:
args : argparse.Namespace

Arguments from argparse.

Raises:
ModuleNotFoundError

The dpgui package is not installed

qiuwenbot.entrypoints.submit module
qiuwenbot.entrypoints.submit.submit(args: Namespace)

Submit a task.

qiuwenbot.filter package

Filter texts.

Submodules
qiuwenbot.filter.clean_refs module
class qiuwenbot.filter.clean_refs.CleanRefsFilter

Bases: Filter

Filter to clean references.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

filter(text: str) -> str

Filter text.

Parameters:
text : str

Text to filter.

Returns:
str

Filtered text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

qiuwenbot.filter.common module
qiuwenbot.filter.common.get_comment(comment: str) -> str

Get comment inserted into wikitext.

qiuwenbot.filter.deprecated_tags module
class qiuwenbot.filter.deprecated_tags.RemoveMapframeFilter

Bases: RemoveTagFilter

Filter to remove mapframe tag.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

class qiuwenbot.filter.deprecated_tags.RemoveScoreFilter

Bases: RemoveTagFilter

Filter to remove score tag.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

class qiuwenbot.filter.deprecated_tags.RemoveTagFilter(tag: str)

Bases: TextReplaceFilter

Filter to remove a certain tag.

Parameters:
tag : str

Tag name to remove.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.
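
To illustrate what removing a tag such as `score` or `mapframe` from wikitext can involve, here is a minimal regex-based sketch. It is not the project's implementation: the function name `remove_tag` and the exact pattern are assumptions, and nested tags of the same name are not handled.

```python
import re


def remove_tag(text: str, tag: str) -> str:
    """Remove <tag ...>...</tag> and self-closing <tag ... /> (illustrative)."""
    name = re.escape(tag)
    pattern = re.compile(
        # Either a self-closing tag, or an opening tag up to its closing tag.
        rf"<{name}\b[^>]*?/>|<{name}\b[^>]*>.*?</{name}\s*>",
        flags=re.DOTALL | re.IGNORECASE,
    )
    return pattern.sub("", text)


sample = "Intro <score>c d e</score> middle <mapframe width=200 /> end"
print(remove_tag(remove_tag(sample, "score"), "mapframe"))
```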

class qiuwenbot.filter.deprecated_tags.RemoveTimelineFilter

Bases: RemoveTagFilter

Filter to remove timeline tag.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

qiuwenbot.filter.expired_templates module
class qiuwenbot.filter.expired_templates.RemoveExpiredCurrentFilter

Bases: RemoveExpiredTemplateFilter

Filter to remove {{current}}.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

class qiuwenbot.filter.expired_templates.RemoveExpiredDeadFilter

Bases: RemoveExpiredTemplateFilter

Filter to remove {{近期逝世}}.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

class qiuwenbot.filter.expired_templates.RemoveExpiredDeadFilter2

Bases: RemoveExpiredTemplateFilter

Filter to remove {{最近逝世}}.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

class qiuwenbot.filter.expired_templates.RemoveExpiredDeadFilter3

Bases: RemoveExpiredTemplateFilter

Filter to remove {{recent death}}.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

class qiuwenbot.filter.expired_templates.RemoveExpiredTemplateFilter(template: str)

Bases: Filter

Filter to remove a certain expired template.

Parameters:
template : str

Template name to remove.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

filter(text: str) -> str

Filter text.

Parameters:
text : str

Text to filter.

Returns:
str

Filtered text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

qiuwenbot.filter.filter module
class qiuwenbot.filter.filter.Filter

Bases: object

Filter texts.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

abstract filter(text: str) -> str

Filter text.

Parameters:
text : str

Text to filter.

Returns:
str

Filtered text.

property log: str | None

Log of the filter.

Returns:
str

Log of the filter.
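
The interface documented above (an abstract `filter` method plus a `log` property) can be pictured with the following self-contained sketch. The `Filter` class here is a stand-in written for illustration, not an import from qiuwenbot, and `StripTrailingSpacesFilter` is a hypothetical subclass.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Filter(ABC):
    """Stand-in for the documented base class (illustrative only)."""

    @abstractmethod
    def filter(self, text: str) -> str:
        """Return the filtered text."""

    @property
    def log(self) -> Optional[str]:
        """Short description of what the filter does, or None."""
        return None


class StripTrailingSpacesFilter(Filter):
    """Hypothetical filter that strips trailing whitespace on every line."""

    def filter(self, text: str) -> str:
        return "\n".join(line.rstrip() for line in text.split("\n"))

    @property
    def log(self) -> str:
        return "strip trailing whitespace"


flt = StripTrailingSpacesFilter()
print(repr(flt.filter("a  \nb\t\n")))  # 'a\nb\n'
```

Because `filter` is abstract, the base class itself cannot be instantiated; only subclasses that implement it can.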

class qiuwenbot.filter.filter.FilterChain(filters: List[Filter])

Bases: Filter

Filter chain.

Parameters:
filters : list of Filter

Filters to apply.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

filter(text: str) -> str

Filter text.

Parameters:
text : str

Text to filter.

Returns:
str

Filtered text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.
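
A filter chain applies its filters one after another, feeding each filter the output of the previous one. The sketch below shows one way this could work. How the real class combines logs is not documented here, so recording only the filters that changed the text and joining their logs with "; " are assumptions, as are the two toy filters.

```python
from typing import List


class Upper:
    """Hypothetical filter: upper-case the text."""
    log = "upper-case"

    def filter(self, text: str) -> str:
        return text.upper()


class Exclaim:
    """Hypothetical filter: append an exclamation mark."""
    log = "add '!'"

    def filter(self, text: str) -> str:
        return text + "!"


class FilterChain:
    """Apply filters in order and collect logs of those that changed the text."""

    def __init__(self, filters: List) -> None:
        self.filters = filters
        self._logs: List[str] = []

    def filter(self, text: str) -> str:
        self._logs = []
        for flt in self.filters:
            new_text = flt.filter(text)
            if new_text != text:
                self._logs.append(flt.log)
            text = new_text
        return text

    @property
    def log(self) -> str:
        return "; ".join(self._logs)


chain = FilterChain([Upper(), Exclaim()])
print(chain.filter("hello"), "|", chain.log)  # HELLO! | upper-case; add '!'
```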

class qiuwenbot.filter.filter.TextReplaceFilter(pattern: str, repl: str)

Bases: Filter

Filter to replace texts.

Parameters:
pattern : str

Pattern to replace.

repl : str

Replacement.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

filter(text: str) -> str

Filter text.

Parameters:
text : str

Text to filter.

Returns:
str

Filtered text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.
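
The sketch below illustrates a pattern/replacement filter of this kind. It assumes that `pattern` is a regular expression handled by `re.sub`, which the documentation above does not state, and the class is a stand-in rather than the project's code.

```python
import re


class TextReplaceFilter:
    """Illustrative stand-in: replace every match of `pattern` with `repl`."""

    def __init__(self, pattern: str, repl: str) -> None:
        self.pattern = re.compile(pattern)  # assumption: pattern is a regex
        self.repl = repl

    def filter(self, text: str) -> str:
        return self.pattern.sub(self.repl, text)

    @property
    def log(self) -> str:
        return f"replace {self.pattern.pattern!r} with {self.repl!r}"


flt = TextReplaceFilter(r"colour", "color")
print(flt.filter("colour and colours"))  # color and colors
```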

qiuwenbot.filter.filter.register_filter(cls: Filter) -> Filter

Decorator to register a filter class.

The filter should not have any parameters in its constructor.

Parameters:
cls : Filter

Filter to register.

Returns:
Filter

Registered filter.

Examples

>>> @register_filter
... class Filter1(Filter):
...     pass
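
Given the signature `register_filter(cls)`, the function receives the class directly, so it is applied as a bare decorator. The following self-contained sketch shows the general registry pattern. The list name `REGISTERED_FILTERS` is hypothetical, and whether the real registry stores classes or instances is not documented here.

```python
from typing import List

# Hypothetical module-level registry of filter classes.
REGISTERED_FILTERS: List[type] = []


def register_filter(cls: type) -> type:
    """Class decorator: remember `cls` and return it unchanged."""
    REGISTERED_FILTERS.append(cls)
    return cls


@register_filter
class Filter1:
    def filter(self, text: str) -> str:
        return text


# Registered classes take no constructor arguments,
# so the registry can instantiate all of them uniformly.
instances = [cls() for cls in REGISTERED_FILTERS]
print([type(obj).__name__ for obj in instances])  # ['Filter1']
```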
qiuwenbot.filter.gov module
class qiuwenbot.filter.gov.CNGovFilter

Bases: TextReplaceFilter

Filter to fix the Chinese government terms.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

qiuwenbot.filter.history module
class qiuwenbot.filter.history.FakeManchukuoFilter

Bases: TextReplaceFilter

Filter for Fake Manchukuo authorities.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

class qiuwenbot.filter.history.FakeWangFilter

Bases: TextReplaceFilter

Filter for Fake Wang authorities.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

qiuwenbot.filter.hk module
class qiuwenbot.filter.hk.HKReunificationFilter

Bases: TextReplaceFilter

Filter to fix the Hong Kong Reunification terms.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

qiuwenbot.filter.roc_flag module
class qiuwenbot.filter.roc_flag.ReplaceROCyear

Bases: Filter

Filter to replace ROC flag from a string.

Parameters:
pattern : str

Pattern to replace.

repl : str

Replacement.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

filter(text: str) -> str

Filter text.

Parameters:
text : str

Text to filter.

Returns:
str

Filtered text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

qiuwenbot.filter.roc_year module
class qiuwenbot.filter.roc_year.ReplaceROCyear

Bases: Filter

Filter to replace ROC year from a string.

Parameters:
pattern : str

Pattern to replace.

repl : str

Replacement.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

filter(text: str) -> str

Filter text.

Parameters:
text : str

Text to filter.

Returns:
str

Filtered text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.
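
An ROC era year N corresponds to the Gregorian year N + 1911. The sketch below applies that arithmetic to the simple form “民国N年”. It is an illustration only: the patterns the real filter matches, and any conditions under which it leaves a year untouched, are not documented here, and the function name `roc_to_gregorian` is hypothetical.

```python
import re

# Assumed textual form of an era year: 民国 + digits + 年.
ROC_YEAR = re.compile(r"民国(\d+)年")


def roc_to_gregorian(text: str) -> str:
    """Rewrite '民国N年' as the Gregorian year N + 1911 (illustrative only)."""
    return ROC_YEAR.sub(lambda m: f"{int(m.group(1)) + 1911}年", text)


print(roc_to_gregorian("民国38年"))  # 1949年
```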

qiuwenbot.filter.tw module
class qiuwenbot.filter.tw.TWJPFilter

Bases: TextReplaceFilter

Filter to fix the Japanese authorities.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

class qiuwenbot.filter.tw.TWLeaderFilter

Bases: TextReplaceFilter

Filter to fix the leader name in the Taiwan area.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

class qiuwenbot.filter.tw.TWNameFilter1

Bases: TextReplaceFilter

Filter to fix the name of the Taiwan area.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

class qiuwenbot.filter.tw.TWNameFilter2

Bases: TextReplaceFilter

Filter to fix the name of the Taiwan area.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

class qiuwenbot.filter.tw.TWQingFilter

Bases: TextReplaceFilter

Filter to fix the Qing authorities.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

class qiuwenbot.filter.tw.TWUnivFilter1

Bases: TextReplaceFilter

Filter to fix the names of universities in the Taiwan area.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

class qiuwenbot.filter.tw.TWUnivFilter2

Bases: TextReplaceFilter

Filter to fix the names of universities in the Taiwan area.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

class qiuwenbot.filter.tw.TWWithOthersFilter1

Bases: TextReplaceFilter

Filter to fix the Taiwan name when it is with other countries.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

class qiuwenbot.filter.tw.TWWithOthersFilter2

Bases: TextReplaceFilter

Filter to fix the Taiwan name when it is with other countries.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

class qiuwenbot.filter.tw.TWWithOthersInTitleFilter

Bases: Filter

Filter to fix the Taiwan name in title when it is with other countries.

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

filter(text: str) -> str

Filter text.

Parameters:
text : str

Text to filter.

Returns:
str

Filtered text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

qiuwenbot.filter.wg module
class qiuwenbot.filter.wg.WengeFilter

Bases: TextReplaceFilter

Filter to add quotation marks around “文革” (Wen Ge, the Cultural Revolution).

Attributes:
log

Log of the filter.

Methods

filter(text)

Filter text.

property log: str

Log of the filter.

Returns:
str

Log of the filter.

qiuwenbot.task package

Tasks.

Submodules
qiuwenbot.task.duplicate module

Check for duplicated pages whose titles are different variants of the same Chinese title, such as zh-cn and zh-hk.

class qiuwenbot.task.duplicate.CheckDuplicatedPageTask(user: str, password: str, pages: dict)

Bases: Task

A task to check duplicated pages.

Parameters:
user : str

Username.

password : str

Password.

pages : dict

Pages to operate.

Methods

do(page)

Do the task.

logging(title)

Log the removal operation.

submit()

Submit the task.

do(page: Page) -> bool

Do the task.

qiuwenbot.task.duplicate.check_page(page: Page, site: Site)

Check if a page has duplicated variants.

qiuwenbot.task.filter module
class qiuwenbot.task.filter.FilterTask(user: str, password: str, pages: dict)

Bases: Task

A task to apply filters to pages.

Parameters:
user : str

Username.

password : str

Password.

pages : dict

Pages to operate.

Methods

do(page)

Do the task.

logging(title)

Log the removal operation.

submit()

Submit the task.

do(page: Page) -> bool

Do the task.

qiuwenbot.task.task module
class qiuwenbot.task.task.Task(user: str, password: str, pages: dict, logging_page: str = None, summary: str = '')

Bases: object

A task to be done.

Parameters:
user : str

Username.

password : str

Password.

pages : dict

Pages to operate.

logging_page : str, optional

Page to log the task, by default None

summary : str, optional

Summary of the task, by default empty string

Methods

do(page)

Do the task.

logging(title)

Log the removal operation.

submit()

Submit the task.

abstract do(page: Page) -> bool

Do the task.

logging(title: str) -> None

Log the removal operation.

Parameters:
title : str

title of the modified page

submit()

Submit the task.
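
The relationship between `do`, `logging`, and `submit` can be pictured with the following self-contained sketch. It rests on several assumptions that the documentation above does not confirm: that `submit` iterates over the selected pages, that `do` returns True when a page was modified, and that `logging` is called for modified pages. Pages are plain strings here rather than pywikibot `Page` objects, and the classes are stand-ins.

```python
from abc import ABC, abstractmethod
from typing import Iterable, List


class Task(ABC):
    """Stand-in task: do() handles one page and reports whether it changed."""

    def __init__(self, pages: Iterable[str]) -> None:
        self.pages = pages
        self.modified: List[str] = []

    @abstractmethod
    def do(self, page: str) -> bool:
        """Process one page; return True if it was modified (assumption)."""

    def logging(self, title: str) -> None:
        """Record the title of a modified page."""
        self.modified.append(title)

    def submit(self) -> None:
        """Run do() on every page and log the ones that changed."""
        for page in self.pages:
            if self.do(page):
                self.logging(page)


class ShoutTask(Task):
    """Hypothetical task that 'modifies' only titles that are not upper-case."""

    def do(self, page: str) -> bool:
        return page != page.upper()


task = ShoutTask(["abc", "XYZ"])
task.submit()
print(task.modified)  # ['abc']
```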

Submodules

qiuwenbot.argparse module

qiuwenbot.argparse.normalize(data: dict) -> dict
qiuwenbot.argparse.page_variant() -> Variant
qiuwenbot.argparse.submit_args() -> List[Argument]

qiuwenbot.bot module

qiuwenbot.bot.get_page(title: str, site: Site)

Get the page with the specific title.

Parameters:
title : str

title of the page

site : pywikibot.Site

qiuwen site

qiuwenbot.bot.login(user: str, password: str) -> Site

Log in to Qiuwen.

Parameters:
user

username of the bot

password

password of the bot

Returns:
pywikibot.Site

qiuwen site

qiuwenbot.qwfamily module

class qiuwenbot.qwfamily.QiuwenFamily

Bases: Family

Qiuwen family.

Attributes:
interwiki_forward
obsolete

Old codes that are not part of the family.

shared_urlshortner_wiki

Methods

apipath(code)

Return path to api.php.

base_url(code, uri[, protocol])

Prefix uri with port and hostname.

category_redirects(code[, fallback])

Return list of category redirect templates.

dbName(code)

Return the name of the MySQL database.

disambig(code[, fallback])

Return list of disambiguation templates.

encoding(code)

Return the encoding for a specific language wiki.

encodings(code)

Return list of historical encodings for a specific language wiki.

eventstreams_host(code)

Hostname for EventStreams.

eventstreams_path(code)

Return path for EventStreams.

from_url(url)

Return whether this family matches the given url.

get_address(code, title)

Return the path to title using index.php with redirects disabled.

get_archived_page_templates(code)

Return tuple of archived page templates.

get_edit_restricted_templates(code)

Return tuple of edit restricted templates.

hostname(code)

The hostname to use for standard http connections.

interface(code)

Return interface to use for code.

isPublic()

Check whether the wiki requires logging in before viewing it.

linktrail(code)

Return regex for trailing chars displayed as part of a link.

load([fam])

Import the named family.

maximum_GET_length(code)

Return the maximum URL length for GET instead of POST.

path(code)

Return path to index.php.

post_get_convert(site, getText)

Do a conversion on the retrieved text from the Wiki.

pre_put_convert(site, putText)

Do a conversion on the text to insert on the Wiki.

protocol(code)

The protocol to use to connect to the site.

querypath(code)

Return path to query.php.

scriptpath(code)

The prefix used to locate scripts on this wiki.

shared_image_repository(code)

Return the shared image repository, if any.

ssl_hostname(code)

The hostname to use for SSL connections.

ssl_pathprefix(code)

The path prefix for secure HTTP access.

verify_SSL_certificate(code)

Return whether a HTTPS certificate should be verified.

instance = Family("qiuwen")
isPublic()

Check whether the wiki requires logging in before viewing it.

langs: dict[str, str] = {'zh': 'www.qiuwenbaike.cn'}
name: str | None = 'qiuwen'

The family name

protocol(code)

The protocol to use to connect to the site.

May be overridden to return ‘http’. Other protocols are not supported.

Changed in version 8.2: https is returned instead of http.

Parameters:

code – language code

Returns:

protocol that this family uses

scriptpath(code)

The prefix used to locate scripts on this wiki.

This is the value displayed when you enter {{SCRIPTPATH}} on a wiki page (often displayed at [[Help:Variables]] if the wiki has copied the master help page correctly).

The default value is the one used on Wikimedia Foundation wikis, but needs to be overridden in the family file for any wiki that uses a different value.

Parameters:

code – Site code

Raises:

KeyError – code is not recognised

Returns:

URL path without ending ‘/’

qiuwenbot.qwlogger module

qiuwenbot.user-config module

qiuwenbot.utils module

qiuwenbot.utils.archieve_page(page: Page, site: Site) -> Page

Archive a page.

Parameters:
page: pywikibot.Page

page to archive

site: pywikibot.Site

qiuwen site

Returns:
pywikibot.Page

page with old title

qiuwenbot.utils.devide_parameters(params: str) -> Dict[str, str]

Divide parameters and remove any subtemplates in them.

Parameters:
params : str

parameter string

Returns:
Dict[str, str]

dict of params
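
One way to divide a template parameter string while discarding nested subtemplates is sketched below. This is not the project's implementation: the function name `divide_parameters` is hypothetical, and ignoring positional parameters (those without `=`) is an assumption.

```python
import re
from typing import Dict


def divide_parameters(params: str) -> Dict[str, str]:
    """Split 'a=1|b=2' into a dict after dropping nested {{...}} (illustrative)."""
    # Repeatedly remove innermost templates so that nested ones disappear too.
    innermost = re.compile(r"\{\{[^{}]*\}\}")
    while innermost.search(params):
        params = innermost.sub("", params)

    result: Dict[str, str] = {}
    for part in params.split("|"):
        if "=" in part:
            # Split on the first '=' only, so values may contain '='.
            key, value = part.split("=", 1)
            result[key.strip()] = value.strip()
    return result


print(divide_parameters("title=Foo|date={{date|2020}}|url=http://a/b?x=1"))
```

Removing subtemplates before splitting matters because a nested template such as `{{date|2020}}` contains its own `|` separators.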

qiuwenbot.utils.get_cat_regex(name: str = '[^\\[\\]]+') -> Pattern

Get categories regex.

Parameters:
name : str, optional

Name or regex of the category, by default all categories.

Returns:
Pattern

Regex that matches categories.
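
A category regex built from a name pattern might look like the sketch below. The default `name` is taken from the signature above, but everything else is an assumption: the `Category` prefix, the tolerance for surrounding whitespace, and case-insensitive matching are illustrative choices, and the real function may also accept localized prefixes.

```python
import re


def get_cat_regex(name: str = r"[^\[\]]+") -> re.Pattern:
    """Compile a regex matching [[Category:<name>]] links (illustrative)."""
    return re.compile(
        r"\[\[\s*Category\s*:\s*(" + name + r")\s*\]\]",
        re.IGNORECASE,
    )


text = "Body text [[Category:Physics]] [[Category:1990 births]]"
print(get_cat_regex().findall(text))          # ['Physics', '1990 births']
print(get_cat_regex("Physics").sub("", text))
```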

qiuwenbot.utils.get_template_regex(name: str = '[^{\\|#0-9][^{\\|#]*?', end: str = '') -> Pattern

Get templates regex.

Parameters:
name : str, optional

Name or regex of the template, by default all templates.

end : str, optional

End of the template, by default “”.

Returns:
Pattern

Regex that matches templates.
