Recently, another Web automation tool has become popular and is being used by Python experts

2022-09-30 (last updated 2025-02-24)

This issue recommends DrissionPage, an open-source, Python-based integrated Web automation tool.

Collecting data with requests from a website that requires login means analyzing data packets and JS source code and constructing complex requests. You often also have to deal with verification codes, JS obfuscation, signature parameters and other anti-crawling measures, so the barrier to entry is high. If the data is computed by JS, the computation has to be reproduced, which makes for a poor experience and low development efficiency.

Using selenium largely avoids these pitfalls, but selenium is not very efficient. This library therefore combines selenium and requests into one, switches to the appropriate mode as needs change, and provides a user-friendly interface that improves both development and runtime efficiency.

Besides merging the two, the library wraps common functions at the level of the web page and simplifies selenium's operations and statements. When automating web pages you can think less about details and focus on implementing functionality, which makes it more convenient to use. Everything aims for simplicity: the methods are meant to be simple and direct, and friendly to novices.
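Sharing a login state between a browser and an HTTP session comes down to copying the browser's cookies into the requests that follow. The sketch below is not DrissionPage's code; it only illustrates the idea with the standard library, assuming cookies arrive in the list-of-dicts shape that selenium's `driver.get_cookies()` returns. The function name `cookies_to_header` and the sample cookie values are hypothetical.

```python
def cookies_to_header(cookies, domain):
    """Build a Cookie header value from selenium-style cookie dicts.

    Only cookies set for `domain` itself or for one of its parent domains are kept.
    """
    pairs = []
    for cookie in cookies:
        cookie_domain = cookie.get("domain", "").lstrip(".")
        # Keep the cookie if it was set for this host or for a parent domain.
        if domain == cookie_domain or domain.endswith("." + cookie_domain):
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


# Hypothetical cookies, in the shape returned by selenium's driver.get_cookies().
browser_cookies = [
    {"name": "session_id", "value": "abc123", "domain": ".example.com", "path": "/"},
    {"name": "theme", "value": "dark", "domain": "www.example.com", "path": "/"},
    {"name": "tracker", "value": "xyz", "domain": "ads.example.net", "path": "/"},
]

header_value = cookies_to_header(browser_cookies, "www.example.com")
print(header_value)
```

The resulting string could then be sent as the `Cookie` header of a plain HTTP request, for example `requests.get(url, headers={'Cookie': header_value})`, so the request is made with the session the browser established.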

Features

  • Highly integrated code, with concise code as the first priority.
  • The page object can switch between selenium mode and requests mode while retaining the login state.
  • Extremely simple but powerful element-location syntax with support for chained operations, so the code stays concise.
  • Both modes provide a consistent API and a consistent experience.
  • User-friendly design that integrates many practical functions and greatly reduces development workload.


  • An already open browser can be reused on every run. For example, set the page to a certain state by hand and let the program take over, or handle the login manually and then let the program crawl the content. There is no need to start the browser from scratch on each run, which is very convenient.
  • Common configuration is saved in an ini file and loaded automatically; a convenient settings API is also provided, so you can stay away from complicated configuration items.
  • Extremely concise location syntax: elements can be located directly by text, and siblings and parents can be accessed directly.
  • A powerful download tool, so fast and reliable downloads are available even while operating the browser.
  • The download tool supports several ways of handling file name conflicts, creates target paths automatically, retries on broken connections, and so on.
  • URL access has automatic retry, with configurable interval and timeout.
  • The page encoding is detected automatically when a page is accessed; no manual setting is needed.
  • Request parameters generate the Host and Referer attributes automatically by default.
  • The browser process window can be hidden or shown directly at any time (this is neither headless mode nor minimizing).
  • The appropriate version of chromedriver can be downloaded automatically, which removes the hassle of configuration.
  • Element lookup in d mode has a built-in wait; the global wait time or the wait time of a single lookup can be set freely.
  • Element clicking integrates a JS click mode; a single parameter switches the click mode.
  • Clicks support retry on failure, which can be used to make sure a click succeeds or to detect whether a page overlay has disappeared.
  • Text input checks automatically whether it succeeded and retries, avoiding cases where input or clearing fails.
  • d mode supports full-featured xpath, which can fetch an attribute of an element directly; native selenium does not have this function.
  • The shadow-root can be obtained directly, and the elements below it can be manipulated like normal elements.
  • The contents of the after and before pseudo-elements can be obtained directly.
  • > can be used directly under an element to get the direct children of the current element with a CSS selector; native selenium does not support this.
  • lxml can easily be used to parse d mode pages or elements, which greatly improves the speed of crawling data from complex pages.
  • Output data is already transcoded and given basic formatting, which reduces repeated work.
  • It interoperates easily with native selenium or requests code, which eases project migration.
  • It is encapsulated in the POM pattern, so it can be used directly for testing and is easy to extend.
  • d mode configuration is compatible with debugger_address together with other parameters; native selenium is not.
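The list above mentions URL access with automatic retry and a configurable interval. As a rough illustration of that general technique (not DrissionPage's implementation), here is a minimal standard-library sketch. The names `call_with_retry` and `flaky_fetch` are hypothetical, and the `sleep` parameter exists only so that the demo can run without actually waiting.

```python
import time


def call_with_retry(func, retries=3, interval=2.0, sleep=time.sleep):
    """Call func() and return its result, retrying after failures.

    func is attempted once and then up to `retries` more times,
    waiting `interval` seconds between attempts.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception as error:
            last_error = error
            if attempt < retries:
                sleep(interval)  # wait before the next attempt
    # Every attempt failed: re-raise the last error.
    raise last_error


attempts = {"count": 0}


def flaky_fetch():
    """Hypothetical fetch that fails twice and then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "page content"


# The no-op sleep keeps the demo instant; omit it to really wait between tries.
result = call_with_retry(flaky_fetch, retries=3, interval=0.5, sleep=lambda seconds: None)
print(result, attempts["count"])
```

In real use `func` would wrap the page request, and a request timeout would be passed to the HTTP call itself rather than to this helper.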


As shown in the figure, the Drission object is responsible for creating connections, sharing the login state and so on, similar to the driver concept in selenium. The MixPage object is responsible for parsing and manipulating the retrieved page. DriverElement and SessionElement are element objects obtained from the page object; they are responsible for parsing and manipulating elements.

[Figure: the Drission, MixPage, DriverElement and SessionElement objects]

Comparison with selenium code

Switch to the first tab

# use selenium:
driver.switch_to.window(driver.window_handles[0])

# use DrissionPage:
page.to_tab(0)

Select a drop-down option by its text

# use selenium:
from selenium.webdriver.support.select import Select

select_element = Select(element)
select_element.select_by_visible_text('text')

# use DrissionPage:
element.select('text')

Drag an element

# use selenium:
from selenium.webdriver import ActionChains

ActionChains(driver).drag_and_drop(ele1, ele2).perform()

# use DrissionPage:
ele1.drag_to(ele2)

Comparison with requests code

Get element content

url = 'https://baike.baidu.com/item/python'

# use requests:
import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36'}
response = requests.get(url, headers=headers)
html = etree.HTML(response.text)
element = html.xpath('//h1')[0]
title = element.text

# use DrissionPage:
from DrissionPage import MixPage

page = MixPage('s')
page.get(url)
title = page('tag:h1').text

Download a file

url = 'https://www.baidu.com/img/flexible/logo/pc/result.png'
save_path = r'C:\download'

# use requests:
r = requests.get(url)
with open(f'{save_path}\\img.png', 'wb') as fd:
    for chunk in r.iter_content():
        fd.write(chunk)

# use DrissionPage:
page.download(url, save_path, 'img')  # Supports renaming, handles file name conflicts, automatically creates the target folder
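The comment on `page.download` mentions handling file name conflicts and creating the target folder automatically. The sketch below shows one common way to get that behavior with the standard library. It is an assumption for illustration only: the helper name `unique_path` and the `_1`, `_2` renaming scheme are hypothetical and not necessarily what DrissionPage does.

```python
import tempfile
from pathlib import Path


def unique_path(folder, file_name):
    """Return a path inside `folder` that does not exist yet.

    The folder is created if it is missing. If `file_name` is already taken,
    a counter is appended to the stem: img.png, img_1.png, img_2.png, ...
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)  # create the target folder
    candidate = folder / file_name
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = folder / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


# Demo in a temporary directory so nothing is left behind on disk.
with tempfile.TemporaryDirectory() as tmp:
    target_dir = Path(tmp) / "download" / "images"
    first = unique_path(target_dir, "img.png")
    first.write_bytes(b"first file")
    second = unique_path(target_dir, "img.png")
    names = (first.name, second.name)

print(names)
```

With a helper like this, the requests version above could write to `unique_path(save_path, 'img.png')` instead of a fixed file name and would no longer overwrite an existing file.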

Crawl the COVID-19 ranking list

URL: https://www.outbreak.my/zh/world. This example crawls the global COVID-19 ranking list. The site is a pure HTML page, which makes it especially suitable for crawling and parsing in s mode.

from DrissionPage import MixPage

# Create the page object in s mode
page = MixPage('s')
# Open the data page
page.get('https://www.outbreak.my/zh/world')

# Get the table header element
thead = page('tag:thead')
# Get the header columns, skipping the hidden ones
title = thead.eles('tag:th@@-style:display: none;')
data = [th.text for th in title]

print(data)  # Print the header

# Get the table body element
tbody = page('tag:tbody')
# Get all rows
rows = tbody.eles('tag:tr')
for row in rows:
    # Get all columns of the current row
    cols = row.eles('tag:td')
    # Build the data list for the current row (skipping the useless columns)
    data = [td.text for k, td in enumerate(cols) if k not in (2, 4, 6)]

    print(data)  # Print the row data

Output:

['Total (205)', 'Cumulative confirmed', 'Deaths', 'Recovered', 'Currently confirmed', 'Mortality rate', 'Recovery rate']
['US', '55252823', '845745', '41467660', '12939418', '1.53%', '75.05%']
['India', '34838804', '481080', '34266363', '91361', '1.38%', '98.36%']
['Brazil', '22277239', '619024', '21567845', '90370', '2.78%', '96.82%']
['UK', '12748050', '148421', '10271706', '2327923', '1.16%', '80.57%']
['Russia', '10499982', '308860', '9463919', '727203', '2.94%', '90.13%']
['France', '9740600', '123552', '8037752', '1579296', '1.27%', '82.52%']
...
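A quick way to sanity-check crawled rows like these is to recompute the derived columns from the raw counts. In the sample output the fifth column appears to equal confirmed minus deaths minus recovered, and the two percentages appear to be deaths and recoveries divided by confirmed cases. These relationships are inferred from the sample rows, not documented by the site, and the helper name `derive_columns` is hypothetical.

```python
# Raw counts copied from the first two rows of the output above:
# (name, cumulative confirmed, deaths, recovered)
rows = [
    ("US", 55252823, 845745, 41467660),
    ("India", 34838804, 481080, 34266363),
]


def derive_columns(confirmed, deaths, recovered):
    """Recompute the last three columns of a row from its raw counts."""
    active = confirmed - deaths - recovered   # cases still open
    mortality = f"{deaths / confirmed:.2%}"   # percentage with two decimals
    recovery = f"{recovered / confirmed:.2%}"
    return active, mortality, recovery


derived = {name: derive_columns(c, d, r) for name, c, d, r in rows}
for name, values in derived.items():
    print(name, values)
```

For both rows the recomputed values match the crawled ones, which suggests the list comprehension in the crawler keeps the intended columns.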

Log in to gitee

URL: https://gitee.com/login. This example demonstrates how to log in to the gitee website automatically by controlling the browser.

from DrissionPage import MixPage

# Create the page object in d mode (the default)
page = MixPage()
# Go to the login page
page.get('https://gitee.com/login')

# Locate the account text box and enter the account
page.ele('#user_login').input('your account')
# Locate the password text box and enter the password
page.ele('#user_password').input('your password')
# Click the login button (its value attribute on the page is the Chinese label 登 录, "log in")
page.ele('@value=登 录').click()

—END—

Open source license: BSD-3-Clause
