Recently, another web automation tool has become popular among Python experts

2022-09-30 (last updated 2025-02-24)

This issue recommends DrissionPage, an open-source, Python-based web automation tool that integrates browser control and HTTP requests.

When you collect data with requests from a website that requires login, you have to analyze network packets and JS source code and construct complex requests. You often also have to deal with CAPTCHAs, JS obfuscation, signed parameters, and other anti-crawling measures, so the barrier to entry is high. If the data is computed by JS, the computation must be reproduced, which makes for a poor experience and low development efficiency.

With selenium, most of these pitfalls can be bypassed, but selenium is not very efficient. This library therefore combines selenium and requests into one, switches to the appropriate mode as needs change, and provides a user-friendly interface that improves both development and runtime efficiency.

Beyond merging the two, the library encapsulates common functions at the level of the web page and simplifies selenium's operations and statements. When automating web pages you can worry less about details and focus on functionality, which makes it more convenient to use. Everything aims at simplicity: the library tries to offer simple, direct usage that is friendly to novices.

Features

  • Highly integrated code, with conciseness as the first priority.
  • The page object can switch between selenium mode and requests mode while retaining the login state.
  • Extremely simple yet powerful element-location syntax that supports chained operations and keeps code very concise.
  • Both modes provide a consistent API and a consistent experience.
  • User-friendly design that integrates many practical functions and greatly reduces development workload.
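To make "retaining the login state" more concrete, here is a minimal sketch of the general idea, not DrissionPage's actual implementation: cookies collected in a browser session are reused when sending a plain HTTP request. It assumes the cookies arrive as a list of dicts with `name`, `value`, and `domain` keys (the shape selenium's `get_cookies()` returns). The function name `cookie_header_for` and the sample cookies are hypothetical.

```python
from urllib.parse import urlsplit


def cookie_header_for(url, browser_cookies):
    """Build a Cookie header value from browser-style cookie dicts.

    Only cookies whose domain matches the URL's host are included.
    """
    host = urlsplit(url).hostname or ''
    pairs = []
    for cookie in browser_cookies:
        domain = cookie.get('domain', '').lstrip('.')
        # Accept an exact host match or a parent-domain match.
        if domain and (host == domain or host.endswith('.' + domain)):
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return '; '.join(pairs)


# Hypothetical cookies, as a logged-in browser session might expose them.
cookies = [
    {'name': 'session_id', 'value': 'abc123', 'domain': '.example.com'},
    {'name': 'theme', 'value': 'dark', 'domain': 'www.example.com'},
    {'name': 'other', 'value': 'x', 'domain': 'another.org'},
]
header = cookie_header_for('https://www.example.com/profile', cookies)
print(header)
```

The resulting string could be sent as the `Cookie` header of a request to the same site, which is roughly what carrying a login from browser mode into requests mode amounts to.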

  • An already-open browser can be reused on every run. For example, set the page to a certain state by hand and let the program take over, or log in manually and then let the program crawl the content. There is no need to start the browser from scratch on each run.
  • Common configuration is saved in an ini file and loaded automatically; a convenient settings API is also provided, so you can stay away from complicated configuration items.
  • Extremely concise location syntax: elements can be located directly by text, and sibling and parent elements can be accessed directly.
  • A powerful download tool gives fast, reliable downloads even while operating the browser.
  • The download tool supports several ways of handling file name conflicts, automatically creates the target path, and retries on broken connections.
  • URL access has automatic retry, with configurable interval and timeout.
  • Page encoding is detected automatically when a page is accessed; no manual setting is needed.
  • Host and Referer attributes are generated automatically for connection parameters by default.
  • The browser process window can be hidden or shown at any time, without using headless mode or minimizing.
  • A suitable version of chromedriver can be downloaded automatically, removing the configuration hassle.
  • Element lookup in d mode has built-in waiting; you can set a global wait time or a wait time for a single lookup.
  • Element clicking integrates JS clicking; one parameter switches the click method.
  • Clicking supports retry on failure, which can be used to ensure a click succeeds or to determine whether a page mask layer has disappeared.
  • Text input automatically checks whether it succeeded and retries, avoiding cases where input or clearing fails.
  • d mode supports full-featured XPath and can directly obtain an attribute of an element; native selenium has no such function.
  • The shadow-root can be obtained directly, and the elements beneath it manipulated like normal elements.
  • The contents of the after and before pseudo-elements can be obtained directly.
  • A leading > can be used directly on an element to get its direct children by CSS selector; native selenium does not support this syntax.
  • lxml can be used simply to parse d-mode pages or elements, greatly speeding up crawling of complex pages.
  • Output data is already decoded and given basic formatting, reducing repetitive work.
  • It interoperates easily with native selenium or requests code, which eases project migration.
  • It is encapsulated in the POM (page object model) pattern, so it can be used directly for testing and is easy to extend.
  • d-mode configuration is compatible with debugger_address together with other parameters; native selenium is not.
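As an illustration of the "automatic retry with a configurable interval" item in the list above, the following is a minimal sketch of the pattern using only the standard library. It is not how DrissionPage implements it, and it covers the interval only, not the timeout. The names `fetch_with_retry` and `flaky_fetch` are hypothetical, and the `sleep` parameter is there only so the demonstration can run without really waiting.

```python
import time


def fetch_with_retry(fetch, url, retries=3, interval=2.0, sleep=time.sleep):
    """Call fetch(url); on an exception, wait `interval` seconds and retry.

    Makes one initial attempt plus up to `retries` further attempts and
    re-raises the last error if all of them fail.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            if attempt < retries:
                sleep(interval)
    raise last_error


# Hypothetical flaky fetcher: fails twice, then succeeds.
calls = {'count': 0}


def flaky_fetch(url):
    calls['count'] += 1
    if calls['count'] < 3:
        raise ConnectionError('temporary failure')
    return f'content of {url}'


waits = []  # records the requested pauses instead of sleeping
result = fetch_with_retry(flaky_fetch, 'https://example.com', retries=3,
                          interval=0.5, sleep=waits.append)
print(result, waits)
```

Wrapping the retry around a callable keeps the pattern independent of whether the page is fetched by a browser or by an HTTP client.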

As shown in the figure, the Drission object is responsible for creating connections and sharing the login state, similar to the driver concept in selenium. The MixPage object parses and manipulates the retrieved page. DriverElement and SessionElement are element objects obtained from the page object; they are responsible for parsing and manipulating elements.

[Figure: relationship between the Drission, MixPage, DriverElement, and SessionElement objects]

Comparison with selenium code

Switch to the first tab

# using selenium:
driver.switch_to.window(driver.window_handles[0])

# using DrissionPage:
page.to_tab(0)

Select a drop-down option by its text

# using selenium:
from selenium.webdriver.support.select import Select

select_element = Select(element)
select_element.select_by_visible_text('text')

# using DrissionPage:
element.select('text')

Drag an element onto another

# using selenium:
from selenium.webdriver import ActionChains

ActionChains(driver).drag_and_drop(ele1, ele2).perform()

# using DrissionPage:
ele1.drag_to(ele2)

Comparison with requests code

Get element content

url = 'https://baike.baidu.com/item/python'

# using requests:
import requests
from lxml import etree

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36'}
response = requests.get(url, headers=headers)
html = etree.HTML(response.text)
element = html.xpath('//h1')[0]
title = element.text

# using DrissionPage:
from DrissionPage import MixPage

page = MixPage('s')
page.get(url)
title = page('tag:h1').text

Download a file

url = 'https://www.baidu.com/img/flexible/logo/pc/result.png'
save_path = r'C:\download'

# using requests:
r = requests.get(url)
with open(f'{save_path}\\img.png', 'wb') as fd:
    for chunk in r.iter_content():
        fd.write(chunk)

# using DrissionPage:
page.download(url, save_path, 'img')  # supports renaming, handles file name conflicts, and automatically creates the target folder
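The download comment above mentions handling file name conflicts and creating the target folder automatically. The following is a minimal standard-library sketch of what such handling can look like; it is an illustration under assumptions, not DrissionPage's code. The function `resolve_target`, its `on_conflict` modes, and the `_1`, `_2` suffix scheme are all hypothetical.

```python
from pathlib import Path
import tempfile


def resolve_target(folder, filename, on_conflict='rename'):
    """Return a path to write to inside `folder`, creating the folder if needed.

    on_conflict: 'rename' picks a free name such as img_1.png,
    'overwrite' reuses the existing name, 'skip' returns None.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)  # create the target path if missing
    target = folder / filename
    if not target.exists() or on_conflict == 'overwrite':
        return target
    if on_conflict == 'skip':
        return None
    if on_conflict != 'rename':
        raise ValueError(f'unknown on_conflict mode: {on_conflict}')
    counter = 1
    while True:
        candidate = folder / f'{target.stem}_{counter}{target.suffix}'
        if not candidate.exists():
            return candidate
        counter += 1


# Demonstration in a temporary directory, so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    download_dir = Path(tmp) / 'download'
    first = resolve_target(download_dir, 'img.png')
    first.write_bytes(b'first')
    second = resolve_target(download_dir, 'img.png')
    skipped = resolve_target(download_dir, 'img.png', on_conflict='skip')
    print(first.name, second.name, skipped)
```

With plain requests, this kind of bookkeeping has to be written by hand around the `open()` call.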

Crawl a COVID-19 ranking table

URL: https://www.outbreak.my/zh/world. This example crawls the global COVID-19 ranking table. The site is a pure HTML page, which makes it especially suitable for crawling and parsing in s mode.

from DrissionPage import MixPage

# Create the page object in s mode
page = MixPage('s')
# Visit the data page
page.get('https://www.outbreak.my/zh/world')

# Get the table header element
thead = page('tag:thead')
# Get the header columns, skipping the hidden ones
title = thead.eles('tag:th@@-style:display: none;')
data = [th.text for th in title]

print(data)  # print the header

# Get the table body element
tbody = page('tag:tbody')
# Get all rows
rows = tbody.eles('tag:tr')

for row in rows:
    # Get all columns of the current row
    cols = row.eles('tag:td')
    # Build the data list for the current row (skipping the useless columns)
    data = [td.text for k, td in enumerate(cols) if k not in (2, 4, 6)]

    print(data)  # print the row data

Output:

['Total (205)', 'Cumulative confirmed', 'Deaths', 'Cured', 'Current confirmed', 'Mortality rate', 'Recovery rate']
['US', '55252823', '845745', '41467660', '12939418', '1.53%', '75.05%']
['India', '34838804', '481080', '34266363', '91361', '1.38%', '98.36%']
['Brazil', '22277239', '619024', '21567845', '90370', '2.78%', '96.82%']
['UK', '12748050', '148421', '10271706', '2327923', '1.16%', '80.57%']
['Russia', '10499982', '308860', '9463919', '727203', '2.94%', '90.13%']
['France', '9740600', '123552', '8037752', '1579296', '1.27%', '82.52%']
...
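For readers who want to try the header-and-rows extraction offline, here is a rough standard-library sketch of the same steps: collect the header cells, skip hidden columns, and print each row. It runs on a made-up inline HTML snippet rather than the real site, and the class name `TableRows` and the sample data are hypothetical.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of visible <th>/<td> cells, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None  # text chunks while inside a visible cell

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('th', 'td') and self._row is not None:
            style = dict(attrs).get('style') or ''
            hidden = 'display:none' in style.replace(' ', '')
            self._cell = None if hidden else []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ('th', 'td') and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample table; the second column is hidden, as on many data pages.
SAMPLE = """
<table>
  <thead><tr><th>Country</th><th style="display: none;">Code</th><th>Confirmed</th><th>Deaths</th></tr></thead>
  <tbody>
    <tr><td>A</td><td style="display: none;">aa</td><td>100</td><td>5</td></tr>
    <tr><td>B</td><td style="display: none;">bb</td><td>250</td><td>7</td></tr>
  </tbody>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE)
header, *body = parser.rows
print(header)
for row in body:
    print(row)
```

Compared with the DrissionPage version above, the parser class has to track rows and cells itself, which is the kind of detail the library's location syntax hides.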

Log in to gitee

URL: https://gitee.com/login. This example demonstrates how to log in to the gitee website automatically by controlling the browser.

from DrissionPage import MixPage

# Create the page object in d mode
page = MixPage()
# Go to the login page
page.get('https://gitee.com/login')

# Locate the account text box and enter the account
page.ele('#user_login').input('your account')
# Locate the password text box and enter the password
page.ele('#user_password').input('your password')
# Click the login button (matched by its value attribute, the Chinese label for "Log in")
page.ele('@value=登 录').click()

—END—

Open-source license: BSD-3-Clause
