Load a web page

nullstellensatz

I am trying to load a web page using PySide's QtWebKit module. According to the documentation (Elements of QWebView; QWebFrame::toHtml()), the following script should print the HTML of the Google Search Page:

from PySide import QtCore
from PySide import QtGui
from PySide import QtWebKit

# Needed if we want to display the webpage in a widget.
app = QtGui.QApplication([])

view = QtWebKit.QWebView(None)
view.setUrl(QtCore.QUrl("http://www.google.com/"))
frame = view.page().mainFrame()
print(frame.toHtml())

But alas it does not. All that is printed is the method's equivalent of a null response:

<html><head></head><body></body></html>

So I took a closer look at the setUrl documentation:

The view remains the same until enough data has arrived to display the new url.

This made me think that maybe I was calling the toHtml() method too soon, before a response has been received from the server. So I wrote a class that overrides the setUrl method, blocking until the loadFinished signal is triggered:

import time

class View(QtWebKit.QWebView):
    def __init__(self, *args, **kwargs):
        super(View, self).__init__(*args, **kwargs)
        self.completed = True
        self.loadFinished.connect(self.setCompleted)

    def setCompleted(self):
        self.completed = True

    def setUrl(self, url):
        self.completed = False
        super(View, self).setUrl(url)
        while not self.completed:
            time.sleep(0.2)

view = View(None)
view.setUrl(QtCore.QUrl("http://www.google.com/"))
frame = view.page().mainFrame()
print(frame.toHtml())

That made no difference at all. What am I missing here?

EDIT: Merely getting the HTML of a page is not my end goal here. This is a simplified example of code that was not working the way I expected it to. Credit to Oleh for suggesting that I replace time.sleep() with app.processEvents().

Oleh Prypin

Copied from my other answer:

from PySide.QtCore import QObject, QUrl, Slot
from PySide.QtGui import QApplication
from PySide.QtWebKit import QWebPage, QWebSettings

qapp = QApplication([])

def load_source(url):
    page = QWebPage()
    page.settings().setAttribute(QWebSettings.AutoLoadImages, False)
    page.mainFrame().setUrl(QUrl(url))

    class State(QObject):
        src = None
        finished = False

        @Slot()
        def loaded(self, success=True):
            self.finished = True
            if self.src is None:
                self.src = page.mainFrame().toHtml()
    state = State()

    # Optional; reacts to DOM ready, which happens before a full load
    def js():
        page.mainFrame().addToJavaScriptWindowObject('qstate$', state)
        page.mainFrame().evaluateJavaScript('''
            document.addEventListener('DOMContentLoaded', qstate$.loaded);
        ''')
    page.mainFrame().javaScriptWindowObjectCleared.connect(js)

    page.mainFrame().loadFinished.connect(state.loaded)

    while not state.finished:
        qapp.processEvents()

    return state.src

load_source downloads the data from a URL and returns the HTML after modification by WebKit. It is a blocking function: it spins Qt's event loop by hand until the asynchronous load has finished.

But you really should think about what you're doing. Do you actually need to invoke the engine and get the modified HTML? If you just want to download the HTML of some webpage, there are much, much simpler ways to do this.
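As one illustration of a simpler way, here is a minimal sketch that uses only the Python 3 standard library. The function name fetch_html, the 10-second timeout, and the UTF-8 fallback are my own assumptions, not part of the original answer. It returns the raw HTML as sent by the server, so anything that JavaScript would add in a browser is missing.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10.0):
    """Return the document at `url` as text, without involving a browser engine.

    `fetch_html` is a hypothetical helper name chosen for this sketch.
    """
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared in the response headers;
        # fall back to UTF-8 (an assumption) when none is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html("http://www.google.com/") should then return the page source as a string, provided the network request succeeds.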

Now, the problem with the code in your question is that you don't let Qt do anything. There is no magic happening, no code running in the background. Qt is based on an event loop, and you never let it enter that loop, so the loadFinished signal is never delivered and time.sleep() simply blocks forever. Entering the loop is usually achieved by calling QApplication.exec_, or with a workaround such as the processEvents call shown in my code. You can replace time.sleep(0.2) with app.processEvents() and it might just work.
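To make the point concrete without needing Qt installed, here is a toy model in plain Python. ToyEventLoop and ToyView are hypothetical stand-ins invented for this sketch, not Qt classes; they only mimic the idea that a "load finished" notification sits in a queue until something processes that queue.

```python
from collections import deque


class ToyEventLoop:
    """Hypothetical stand-in for an event queue; not a Qt API."""

    def __init__(self):
        self._pending = deque()

    def post(self, callback):
        # Queue a callback; nothing runs until process_events() is called.
        self._pending.append(callback)

    def process_events(self):
        # Deliver every queued callback, oldest first.
        while self._pending:
            self._pending.popleft()()


class ToyView:
    """Hypothetical view whose load completion arrives as a queued event."""

    def __init__(self, loop):
        self.loop = loop
        self.completed = True

    def set_url(self, url):
        self.completed = False
        # The "load finished" notification is only queued here.
        self.loop.post(self._on_load_finished)

    def _on_load_finished(self):
        self.completed = True


loop = ToyEventLoop()
view = ToyView(loop)
view.set_url("http://www.google.com/")

# Sleeping at this point would change nothing: the callback is still queued.
completed_before = view.completed

# Processing the queue is what finally delivers the notification.
loop.process_events()
completed_after = view.completed
```

In this model, a loop of time.sleep() calls between set_url and process_events would never flip the flag, which is the same situation as in the question's View class.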
