Web Pages

With the MAXQDA Web Collector, you can save web pages, parts of web pages, or PDF files in your browser and then import them into MAXQDA as PDF, image, or text documents. Additionally, you can directly import web pages in HTML format as MAXQDA text documents.

Saving web pages with the MAXQDA Web Collector

The MAXQDA Web Collector is an extension for the Internet browser “Google Chrome”. It allows you to save entire web pages and import the files as PDF, image, or text documents into MAXQDA. Among other things, this tool is useful for comparing websites from various organizations or for collecting content from web pages for analysis with MAXQDA.

Installing the MAXQDA Web Collector

The MAXQDA Web Collector requires the Internet browser “Google Chrome”. As soon as “Chrome” is installed on your computer, you can install the “MAXQDA Web Collector”:

  1. Open “Google Chrome”.
  2. Search the Chrome Web Store for “MAXQDA” or use the following link:
    https://chrome.google.com/webstore/detail/web-collector-for-maxqda/jhnochbooihpgjbgcjlpihaefoehlakd
  3. To add the extension to your browser, click “+ Add”.

After a successful installation, a small MAXQDA icon appears at the top right of your browser window (if not, click the “jigsaw” symbol and select the Web Collector to be displayed):

MAXQDA icon to open the MAXQDA Web Collector in “Google Chrome”

To use the Web Collector, click the MAXQDA icon:

MAXQDA Web Collector

The Web Collector offers four modes:

  • Entire web page: the layout is preserved in the best possible way.
  • Simplified web page: the web page is reduced to its key text and images, similar to a reader mode.
  • Selections on web pages: only the selected part of the web page is collected.
  • PDF documents: the whole PDF file is collected.

Collecting entire web pages

If your research is based on the analysis of every visible item of the web page, it is recommended to save the entire web page so that you can import it into your MAXQDA project as true to the original as possible:

  1. Open the web page you want to save in Google Chrome.
  2. When the web page is fully loaded, open the Web Collector by clicking the small MAXQDA icon.
  3. Make sure the “Web Page” tab is open.
  4. If required, change the proposed document name. This name will later be used for the document in the MAXQDA project.
  5. If required, enter a text in the “Document Memo” box. This text will later be attached to the imported document as a document memo.
  6. Click Collect.

The MAXQDA Web Collector saves the web page in the default download folder of your browser. The web page is saved in the MWEB format, which was developed specifically for further processing in MAXQDA.

Web pages that are saved this way can be imported into MAXQDA as a PDF or image document.
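
If you want to check which collected files are waiting in the download folder before you import them, a short script can list them. The following is a minimal sketch, not part of MAXQDA: the function name `list_collected_files` is hypothetical, and both the `.mweb` file extension and the `Downloads` folder location are assumptions that may differ on your system.

```python
from pathlib import Path


def list_collected_files(folder: Path, extension: str = ".mweb") -> list[Path]:
    """Return the collected files in `folder`, newest first.

    The ".mweb" extension is an assumption derived from the format name.
    """
    files = [
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == extension
    ]
    # Sort by modification time so the most recently collected file comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # Assumed default download folder; adjust if your browser uses another one.
    downloads = Path.home() / "Downloads"
    if downloads.is_dir():
        for path in list_collected_files(downloads):
            print(path.name)
```

The script only reads file names and modification times; it does not open or change the collected files.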

Collecting simplified web pages

If your analysis focuses on the text of a web page, you can save it as a “simplified” web page. The web page is reduced to its main text and images, comparable to the reader mode on your smartphone. This is particularly useful for the websites of large newspapers and magazines. To collect a “simplified” web page, proceed as follows:

  1. Open the web page you want to save in Google Chrome.
  2. When the web page is fully loaded, open the Web Collector by clicking the small MAXQDA icon.
  3. Click the “Simplified Web Page” tab. The web page is then reduced to its main parts.
  4. If required, adjust further options such as font type, font size, and exclusion of images.
  5. If required, change the proposed document name. This name will later be used for the document in the MAXQDA project.
  6. If required, enter a text in the “Document Memo” box. This text will later be attached to the imported document as a document memo.
  7. Click Collect.

The MAXQDA Web Collector saves the web page in the default download folder of your browser. The web page is saved in the MWEB format, which was developed specifically for further processing in MAXQDA.

Please note: Some web pages cannot be simplified for technical reasons. In this case, a message appears in the Web Collector window.

Collecting parts of a web page

If you are only interested in a certain part of a web page, you can collect only the current selection:

  1. Highlight the segment you are interested in with your mouse.
  2. Right-click the highlighted segment and choose Collect Selection for MAXQDA.

The MAXQDA Web Collector saves the selection in the default download folder of your browser. The selection is saved in the MWEB format, which was developed specifically for further processing in MAXQDA.

Segments that have been saved this way can be imported into MAXQDA as a text or PDF document.

Collecting PDF documents

To collect a PDF document that is currently open in Chrome, follow these steps:

  1. Open the Web Collector by clicking the MAXQDA icon.
  2. If required, change the proposed document name. This name will later be used for the document in the MAXQDA project.
  3. If required, enter a text in the “Document Memo” box. This text will later be attached to the imported document as a document memo.
  4. Click Collect.

Importing collected web pages into MAXQDA

You can import a collected web page into your MAXQDA project by following these steps:

  1. Open the MAXQDA project into which you want to import the saved web pages.
  2. To import the web pages into a specific document group, click that group in the “Document System” window to mark it. If no document group is marked, MAXQDA imports the web pages into a new group.
  3. Select Import > Web Collector Data in the main menu.

The following dialog window will appear:

Dialog window for the import of documents that have been saved with the Web Collector

  1. When you open the dialog for the first time, MAXQDA selects the standard download folder and displays it at the top of the dialog. Every collected web page located in this folder is listed in the dialog. By clicking the three dots … you can choose any other folder from which to import collected web pages.
  2. Select every web page you want to import with the mouse. Selected web pages are highlighted green. (By default, all listed web pages are selected.)

Tip: Double-click a row to open the downloaded file in the Internet browser. This allows you to check the content of individual web pages.

  3. Import options are provided at the bottom of the dialog:
    • You can select whether web pages should be imported as PDF or image documents.
    • You can select whether simplified web pages or selections should be imported as PDF or text documents.
    • The option Import new data only can be used to import only data that does not yet exist in the open MAXQDA project. If checked, MAXQDA compares the collection date, the document name, and the document type (text, PDF, image) to identify data that already exists in the project.
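
The comparison behind Import new data only can be pictured as a lookup on three fields. The following sketch is purely illustrative and does not show MAXQDA's actual implementation: the class `CollectedItem`, its field names, and the function `select_new_items` are hypothetical, and the date format is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectedItem:
    """Hypothetical record for one collected document."""
    collected_on: str  # collection date; the string format is an assumption
    name: str          # document name
    doc_type: str      # "text", "pdf", or "image"


def select_new_items(candidates, existing):
    """Keep only candidates whose (date, name, type) is not yet in the project."""
    seen = {(e.collected_on, e.name, e.doc_type) for e in existing}
    return [
        item for item in candidates
        if (item.collected_on, item.name, item.doc_type) not in seen
    ]


if __name__ == "__main__":
    in_project = [CollectedItem("2024-05-01T10:15", "Homepage", "pdf")]
    to_import = [
        CollectedItem("2024-05-01T10:15", "Homepage", "pdf"),    # already imported
        CollectedItem("2024-05-01T10:15", "Homepage", "image"),  # same page, other type
    ]
    for item in select_new_items(to_import, in_project):
        print(item.name, item.doc_type)
```

Because all three fields are part of the comparison, the same web page collected once can still be imported a second time in a different document type.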

To start the import, click Import selected files. The import time varies depending on the size and format of the selected documents. A progress bar informs you about the import progress of each document.

If you selected the PDF or image format and import large files, MAXQDA asks whether you want to import them as externally saved files instead of saving them in the project (see External Files for more details).

Please note: The import preserves the main layout. Deviations from the web page’s original layout may occur, especially for complex web pages.

Direct import of web pages in HTML format

  1. To import an HTML file directly into MAXQDA, select Import > Texts, PDFs, Tables from the main menu. In the dialog box that appears, select the HTML file. Alternatively, you can drag and drop an HTML file into the “Document System” window with the mouse.
  2. MAXQDA imports the file as a text document and includes images if they were embedded in the HTML file.

Please note: Web pages imported directly in HTML format become text documents (and not web pages). The layout can change considerably during the import, which is why using the Web Collector is preferable for this task. The direct HTML import is particularly suitable for pages with a simple structure and design.