Web Pages

There are several ways in which you can import a web page into MAXQDA:

  1. Using the MAXQDA Web Collector browser extension, you can save single web pages in the Google Chrome browser for import into MAXQDA.
  2. To import several web pages at once, you can import an Excel list containing links to web pages. This option is also suitable for hundreds of web pages and more.
  3. HTML files can be dragged directly into the MAXQDA project with the mouse, just like text documents.
  4. A web page can be imported via URL directly in MAXQDA.

Option 1) Import web pages via Web Collector

With the MAXQDA Web Collector, you can save web pages, parts of web pages, or PDF files in your browser and then import them into MAXQDA as PDF, image, or text documents. Additionally, you can directly import web pages in HTML format as MAXQDA text documents.

Saving web pages with the MAXQDA Web Collector

The MAXQDA Web Collector is an extension for the Internet browser "Google Chrome". It allows you to save entire web pages and import the files as PDF, image, or text documents into MAXQDA. Among other things, this tool is useful for comparing websites from various organizations or for collecting content from web pages for analysis with MAXQDA.

Installing the MAXQDA Web Collector

The MAXQDA Web Collector requires the Internet browser "Google Chrome". Once "Chrome" is installed on your computer, you can install the "MAXQDA Web Collector":

  1. Open "Google Chrome".
  2. Search the Chrome Web Store for "MAXQDA" or use the following link:
    https://chrome.google.com/webstore/detail/web-collector-for-maxqda/jhnochbooihpgjbgcjlpihaefoehlakd
  3. To add the extension to your browser, click “+ Add”.
  4. Click the “puzzle piece” icon in Google Chrome and click the pin icon to display the installed Web Collector in the address bar.
Pin Web Collector extension symbol to Google Chrome’s address bar

After a successful installation, a small MAXQDA icon appears at the top right of your browser window:

MAXQDA icon to open the MAXQDA Web Collector in “Google Chrome”

If you want to use the Web Collector, just click on the MAXQDA icon:

MAXQDA Web Collector

The Web Collector offers four modes:

  • Entire web page: the layout is preserved in the best possible way.
  • Simplified web page: the web page is reduced to its key text and images, similar to a browser's reader mode.
  • Selections on web pages: only the selected part of the web page is collected.
  • PDF documents: all PDF files currently open in Chrome are collected.

Collecting the entire web page

If your research relies on every visible item of the web page, save the entire web page so that it can be imported into your MAXQDA project as true to the original as possible:

  1. Open the web page you want to save in Google Chrome.
  2. When the web page is fully loaded, open the Web Collector by clicking the tiny MAXQDA icon.
  3. Make sure the tab "Web Page" is open.
  4. If required, change the proposed document name. This name will later be carried over into the MAXQDA project.
  5. If required, enter a text memo in the “Document Memo” box. This text will later be connected to the imported document as a document memo.
  6. Click Collect.

The MAXQDA Web Collector saves the web page in the default download folder of your browser. The web page is saved in MWEB format, which was developed specifically for further processing in MAXQDA.

A web page saved this way can be imported into MAXQDA as a PDF or image document.
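Before importing, it can help to check which collected files are waiting in the download folder. The following is a minimal sketch, not part of MAXQDA: it assumes that collected files carry the extension ".mweb" (inferred from the format name) and that the browser's download folder is "Downloads" in your home directory. Both are assumptions that may differ on your system.

```python
from pathlib import Path


def find_collected_files(folder: Path, suffix: str = ".mweb") -> list[Path]:
    """Return files in `folder` whose extension matches `suffix`, newest first.

    The ".mweb" default is an assumption based on the format name.
    """
    matches = [
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    ]
    # Sort by modification time so the most recent collection comes first.
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    downloads = Path.home() / "Downloads"  # assumed default download folder
    if downloads.is_dir():
        for path in find_collected_files(downloads):
            print(path.name)
```

Sorting by modification time only approximates the order of collection; the import dialog in MAXQDA remains the authoritative list.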

Collecting simplified web pages

If your analysis focuses on the text of a web page, you can save it as a "simplified" web page. The web page is reduced to its main text and images, comparable to the reader mode on a smartphone. This is particularly useful for the websites of large newspapers and magazines. To collect a “simplified” web page, proceed as follows:

  1. Open the web page you want to save in Google Chrome.
  2. When the web page is fully loaded, open the Web Collector by clicking the tiny MAXQDA icon.
  3. Click on the "Simplified Web Page" tab. The web page will then be reduced to its main parts.
  4. If required, adjust further options such as font type, font size, and exclusion of images.
  5. If required, change the proposed document name. This name will later be carried over into the MAXQDA project.
  6. If required, enter a text memo in the “Document Memo” box. This text will later be connected to the imported document as a document memo.
  7. Click Collect.

The MAXQDA Web Collector saves the web page in the default download folder of your browser. The web page is saved in MWEB format, which was developed specifically for further processing in MAXQDA.

Please note: Some web pages cannot be simplified for technical reasons. In this case, a message appears in the Web Collector window.

Collecting parts of a web page

If you are only interested in a certain part of a web page, you can save just the current selection:

  1. Select the segment you are interested in with your mouse.
  2. Right-click the highlighted segment and choose Collect Selection for MAXQDA.

The MAXQDA Web Collector saves the selected content in the default download folder of your browser. The selection is saved in MWEB format, which was developed specifically for further processing in MAXQDA.

Segments that have been saved this way can be imported into MAXQDA as a text or PDF document.

Collecting PDF documents

To collect PDF documents currently open in Chrome, follow these steps:

  1. Open the Web Collector by clicking on the MAXQDA icon.
  2. If required, change the proposed document name. This name will later be carried over into the MAXQDA project.
  3. If required, enter a text memo in the “Document Memo” box. This text will later be connected to the imported document as a document memo.
  4. Click Collect.

Importing collected web pages into MAXQDA

You can import a collected web page into your MAXQDA project by following these steps:

  1. Open the MAXQDA project into which you want to import the saved web pages.
  2. Click a document group in the "Document System" window to import the web pages into that group. If no document group is selected, MAXQDA imports the web pages into a new group.
  3. Select Import > Web Pages > Web Pages from Web Collector Data in the main menu.

The following dialog window will appear:

Dialog window for the import of documents that have been saved with the Web Collector
  1. When you open the dialog for the first time, MAXQDA selects the standard download folder and displays it at the top of the dialog. Every collected web page located in this folder is listed. Click the three dots … to choose a different folder from which to import collected web pages.
  2. Select the web pages you want to import with the mouse. Selected pages are highlighted in green. By default, all listed web pages are selected.
Tip: Double-click a row to open the downloaded file in the Internet browser. This allows you to check the content of individual web pages.
  3. Import options are provided at the bottom of the dialog:
    • You can select whether web pages should be imported as PDF or image documents.
    • You can select whether simplified web pages or selections should be imported as PDF or text documents.
    • The option Import new data only imports only data that does not yet exist in the open MAXQDA project. If checked, MAXQDA compares the collect date, the document name, and the document type (text, PDF, image).
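The comparison behind Import new data only can be pictured as a lookup on three fields. The sketch below only illustrates that idea and is not MAXQDA's actual implementation: the class name `CollectedItem`, the function `select_new_items`, and the date format are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectedItem:
    """One collected document, identified by the three compared fields."""
    collect_date: str  # e.g. "2024-03-01"; the date format is an assumption
    name: str
    doc_type: str      # "text", "pdf", or "image"


def select_new_items(candidates, existing):
    """Return candidates whose (date, name, type) triple is not in `existing`."""
    seen = set(existing)  # frozen dataclasses are hashable, so a set works
    return [item for item in candidates if item not in seen]


if __name__ == "__main__":
    in_project = [CollectedItem("2024-03-01", "Home", "pdf")]
    collected = [
        CollectedItem("2024-03-01", "Home", "pdf"),    # already in the project
        CollectedItem("2024-03-01", "Home", "image"),  # same page, other type
    ]
    for item in select_new_items(collected, in_project):
        print(item.name, item.doc_type)
```

Under this reading, the same page collected on the same date but imported as a different document type counts as new data, because all three fields must match for an item to be skipped.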

To start the import, click Import selected files. The import time varies with the size and format of the selected documents. A progress bar shows the import progress of each document.

If you selected PDF or image format and are importing large files, MAXQDA asks whether you want to import them as externally saved files instead of saving them in the project (see External Files for more details).

Please note: The main layout is preserved during import. Deviations from the web page’s original layout may occur, especially with complex web pages.

Option 2) Bulk import of multiple web pages via Excel file containing links

If you have a collection of links, you can compile them in an Excel spreadsheet. MAXQDA reads the links from the Excel spreadsheet and imports the web pages as simplified web pages, PDF documents or images. This way, you can also import a hundred web pages and more at once.

Structure of the Excel table

The Excel table must contain the links to the web pages in one column. The other columns can contain information about each web page, which MAXQDA will import as document variables and into the document memo. The order of the columns determines in which order the data is imported.

For example, a table for the import looks like this:

URL                              Language   Note
https://www.maxqda.com           EN         Information in English
https://www.maxqda.com/maxdays   EN         Information about the conference
https://www.maxqda.com/de        DE         Information in German
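If the links come from another source, a short script can prepare such a table. The sketch below is one possible approach using only the Python standard library: it writes the rows as CSV, which would still need to be opened in a spreadsheet application and saved as an Excel workbook before MAXQDA can read it. The function names and the choice to skip malformed links are assumptions for illustration; the column headers follow the example table.

```python
import csv
import io
from urllib.parse import urlparse

HEADER = ["URL", "Language", "Note"]  # column names taken from the example


def is_web_link(value: str) -> bool:
    """Return True if `value` looks like an http(s) link with a host name."""
    parts = urlparse(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def write_link_table(rows, out) -> int:
    """Write (url, language, note) rows as CSV to `out`; return rows written."""
    writer = csv.writer(out)
    writer.writerow(HEADER)
    written = 0
    for url, language, note in rows:
        if not is_web_link(url):
            continue  # skip entries that are not well-formed web links
        writer.writerow([url.strip(), language, note])
        written += 1
    return written


if __name__ == "__main__":
    rows = [
        ("https://www.maxqda.com", "EN", "Information in English"),
        ("https://www.maxqda.com/de", "DE", "Information in German"),
    ]
    buffer = io.StringIO()
    write_link_table(rows, buffer)
    print(buffer.getvalue())
```

To write to disk instead of a string buffer, open the target file with `newline=""`, as the `csv` module documentation recommends.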

Importing the Excel file

To start the import, select Import > Web Pages > Web Pages via Links from Excel File. After selecting an Excel file in the file dialog, the following option dialog appears:

Dialog for importing multiple web pages via Excel list

Make the following settings in the dialog:

  • At the very top, select which column contains the links to the web pages. If the column is named “URL”, MAXQDA detects it automatically.
  • In the center, select which information should be included in the document memo and which should be included as a document variable. You can also check both options to include the information both as a document variable and in the document memo.
  • At the bottom, select how the web pages should be imported. The options correspond to those for importing web pages via Web Collector:
    Text (simplified web page) – By using this setting, only the main text is taken from the web page (without images). This option works like a reading mode that you can turn on for optimized reading of web pages on your cell phone and is recommended for several reasons: you can focus on the main content of a web page and most pop-up banners like cookie requests are ignored. The setting is especially suitable for news sites.
    PDF – The web page is converted to a PDF, with most of the layout preserved. If you select this option, you can specify below that the PDFs should be saved in the external files folder rather than in the project. You can try importing with the Switch off JavaScript option turned on to ignore annoying pop-up banners. However, this will result in some web pages not being displayed at all or not being displayed correctly.
    Image – The web page will be converted to an image, keeping most of the layout. It is also possible to disable JavaScript for this import type.

After clicking OK, another dialog appears in which you can specify the variable types.

MAXQDA creates a new document group for the imported web pages and uses the title of the web page as the document name.

A report is stored in the memo of the document group. It also lists all web pages that could not be imported, either because the web page was not accessible or because another error occurred during import.
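To reduce the number of failed entries in that report, the links can be checked for reachability before the import. The following is a rough sketch using Python's standard library, independent of MAXQDA. The function name `check_links` and its parameters are hypothetical, and a link that opens here is not guaranteed to import successfully in MAXQDA (or the other way round).

```python
import urllib.request


def check_links(urls, open_url=urllib.request.urlopen, timeout=10):
    """Return (url, problem) pairs for links that could not be opened.

    `open_url` can be replaced by a stand-in for testing without network access.
    """
    failures = []
    for url in urls:
        try:
            with open_url(url, timeout=timeout) as response:
                status = response.status
            if status >= 400:
                failures.append((url, f"HTTP {status}"))
        except (OSError, ValueError) as exc:
            # OSError covers urllib's URLError/HTTPError and socket timeouts;
            # ValueError is raised for strings that are not valid URLs.
            failures.append((url, str(exc)))
    return failures
```

Calling `check_links(["https://www.maxqda.com"])` returns an empty list when the page can be opened, and otherwise a list with the link and a short description of the problem.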

Option 3) Direct import of web pages in HTML format

  1. You can also import an HTML file directly into MAXQDA. To do this, select Import > Texts, PDFs, Tables from the main menu. In the dialog box that appears, select the HTML file. Alternatively, you can drag and drop an HTML file into the "Document System" window with the mouse.
  2. MAXQDA imports the file as a text document and includes images if they were embedded in the HTML file.
Please note: Web pages imported directly in HTML format are imported as text documents (and not as web pages). The layout can change considerably during the import, which is why using the Web Collector for this task is clearly preferable. Direct HTML import is best suited to web pages with a simple structure and design.

Option 4) Direct import of a web page via URL

Web pages can be imported directly into MAXQDA via Import > Web Pages > Web Pages via URL by simply inserting the link to a web page. In the import dialog, you can choose to import the web page as text, PDF, or image.

Dialog for importing a web page via URL
