MAXQDA

Can MAXQDA handle thousands of linked PDF files with images?


04.03.2019, 18:18

I want to build a system from which I can query coded segments on a given topic (code) from thousands of files at will. I'm hoping to use MAXQDA as a document and data management system to conduct an ongoing literature review on many related topics at once for years to come. I can see thousands of PDF files going into the system over time, with at least 100 codes. I want to be sure this idea is sustainable with MAXQDA. I'm worried that all the coded data will be in one file. I'll make backups on a regular basis, but still. Any advice would be helpful.

A few specific questions:
- Can MAXQDA handle thousands of PDFs (with images) in one file? Most files will likely be 50 pages or less.
- I'm hoping to link in all files from a Microsoft SharePoint cloud library. Does linking all files help MAXQDA handle more files before running into performance issues or data corruption? Would reducing file sizes by compression help?
- What issues do you foresee in building this system with MAXQDA? Should I pair MAXQDA with another software?

Version: MAXQDA 2018
System: Windows 10
Muse
Posts: 1
Joined: 04.03.2019, 17:49

Re: Can MAXQDA handle thousands of linked PDF files with images?

06.03.2019, 19:21

Dear Muse,
Thank you for your interest in MAXQDA and welcome to the MAXQDA forum! We are happy to have you. Let me go through your questions one by one:

- Can MAXQDA handle thousands of PDFs (with images) in one file? Most files will likely be 50 pages or less.

That's very difficult to say without knowing more about the PDFs. (Also, there is no need to put them all into one file, and we don't recommend it, but I'll come to this in a bit.) If they are recent, professionally produced PDF files that you downloaded from a publisher, you might be able to import 5,000 of them without MAXQDA breaking a sweat. Then again, you might have 100 PDFs that you scanned yourself in 1998 and that take ages to load (in any program). So the quality of the PDFs plays a big role here. MAXQDA has a (pretty fast, if I may say so) PDF engine, the same one that's in "Foxit Reader", but with some files you'll get performance issues no matter which software you use.

In general, though, the only limitation that might become relevant here is RAM. If all the PDFs are loaded together and this exceeds the RAM available on your PC, you might run into trouble. However, we are currently testing a 64-bit version of MAXQDA, which will hopefully be released soon. (It's already very stable; we already provide it to the very few users who actually reach the RAM limit, and we haven't had any complaints so far.) With a 64-bit version, you can load up to 256 or even 512 GB of data into your RAM. This should be more than sufficient for anyone not bent on creating the next Google...
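Since the RAM limit is tied to the current 32-bit version, a first rough check is to add up the PDF file sizes and compare them with the address space of a 32-bit process (2^32 bytes, i.e. 4 GiB; a 32-bit Windows program usually gets less than that in practice). This is only a sketch under stated assumptions: the folder and the `expansion` factor are hypothetical, not MAXQDA values, and compressed pages and images generally take more memory once they are decoded, so the on-disk total is a lower bound.

```python
from pathlib import Path

# Theoretical address space of a 32-bit process: 2**32 bytes (4 GiB).
# A 32-bit Windows program usually gets less than this in practice.
ADDRESS_SPACE_32BIT = 2 ** 32


def total_pdf_bytes(folder) -> int:
    """Sum the on-disk size of every .pdf file under `folder`, recursively."""
    return sum(
        p.stat().st_size
        for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def may_exceed_32bit(folder, expansion: float = 3.0) -> bool:
    """Rough check: would the collection, scaled by an assumed in-memory
    expansion factor, exceed the 32-bit address space?

    `expansion` is a guess, not a measured MAXQDA value: decoded pages and
    images usually need more RAM than the compressed file on disk.
    """
    return total_pdf_bytes(folder) * expansion > ADDRESS_SPACE_32BIT
```

For example, `total_pdf_bytes(r"C:\Literature") / 2**30` gives the collection size in GiB (the path is hypothetical).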

- I'm hoping to link in all files from a Microsoft SharePoint cloud library. Does linking all files help MAXQDA work with more files before running into performance issues or corrupting?

For your setup, I would indeed highly recommend saving all the PDF files as externally stored documents. This means that your project files (which you can then also duplicate and back up much more easily, of course) will look in the "Externals" folder for any PDF files that were imported as documents. This Externals folder can also be on a network or cloud drive, no problem at all. You can read all about external files, and how to import PDFs as external files by default, here:

https://www.maxqda.com/help-max18/import/external-files

(Simply set "Do not embed files if larger than" to "0".)

- Would reducing file sizes by compression help?

Unfortunately I'm no real expert when it comes to PDFs (I actually studied sociology), but if it's anything like video or audio files, it might be the exact opposite: I know that with video files, the more intensive the compression, the more work has to be done to decompress them again. I have no idea whether the same holds for PDFs, though. I'll ask around.
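Whether the video intuition carries over can be tested for one common case: many PDF content and image streams are stored with the Flate filter, which is zlib/deflate compression. The sketch below times decompression of a synthetic sample at three zlib levels using only Python's standard library. It is an experiment under assumptions, not a statement about MAXQDA's PDF engine, and it says nothing about scanned pages stored as JPEG or other image codecs.

```python
import os
import time
import zlib


def measure(data: bytes, level: int, repeats: int = 20):
    """Compress `data` at a zlib level, then time the average decompression."""
    compressed = zlib.compress(data, level)
    start = time.perf_counter()
    for _ in range(repeats):
        restored = zlib.decompress(compressed)
    seconds = (time.perf_counter() - start) / repeats
    assert restored == data  # deflate is lossless
    return len(compressed), seconds


# Synthetic sample: repetitive text (compresses well) plus random bytes (barely compresses).
sample = b"coded segment " * 50_000 + os.urandom(200_000)

for level in (1, 6, 9):
    size, seconds = measure(sample, level)
    print(f"level {level}: {size:>9,} bytes, decompression {seconds * 1000:.2f} ms")
```

Timings vary by machine, so it is worth running on real PDF stream data before drawing conclusions.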

- What issues do you foresee in building this system with MAXQDA? Should I pair MAXQDA with another software?

I can't think of any issues. The only issue we encounter at all from time to time (and this will hopefully be resolved soon) has to do with opening the project file from somewhere other than a local hard drive. To elaborate:

One common cause of all kinds of issues is opening the project file not from a local hard drive but from a network resource, USB flash drive, external hard drive, or a folder that's being kept synchronized with a cloud service. Since MAXQDA is a database program, all changes are saved automatically. If the connection gets interrupted, however, this can in very rare cases lead to errors in the database structure. This is why we strongly recommend always opening the project file from a local hard drive. (You can save it anywhere you want, of course; this is just about opening it.)

But even for the very few people who encounter such an issue, we can restore their projects in over 95% of cases.
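One way to follow the "save anywhere, open locally" advice with a SharePoint or other synced folder is to copy the project file to a local folder before opening it and copy it back after MAXQDA has been closed. This is a minimal sketch with hypothetical function names and paths, assuming a MAXQDA 2018 project file with the `.mx18` extension; it copies only the project file (not the Externals folder) and should only be run while the project is closed.

```python
import shutil
from pathlib import Path


def copy_to_local(synced_project, local_dir) -> Path:
    """Copy a closed project file from a synced/network folder to a local folder."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    local_copy = local_dir / Path(synced_project).name
    shutil.copy2(synced_project, local_copy)  # copy2 also keeps timestamps
    return local_copy


def copy_back(local_copy, synced_dir) -> Path:
    """After closing MAXQDA, copy the local project back to the synced folder."""
    local_copy = Path(local_copy)
    target = Path(synced_dir) / local_copy.name
    partial = target.with_name(target.name + ".partial")
    shutil.copy2(local_copy, partial)
    # Swap the finished copy in with one rename, so an interrupted copy leaves
    # only the ".partial" file behind and the real project name is untouched.
    partial.replace(target)
    return target
```

A typical session would be `local = copy_to_local(r"C:\Users\me\SharePoint\Review\review.mx18", r"C:\MAXQDA-work")`, then working on `local` in MAXQDA, closing it, and calling `copy_back(local, r"C:\Users\me\SharePoint\Review")` (all paths hypothetical).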

In closing: as with any other software, you should create backups of your work, in this case backups of your project file. This applies even more if you choose to ignore our warning and open project files directly from a USB flash drive or cloud folder. If you remember this one thing, I can't spot even a theoretical problem that would prevent you from building a literature database with MAXQDA.
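Regular backups of the project file can be automated with a few lines. This sketch (hypothetical function and folder names) writes a timestamped copy of the closed project file into a dedicated backup folder and keeps only the newest `keep` copies; the default of 10 is an arbitrary assumption.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_project(project, backup_dir, keep: int = 10) -> Path:
    """Copy a closed project file into `backup_dir` under a timestamped name
    and delete all but the newest `keep` backups of that project."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    project, backup_dir = Path(project), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Fixed-width timestamp (with microseconds) so names sort chronologically.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = backup_dir / f"{project.stem}_{stamp}{project.suffix}"
    shutil.copy2(project, target)
    # Use a dedicated backup folder per project: the pattern below would also
    # match another project whose name starts with this project's name plus "_".
    backups = sorted(backup_dir.glob(f"{project.stem}_*{project.suffix}"))
    for old in backups[:-keep]:
        old.unlink()
    return target
```

Run from a scheduled task (for example, Windows Task Scheduler), this gives a rolling set of restore points without manual copying.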

We hope this helps! If not, or in case of any other questions, please don't hesitate to contact us again.

Best regards on behalf of the MAXQDA support team,

Andreas
Andreas V.

Re: Can MAXQDA handle thousands of linked PDF files with images?

15.05.2020, 22:10

Just a follow-up to this question.
I successfully loaded ~50,000 abstracts, and in another project ~40,000 text documents of academic papers related to COVID-19.

What I discovered was that my limitation was my PC's memory. I ran a MAXDictio analysis of 2-5 word combinations; the process ran for 72 hours and had reached 70% before I aborted it. The software was working fine, but with my PC running at >90% CPU and 100% memory, the time per record was slowly approaching infinity. Once the PC started using the disk as virtual memory (paging), the processing speed dropped precipitously.
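One plausible reason for this memory growth is how word combinations scale in general: a document with T words contains T - 1 two-word sequences, T - 2 three-word sequences, and so on, and most of the longer ones are unique, so a table of distinct combinations grows roughly with corpus size. The sketch below is a plain-Python n-gram counter, not MAXDictio's actual implementation. Assuming, purely for illustration, 5,000 words per paper, 40,000 papers would already yield 799,600,000 sequences of 2 to 5 words.

```python
from collections import Counter


def count_ngrams(tokens, n_min: int = 2, n_max: int = 5) -> Counter:
    """Count every contiguous word sequence of length n_min..n_max.

    The Counter keeps one entry per *distinct* sequence, so its memory
    use grows with the number of different combinations in the corpus.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def max_ngram_occurrences(num_tokens: int, n_min: int = 2, n_max: int = 5) -> int:
    """Number of sequence positions: T - n + 1 for each length n (0 if T < n)."""
    return sum(max(num_tokens - n + 1, 0) for n in range(n_min, n_max + 1))
```

`max_ngram_occurrences(5000) * 40000` reproduces the 799,600,000 figure above, which is an upper bound on how many entries such a table could need.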

I intend to experiment using a 64 GB RAM dual processor Azure Cloud Virtual Machine with a similar dataset.
mloxton
Posts: 127
Joined: 28.02.2014, 19:38
Location: Washington DC & Denver Colorado
