This is a guest article written by Matthew Loxton, a Principal Healthcare Analyst and Professional MAXQDA Trainer.
Many non-commercial YouTube videos would be very good sources of data for a researcher. However, it is often not possible to simply download the video as a media file to use within MAXQDA. Additionally, the transcriptions of many YouTube videos are generated automatically by Google Machine Learning, and MAXQDA cannot download these automatically.
In a previous blog post in 2019, I described how MAXQDA can directly import YouTube comments and manually created transcriptions, but cannot directly import the video file or transcripts created automatically by YouTube. This updated post confirms that it is still possible to obtain the media file behind a YouTube video and, in many cases, to pull down a transcript as well. It walks you through the method I used to download a YouTube video file (MP4) and import a transcription created automatically by YouTube.
Hurdles with YouTube: The Media Problem
There are two main ways to acquire the video media:
- Ask the author or channel owner for the media file
- Access the video directly at its location on the YouTube server and save it to your PC
Acquiring the Media File (Ask the Owner)
I had some success simply asking the authors or the YouTube channel owners for a copy of their video. About half of the time, they were willing to give me the original media file. The chief obstacle after establishing contact was that the media files were too large to be emailed, so the author had to take the additional step of placing them on a sharing site such as Dropbox or Google Drive. Each added complication resulted in a steep drop-off in success.
Acquiring the Media File (App)
There are many commercial off-the-shelf tools and applications that can be used to download YouTube videos, and I will cover one free open-source application. The VideoLAN non-profit describes its VLC product as a “free and open source cross-platform multimedia player and framework that plays most multimedia files, and various streaming protocols.” I downloaded Version 3.0.16 for Windows for this blog post.
After downloading and installing VLC, open the YouTube page for the target video in your browser (see Figure 1) and copy the URL (1).
Open VLC and click on “Media”, select “Open Network Stream” and paste the full YouTube URL into the network textbox (Figure 2 Item 2), and click “Play” (3).
Once you have verified that the correct YouTube video is playing, click on “Tools”, and select “Codec Information” from the VLC dropdown. In the resulting Media Information screen (Figure 3) verify that you are in the Codec panel (1), and then copy the long URL from the “Location” textbox. This is the full path to the YouTube media.
Paste this long URL into your browser and once the video begins to play, right-click on the video and select “Save video as”. Save the media file to a memorable location. Once it has downloaded, you may wish to double-click it to verify that the MP4 file plays and is the video you wanted.
Note: I was frankly surprised that this still worked since my first blog in 2019, and had anticipated that YouTube would at some point cut off this ability. Nevertheless, for the time being, this functionality still exists and worked at the time of writing.
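Besides double-clicking the saved file to see whether it plays, a quick scripted check can flag a download that is not a video at all (for example, an error page saved under an .mp4 name). The following is only a minimal sketch, assuming the saved file is a standard MP4 whose first box is the usual `ftyp` header; the function name `looks_like_mp4` is hypothetical and is not part of VLC or MAXQDA. A passing check does not guarantee that the file plays or that it is the video you wanted.

```python
import sys


def looks_like_mp4(path: str) -> bool:
    """Return True if bytes 4-7 of the file spell 'ftyp'.

    Assumption: the file is a standard MP4 that starts with an 'ftyp' box.
    This is a rough sanity check only, not a full validation.
    """
    with open(path, "rb") as handle:
        header = handle.read(8)
    return len(header) == 8 and header[4:8] == b"ftyp"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of the saved file as the first command-line argument.
    saved_file = sys.argv[1]
    if looks_like_mp4(saved_file):
        print(f"{saved_file}: header looks like an MP4 file")
    else:
        print(f"{saved_file}: header does not look like an MP4 file")
```

Run it from a terminal with the path of the file you saved in the previous step as its only argument.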
The Transcription Hurdle
If the YouTube video has a manually-created transcription, you will see something like the image shown in Figure 4 below. The import of YouTube data is fairly simple and uses the MAXQDA YouTube Data function available in the Import section. You can see more details on this in the excellent 2020 Spotlight session by Cate Fugazzola.
You can do this via the following steps:
- Navigate to the YouTube video with a browser
- Check that a transcription exists
- Select the URL and copy it
- In MAXQDA, click on Import
- Click YouTube Data
- Check that the correct URL has populated the URL field in the import popup, or paste it in
- Note the option for importing comments
In the MAXQDA manual, the image of the dialog box shows a section for Import transcript/closed caption (1) and an option to Link existing video file to transcript (2).
However, if the transcription was generated automatically by YouTube, you will see a slightly shorter screen, with no option to download the transcription or link to media.
For example, the YouTube video I wanted had an automatically generated transcription file, and following the same steps instead yields Figure 5.
To get past this obstacle and have a linked transcription and media file in MAXQDA, you can manually acquire the media file (see above) and transcription with timestamps, and import and link them in MAXQDA.
Acquiring the Transcript
To obtain the automated YouTube transcript, you can manually copy it and then edit the timestamps so that MAXQDA can import the transcript and link it to the media.
- In the YouTube browser screen, open up the automated transcript by clicking on the “…” (Figure 6 Item 1) icon under the viewing pane, and select Open Transcript (2).
- In the transcript panel (3) select the first few characters, scroll to the bottom, shift-click at the end to select the entire transcript and copy.
a. If the timestamps are not visible, click the “…” icon and select Toggle Timestamps
b. I have found this manual selection method works best because using a “select all” shortcut like Ctrl-A will harvest things like page headers and footers that you do not want.
- Open Microsoft Excel or another spreadsheet application.
- Select the first column, and format as text (if you don’t do this, it may interpret the timestamps and complicate the next steps).
- Paste the transcription into the first column.
- You may see something like the following, with text and the timestamps in subsequent rows.
we are very pleased to present you with
an inspiring locally made story about
the development of an advanced brain
Note: If you import this, MAXQDA will not yet be able to recognize the timestamps, so these need to be formatted first.
- To format the timestamps, I used the following Excel formula in column 2: =IF(ISNUMBER(VALUE(LEFT(A1,1))),"#00:"&A1&"#",A1)
This formula looks at the cell in column 1 and evaluates whether its first character is numeric. If it is, the formula adds “#00:” (a hash and a leading 00 field) to the front of the timestamp and a hash to the end. If the first character isn’t numeric, we assume the cell holds transcription text and just copy it into column 2. Fill the formula down column 2 for every row of the pasted transcript.
- Select col 2.
- Open a Word document (or a document in any word processing app), and paste the column as plain text.
- Save the document in a memorable location.
- In MAXQDA, click on Import > Transcripts with Timestamps, and select the saved Word document.
- When the dialogue box prompts you, select the YouTube media file, and browse to the file.
- MAXQDA will import the transcript and you will see a popup dialog box inviting you to link the media file.
- Navigate to the media file you saved above, and press OK.
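For readers who prefer a script to a spreadsheet, the timestamp formatting step above can be sketched in Python. This is only an illustration of the same rule as the Excel formula (wrap any line that starts with a digit as `#00:…#`, leave other lines alone), not a MAXQDA feature. The function names and the sample timestamps are hypothetical; the three text lines come from the example above. Unlike the formula, the sketch also trims surrounding whitespace and drops blank lines.

```python
def format_transcript_line(line: str) -> str:
    """Apply the same rule as the spreadsheet formula to one pasted line."""
    text = line.strip()  # trimming is an addition; the Excel formula does not trim
    if text and text[0] in "0123456789":
        # First character is numeric: treat the line as a timestamp.
        return "#00:" + text + "#"
    # Otherwise assume it is transcript text and keep it unchanged.
    return text


def format_transcript(raw: str) -> str:
    """Format every line of a pasted transcript and drop blank lines."""
    formatted = (format_transcript_line(line) for line in raw.splitlines())
    return "\n".join(line for line in formatted if line)


# Hypothetical sample: the timestamps are invented for illustration.
sample = "\n".join([
    "0:05",
    "we are very pleased to present you with",
    "0:08",
    "an inspiring locally made story about",
    "0:11",
    "the development of an advanced brain",
])

print(format_transcript(sample))
```

Like the formula, this rule treats any line beginning with a digit as a timestamp, so a line of spoken text that happens to start with a number (a year, for instance) would also be wrapped in hashes. It is worth scanning the result for such lines before importing.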
Although there are legal and practical barriers to automatically downloading YouTube media and machine-generated transcripts, there are somewhat straightforward manual methods that can enable the researcher to achieve the same thing. These methods may not age well if YouTube or VLC change their product features, but in the interim, they can be a practical means of having transcripts that are linked to video media for analysis.
Editor’s note: this post has been updated from its original version published in June 2019.