Convert Wikipedia pages to a Word file with Python

We use Python for pretty much everything, be it data science, web development, desktop applications, or even web scraping. A big part of its appeal is a syntax that is easy to work with.
Python also has plenty of libraries that make coding even simpler than it already is. Most of these libraries come with proper documentation on how to use them, which saves us plenty of time while building a program.
Here, I want to show you how to build a program that takes a Wikipedia page link and converts the text on that page into a Word document (.docx) file.

Libraries used
We use only these three libraries to create this program:
- Beautiful Soup
- Requests
- python-docx (imported as docx)
Let’s dive into the code.
Our first task is to import the libraries mentioned above:

Now we can use the get() method of the requests library to fetch the page at a given link. Here, we’ll use the Wikipedia page for the Python programming language.
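One way this step might look is sketched below. The URL is assumed to be the English Wikipedia article on Python, and fetch_page is a hypothetical helper name. The User-Agent header, the timeout and the status check are additions for this sketch, not something the steps here require.

```python
import requests

# Assumed URL: the English Wikipedia article on the Python language.
URL = "https://en.wikipedia.org/wiki/Python_(programming_language)"


def fetch_page(url):
    """Download a page and return the requests Response object."""
    # Assumption: a descriptive User-Agent, since some sites reject default clients.
    headers = {"User-Agent": "wiki-to-docx-example/0.1"}
    response = requests.get(url, headers=headers, timeout=10)
    # Raise an exception on 4xx/5xx answers instead of parsing an error page.
    response.raise_for_status()
    return response
```

Calling fetch_page(URL) returns the response, and its text attribute holds the page’s HTML as a string.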

Now let’s parse the HTML source of this page. We will use the Beautiful Soup library for this task.
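A small sketch of the parsing step. So that it runs without a network connection, a short hand-written HTML string stands in for the downloaded page (in the real program this would be the text attribute of the requests response).

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded HTML; a real Wikipedia page is far larger.
html = """
<html><body>
<h1 id="firstHeading">Python (programming language)</h1>
<p>Python is a high-level programming language.
</p>
</body></html>
"""

# "html.parser" is Python's built-in parser, so nothing extra has to be installed.
soup = BeautifulSoup(html, "html.parser")
```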

Before we fetch all the paragraphs in the Wikipedia page, let’s fetch its heading. We can do this by finding the class or id of the heading element in the page. Use your browser’s inspect element tool to do so.

Now let’s use Beautiful Soup’s find() method to fetch this text.
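The lookup could be sketched like this. It assumes the heading is an h1 element with the id firstHeading, which is what inspecting a Wikipedia article is expected to show, so do confirm it in your own browser. A one-line HTML string stands in for the real page.

```python
from bs4 import BeautifulSoup

# Stand-in for the page's HTML.
html = '<h1 id="firstHeading">Python (programming language)</h1>'
soup = BeautifulSoup(html, "html.parser")

# Assumption: the article title lives in <h1 id="firstHeading">.
heading = soup.find("h1", id="firstHeading")

# get_text() also collects text from any nested tags inside the heading.
title = heading.get_text(strip=True)
```

Note that find() returns None when nothing matches, so a missing heading would make the get_text() call fail.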

Similarly, let’s find all the paragraphs in the page. We can simply fetch the text inside every <p> tag, and use a Python list comprehension to remove any newline characters from that text.
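A sketch of the paragraph step on a tiny stand-in page. Dropping the paragraphs that end up empty is an extra step assumed here, so that blank entries do not reach the document later.

```python
from bs4 import BeautifulSoup

# Stand-in for the page's HTML, including one paragraph that holds only a newline.
html = "<p>First paragraph.\n</p><p>\n</p><p>Second paragraph.\n</p>"
soup = BeautifulSoup(html, "html.parser")

# Take the text of every <p> tag and remove newline characters.
paragraphs = [p.get_text().replace("\n", "") for p in soup.find_all("p")]

# Assumed extra step: skip paragraphs that are empty after cleaning.
paragraphs = [text for text in paragraphs if text]
```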

We can finally create a docx Document object and add our scraped title to it.

We can use a small for loop to add all the paragraphs to the document object.

All that is left is to save it as a Word document (.docx) file. This can be done with the Document object’s save() method.

There you go. All done. All we needed was 13 lines of Python code.
We can make this a little more convenient by wrapping it in a Python function. Here is the entire code:

Before you go
I hope you enjoyed reading this article and found it useful. This is my first article on Medium, and I hope to write a lot more in the future. Please consider following me on | GitHub | LinkedIn | Kaggle |
Vishnu Viswanath