How can I get a Wikipedia article's text using Python 3 with Beautiful Soup?


I have written this script in Python 3:

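    from bs4 import BeautifulSoup

    url = "https://en.wikipedia.org/wiki/Mathematics"
    # simple_get: helper defined elsewhere (not shown); returns the page HTML or None
    response = simple_get(url)
    result = {}
    result["url"] = url
    if response is not None:
        html = BeautifulSoup(response, 'html.parser')
        title = html.select("#firstHeading")[0].text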

As you can see, I can get the title of the article, but I cannot figure out how to get the text from "Mathematics (from Greek μά..." up to the table of contents.



Select the <p> tags. The page had 52 of them when I checked. I'm not sure if you want all of them, but you can iterate through those tags and store the text however you like. I just print each one to show the output.

    import bs4
    import requests

    response = requests.get("https://en.wikipedia.org/wiki/Mathematics")
    # requests.get never returns None, so check the HTTP status instead
    response.raise_for_status()

    html = bs4.BeautifulSoup(response.text, 'html.parser')

    title = html.select("#firstHeading")[0].text
    paragraphs = html.select("p")
    for para in paragraphs:
        print(para.text)

    # just grab the text up to the contents, as asked in the question
    # (the first 5 <p> tags on this page)
    intro = '\n'.join([para.text for para in paragraphs[0:5]])
    print(intro)
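
Slicing the first five paragraphs works for this page, but the number of lead paragraphs differs between articles and revisions. A minimal sketch that instead collects paragraphs until the first section heading (where the lead section ends) might look like the following. It assumes Wikipedia's markup puts the article body in a `div.mw-parser-output` and marks sections with `<h2>` elements, which newer pages wrap in a `div.mw-heading`; the function name `intro_paragraphs` and the sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup


def intro_paragraphs(page_html):
    """Return the text of the lead paragraphs of a Wikipedia article.

    Assumption: the body lives in div.mw-parser-output and the lead ends at
    the first <h2> or at the div.mw-heading wrapper newer markup puts around it.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    body = soup.select_one("div.mw-parser-output")
    if body is None:
        return []

    paragraphs = []
    # Walk only direct children, so <p> tags nested in infoboxes/tables are skipped.
    for element in body.find_all(recursive=False):
        if element.name == "h2" or "mw-heading" in element.get("class", []):
            break  # first section heading: the lead section is over
        if element.name == "p":
            text = element.get_text().strip()
            if text:  # skip empty placeholder paragraphs
                paragraphs.append(text)
    return paragraphs


# Hypothetical sample markup, shaped like a Wikipedia article body.
sample = """
<div class="mw-parser-output">
  <p class="mw-empty-elt"></p>
  <table class="infobox"><tr><td><p>Infobox text</p></td></tr></table>
  <p>First lead paragraph.</p>
  <p>Second lead paragraph.</p>
  <div class="mw-heading mw-heading2"><h2 id="History">History</h2></div>
  <p>First paragraph of the History section.</p>
</div>
"""
print(intro_paragraphs(sample))
# ['First lead paragraph.', 'Second lead paragraph.']
```

To run it on the live article, fetch the page with `requests.get` as above and pass `response.text` to `intro_paragraphs`. Iterating with `recursive=False` is the design choice that keeps paragraphs from infoboxes and other nested elements out of the result.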
