Unable to print names in the right way in another function


I've written a script in Python to scrape all the names and their associated links from the landing page of a website using a get_links() function. I then created another function, get_info(), to reach another page (using the links derived from the first function) in order to scrape phone numbers from there.

I wouldn't need the second function at all if my goal were only to parse these two items, because they are already available on the landing page.

However, I would like my parser to print the names (carried forward from the first function) within the second function, along with the phone numbers found there. Most importantly, I do not want to remove the for loop defined within the second function. If the for loop were not in the second function, the problem would not have arisen: without the for loop I can already get the desired output.

This is my script so far:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = "https://potguide.com/alaska/marijuana-dispensaries/"

def get_links(link):
    session = requests.Session()
    session.headers['User-Agent'] = 'Mozilla/5.0'
    r = session.get(link)
    soup = BeautifulSoup(r.text,"lxml")
    for items in soup.select("#StateStores .basic-listing"):
        name = items.select_one("h4 a").text
        namelink = urljoin(link,items.select_one("h4 a").get("href"))  ##making it a fully qualified url
        get_info(session,name,namelink)          ##passing session in order to reuse it

def get_info(session,title,url):
    r = session.get(url)
    soup = BeautifulSoup(r.text,"lxml")
    for items in soup.select("ul.list-unstyled"):  ##if I did not use for loop I could get the output as desired.
        try:
            phone = items.select_one("a[href^='tel:']").text
        except:
            phone = ""
        print(title,phone)

if __name__ == '__main__':
    get_links(url)

The output I'm getting:

AK Frost
AK Frost
AK Frost
AK Frost
AK Frost
AK Frost (907) 563-9333
AK Frost
AK Frost
AK Frost (907) 563-9333
AK Frost
AK Fuzzy Budz
AK Fuzzy Budz (907) 644-2838
AK Fuzzy Budz
AK Fuzzy Budz
AK Fuzzy Budz (907) 644-2838

My expected output:

AK Frost (907) 563-9333
AK Fuzzy Budz (907) 644-2838

 


If the goal is only to get the expected output, this should work:

def get_info(session,title,url):
    r = session.get(url)
    soup = BeautifulSoup(r.text,"lxml")
    for items in soup.select("ul.list-unstyled"):
        try:
            phone = items.select_one("a[href^='tel:']").text
        except AttributeError:
            # select_one returned None (no phone link in this block): skip item and continue
            continue
        else:
            # exception wasn't raised, you have the phone
            print(title,phone)
            break
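The same idea can be written without try/except. This is only a minimal sketch: first_phone is a hypothetical helper, not part of the original script, and FakeBlock and FakeLink are made-up stand-ins for the elements returned by soup.select() so that it runs without network access. The for loop stays, but it stops at the first block that has a tel: link, so each name is printed once.

```python
def first_phone(blocks, selector="a[href^='tel:']"):
    """Return the text of the first phone link found in any block, or ""."""
    for block in blocks:                      # the for loop is kept
        link = block.select_one(selector)     # None when the block has no tel: link
        if link is not None:
            return link.text.strip()          # stop at the first match
    return ""                                 # no block contained a phone link


# Hypothetical stand-ins for bs4 elements, used only to make the sketch runnable.
class FakeLink:
    def __init__(self, text):
        self.text = text


class FakeBlock:
    def __init__(self, phone=None):
        self._phone = phone

    def select_one(self, selector):
        # Mimics select_one: a link object when a phone exists, otherwise None.
        return FakeLink(self._phone) if self._phone else None


# Mirrors the situation in the question: several blocks, only some with a phone.
blocks = [FakeBlock(), FakeBlock("(907) 563-9333"), FakeBlock(), FakeBlock("(907) 563-9333")]
print("AK Frost", first_phone(blocks))
```

In get_info() this would presumably be called as print(title, first_phone(soup.select("ul.list-unstyled"))). Unlike the version above, it still prints the name, with an empty phone, when no block contains a number.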
