If you are here to see how the workflow works: there will be an upcoming note, HEY-Screener in (Neo)Mutt, but for now you can check all the scripts in my
Mutt dotfiles.
This is the HEY Screener URL we want to scrape from:
https://app.hey.com/my/clearances?page=3. The CSS classes to grab are screened-person--denied and screened-person--approved.
This is the second option, which I created after the Console approach didn’t scale: it only worked for one page at a time.
Then I tried to find an open API (see further below for how to find it). Once I found one, I used Python to loop through all pages by following the “Older” button, extracting the same data on each page:
```python
import os

import requests
from bs4 import BeautifulSoup


def scrape_emails(url, cookies):
    page = 1
    denied_emails = []
    approved_emails = []

    with requests.Session() as session:
        while True:
            response = session.get(url, params={"page": page}, cookies=cookies)
            soup = BeautifulSoup(response.text, "html.parser")

            # Extract emails
            for element in soup.select(".screened-person--denied"):
                email = element.select_one(".screened-person__details span")
                if email:
                    denied_emails.append(email.get_text(strip=True))

            for element in soup.select(".screened-person--approved"):
                email = element.select_one(".screened-person__details span")
                if email:
                    approved_emails.append(email.get_text(strip=True))

            # Check for the 'Older' button/link
            next_page_link = soup.select_one(
                'a.paginator__next[href*="/my/clearances?page="]'
            )
            if not next_page_link:
                break  # No more pages

            page += 1
            # if page == 3:
            #     break

    return denied_emails, approved_emails


def write_to_file(filename, email_list):
    with open(filename, "w") as file:
        for email in email_list:
            file.write(f"{email}\n")


cookies = {
    # Set ENV variable with hey cookie. Load the screener and search in network tab
    # for `https://app.hey.com/my/clearances?page=` request.
    # There you see the cookies used. Might need to change after re-login
    "_csrf_token": os.getenv("HEY_COOKIE"),
}

url = "https://app.hey.com/my/clearances"
denied_emails, approved_emails = scrape_emails(url, cookies)

# Write the lists to files
write_to_file("denied_emails.txt", denied_emails)
write_to_file("approved_emails.txt", approved_emails)

print("Denied Emails:", denied_emails)
print("Approved Emails:", approved_emails)
```
Make sure to set the HEY_COOKIE environment variable. You can find the value by loading the Screener and searching the browser's network tab for the https://app.hey.com/my/clearances?page= request.
There you can see the cookies used. You might need to update the value after logging in again.
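Since `os.getenv` returns `None` when a variable is unset, it can help to fail early before any request is sent. Here is a minimal sketch, assuming the cookie value lives in `HEY_COOKIE` and is sent as `_csrf_token`, as in the script above. The helper name `load_hey_cookies` and the error message are hypothetical and not part of my dotfiles:

```python
import os


def load_hey_cookies(env_var="HEY_COOKIE", cookie_name="_csrf_token"):
    """Build the cookies dict for requests, failing early if the variable is unset.

    Hypothetical helper: the name and the error text are illustrative only.
    """
    # Treat a missing variable and an empty/whitespace-only one the same way.
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(
            f"{env_var} is not set; copy the cookie value from the browser's "
            "network tab and export it before running the scraper."
        )
    return {cookie_name: value}
```

The result could then be passed as the `cookies` argument, for example `scrape_emails(url, load_hey_cookies())`.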
See below: