In this post, we will show a simple Python script using Selenium and ChromeDriver to connect to CNBC.com and print out all headlines on the front page.
Before the code can talk to your Google Chrome web browser, you must download and install the ChromeDriver executable (chromedriver.exe on Windows) that matches your version of Chrome. See this post for instructions on how to download and install the proper ChromeDriver for the version of Google Chrome you have.
```python
# Python code generated by AutomateBard.com
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Main Script
if __name__ == '__main__':
    driver = None  # defined up front so the finally block is safe if startup fails
    try:
        print("Connecting to ChromeDriver")
        # Selenium 4 takes the driver path through a Service object
        driver = webdriver.Chrome(service=Service('./chromedriver'))
        driver.implicitly_wait(1.0)

        print("Connecting to CNBC")
        driver.get("https://www.cnbc.com")

        # Find all news headlines
        headlines = driver.find_elements(By.CLASS_NAME, "RiverHeadline-headline")

        # Print all headlines
        for headline in headlines:
            print(headline.text)
    except Exception as e:
        # DO NOT DO THIS. Use proper exception handling!
        print(e)
    finally:
        print("Closing ChromeDriver")
        if driver is not None:
            driver.quit()  # quit() ends the whole session; close() only closes the window
```
How The Code Works
The code first connects to www.CNBC.com. Then it searches the page for all HTML elements that have the CSS class “RiverHeadline-headline”. NOTE: Be sure to use .find_elements rather than .find_element if you want to capture more than one item; .find_element returns only the first match. Because .find_elements returns a list, we can use a for loop to print the text of each headline.
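To make the difference between the two methods concrete without launching a browser, here is a small sketch that uses Python's built-in html.parser instead of Selenium. It is an illustration under assumptions, not how Selenium works internally: the HTML snippet is made up, and find_elements/find_element below are hypothetical helpers that only mimic Selenium's behavior (Selenium's find_element raises NoSuchElementException when nothing matches; this one raises LookupError).

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text pieces of the element being collected
        self.results = []   # one string per matching element, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a nested tag inside a match, e.g. <b>
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> carry no text; ignore them.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def find_elements(html, class_name):
    """Hypothetical helper: every match, as a list (empty if there are none)."""
    parser = ClassTextCollector(class_name)
    parser.feed(html)
    parser.close()
    return parser.results


def find_element(html, class_name):
    """Hypothetical helper: only the first match; raise if there is none."""
    matches = find_elements(html, class_name)
    if not matches:
        raise LookupError(f"no element with class {class_name!r}")
    return matches[0]


# Made-up markup for illustration; CNBC's real page is far larger.
SAMPLE_HTML = """
<div class="river">
  <a class="RiverHeadline-headline" href="/a">First headline</a>
  <a class="RiverHeadline-headline" href="/b">Second <b>headline</b></a>
  <span class="other">Not a headline</span>
</div>
"""

if __name__ == "__main__":
    for text in find_elements(SAMPLE_HTML, "RiverHeadline-headline"):
        print(text)
    print(find_element(SAMPLE_HTML, "RiverHeadline-headline"))
```

Returning a list, which is simply empty when nothing matches, is what lets a plain for loop handle zero, one, or many headlines without special cases.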
On April 20, 2023, the program returned the following headlines from CNBC.com:
- MyPillow CEO Mike Lindell ordered to pay $5 million to man who debunked election-fraud claim
- Small caps will be large this year, says Jefferies. Here are 10 buy ideas
- DOJ charges 18 people — including doctors — in massive Covid health-care fraud takedowns
- Read the internal memo Alphabet sent in merging A.I.-focused groups DeepMind and Google Brain
- Alec Baldwin lawyers say manslaughter charges to be dropped in ‘Rust’ movie set shooting
- Kyiv says it’s time for NATO to invite Ukraine into the alliance — not just to a summit
- Ford F-150 Lightning fire footage highlights a growing EV risk
- Kendall’s $29 million ‘Succession’ home: We went to ‘great lengths to create something truly unique’
- Taylor Swift sidestepped FTX lawsuit by asking a simple question
- Wells Fargo says this regional bank stock that got caught up in crisis should rebound by 60%
- Coinbase secures Bermuda license, and EU approves framework for crypto regulation: CNBC Crypto World
- BuzzFeed will lay off 15% of staff, shutter its news unit
- Senate invites Supreme Court Chief Justice Roberts to testify after Clarence Thomas ethics scandal
- Savings account interest rates just hit a 15-year high, but fewer Americans are benefitting
- This free online paycheck withholdings tool may help you avoid a tax bill for 2023, IRS says
- Nvidia, Microsoft and more: CNBC’s ‘Halftime Report’ traders answer your questions
- Meta and Disney begin cutting jobs. Here’s a rundown of all Club names planning major layoffs
- AT&T shares sink after company posts softer than expected revenue, cash flow
- 10 financial lessons to learn as your priorities shift in your 20s, 30s, 40s and 50s
- Tesla shares fall on earnings drop
Each website may mark up its headlines differently. For example, while CNBC uses the class name “RiverHeadline-headline”, MarketWatch.com uses the class name “article__headline” for its headlines. Google seems to use the “h4” tag on its news.google.com site.
For Google News, use:
```python
headlines = driver.find_elements(By.TAG_NAME, "h4")
```
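Since the locator is the only part that changes from site to site, one option is to keep the locators in a table and reuse a single function. This is a minimal sketch, assuming the class names and tag mentioned above are still current (sites change their markup often). HEADLINE_LOCATORS, get_headlines, and FakeDriver are hypothetical names; the strings "class name" and "tag name" are the values of Selenium's By.CLASS_NAME and By.TAG_NAME, so the tuples can be passed straight to a real driver's find_elements. FakeDriver is an offline stand-in so the logic can be checked without a browser.

```python
from types import SimpleNamespace

# Hypothetical per-site locators: (strategy, value) pairs for find_elements.
# "class name" == By.CLASS_NAME and "tag name" == By.TAG_NAME in Selenium.
HEADLINE_LOCATORS = {
    "https://www.cnbc.com": ("class name", "RiverHeadline-headline"),
    "https://www.marketwatch.com": ("class name", "article__headline"),
    "https://news.google.com": ("tag name", "h4"),
}


def get_headlines(driver, url):
    """Open url with any Selenium-like driver and return non-blank headline texts."""
    by, value = HEADLINE_LOCATORS[url]
    driver.get(url)
    # Skip elements whose text is empty or whitespace only.
    return [el.text for el in driver.find_elements(by, value) if el.text.strip()]


class FakeDriver:
    """Offline stand-in exposing only the two methods get_headlines uses."""

    def __init__(self, pages):
        self.pages = pages  # {url: {(by, value): [headline texts]}}
        self.url = None

    def get(self, url):
        self.url = url

    def find_elements(self, by, value):
        texts = self.pages.get(self.url, {}).get((by, value), [])
        return [SimpleNamespace(text=t) for t in texts]


if __name__ == "__main__":
    fake = FakeDriver({"https://news.google.com": {("tag name", "h4"): ["A", "  ", "B"]}})
    print(get_headlines(fake, "https://news.google.com"))
```

With a real browser, the same function would be called with the driver from the main script, for example get_headlines(driver, "https://www.marketwatch.com").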