News Sentiment Analysis

July 14, 2023 7 minute read

News Sentiment Analysis

In this article, I will conduct an in-depth analysis of an interactive stock sentiment treemap for the S&P 500.

The primary objective is to comprehend the prevailing public sentiment for each individual stock.

Let’s delve deeper into understanding what the collective opinion implies for each of these stocks.

You are able to see this treemap in this link

png

Upon analysis, it appears that the majority of stocks within the S&P 500 are associataed with positive news sentiments.

Utilizing python to analyze

Importing libraries and data

The methodology involved parsing stock data from Yahoo Finance, conducting data manipulation using the pandas library and visualization with matplotlib. The final stage incorporated the use of Natural Language Toolkit(NLTK) for sentiment analysis through machine learning techniques.

# libraries for webscraping, parsing and getting stock data
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import yfinance as yf

# for plotting and data manipulation
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import plotly
import plotly.express as px

# NLTK VADER for sentiment analysis
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

import requests as r
import io

data_link = 'https://www.dropbox.com/s/6kh2u0qotscww5n/component.csv?raw=1'

response = r.get(data_link)
content = io.StringIO(response.text)
data = pd.read_csv(content)

stocks = data['Symbol']
sector = data['Sector']

tickers = stocks
number_of_shares = tickers_dict.values()

import time
##### Scrape the Date, Time and News Headlines Data
finwiz_url = 'https://finviz.com/quote.ashx?t='
news_tables = {}

for ticker in tickers:
    # print(ticker)
    url = finwiz_url + ticker
    req = Request(url=url,headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0'}) 
    response = urlopen(req)    
    # Read the contents of the file into 'html'
    html = BeautifulSoup(response)
    # Find 'news-table' in the Soup and load it into 'news_table'
    news_table = html.find(id='news-table')
    # Add the table to our dictionary
    news_tables[ticker] = news_table

    time.sleep(0.5)
    
news_table    

<table border="0" cellpadding="1" cellspacing="0" class="fullview-news-outer" id="news-table" width="100%">
<tr class="cursor-pointer" onclick="trackAndOpenNews(event, 'Motley Fool', 'https://finance.yahoo.com/m/d43d6e0d-bc30-3edf-a242-54ff19ab00cd/exxonmobil-is-spending-%244.9.html');">
<td align="right" width="130">
            Jul-14-23 07:15AM
        </td>
<td align="left">
<div class="news-link-container">
<div class="news-link-left">
<a class="tab-link-news" href="https://finance.yahoo.com/m/d43d6e0d-bc30-3edf-a242-54ff19ab00cd/exxonmobil-is-spending-%244.9.html" rel="nofollow" target="_blank">ExxonMobil Is Spending $4.9 Billion to Accelerate Its Lower-Carbon Energy Ambitions</a>
</div>
<div class="news-link-right">
<span>(Motley Fool)</span></div></div></td></tr>
<tr class="cursor-pointer" onclick="trackAndOpenNews(event, 'The Wall Street Journal', 'https://finance.yahoo.com/m/232999ff-2ca0-3cc5-b59d-fac8c75cc10b/exxon-buys-pipeline-operator%2C.html');">
<td align="right" width="130">
            Jul-13-23 09:18PM
        </td>
<td align="left">
<div class="news-link-container">
<div class="news-link-left">
<a class="tab-link-news" href="https://finance.yahoo.com/m/232999ff-2ca0-3cc5-b59d-fac8c75cc10b/exxon-buys-pipeline-operator%2C.html" rel="nofollow" target="_blank">Exxon Buys Pipeline Operator, Making Big Bet on Carbon</a>
</div>
<div class="news-link-right">
<span>(The Wall Street Journal)</span></div></div></td></tr>
<tr class="cursor-pointer" onclick="trackAndOpenNews(event, 'Investopedia', 'https://finance.yahoo.com/m/9865fa15-e7f0-3f3e-b4ee-55f258aa64c8/exxon-acquires-denbury-for.html');">
<td align="right" width="130">
            04:30PM
        </td> ...

##### Parse the Date, Time and News Headlines into a Python List
parsed_news = []
# Iterate through the news
for file_name, news_table in news_tables.items():
    # Iterate through all tr tags in 'news_table'
    for x in news_table.findAll('tr'):
        # Check if the 'a' tag exists within the 'tr' tag
        if x.a is not None:
            # Read the text from the 'a' tag
            text = x.a.get_text()
            date_scrape = x.td.text.split()
            # if the length of 'date_scrape' is 1, load 'time' as the only element
            if len(date_scrape) == 1:
                time = date_scrape[0]
                
            # else load 'date' as the 1st element and 'time' as the second    
            else:
                date = date_scrape[0]
                time = date_scrape[1]
            # Extract the ticker from the file name, get the string up to the 1st '_'  
            ticker = file_name.split('_')[0]
            
            # Append ticker, date, time and headline as a list to the 'parsed_news' list
            parsed_news.append([ticker, date, time, text])

        else:
            # Handle cases where the 'a' tag is not found as desired
            text = "Not found"  # Or any other value or action depending on your needs
       
parsed_news

[['AAPL',
  'Jul-14-23',
  '08:00AM',
  'Warren Buffetts Favorite Dividend Stocks  Should You Invest?'],
 ['AAPL',
  'Jul-14-23',
  '07:30AM',
  '2 No-Brainer Growth Stocks Up 46% and 63% to Buy Before the Next Bull Market'],
 ['AAPL',
  'Jul-14-23',
  '06:20AM',
  'Amazon has become one the most boring stories in tech: Morning Brief'],
 ['AAPL',
  'Jul-14-23',
  '06:05AM',
  '2 Under-the-Radar Gaming Stocks You Can Buy and Hold for the Next Decade'],
 ['AAPL',
  'Jul-14-23',
  '05:50AM',
  '3 Best Buffett Stocks to Buy for the Long Haul'],
 ['AAPL',
  'Jul-14-23',
  '05:35AM',
  'Got $3,000? 2 Tech Stocks to Buy and Hold for the Long Term'],
 ['AAPL',
  'Jul-14-23',
  '05:26AM',
  'Is This Top Tech Stock About to Become the Next $3 Trillion Company?'],
 ['AAPL', 'Jul-13-23', '07:14PM', 'The 2023 Unhedged stock draft'],
 ['AAPL',
  'Jul-13-23',
  '05:55PM',
  '3 Stocks to Buy for Artificial Intelligence (AI) Exposure'],
 ['AAPL',
  'Jul-13-23',
  '04:29PM',
  "Which Companies Could Buy a Stake in Disney's ESPN?"],
 ['AAPL',
  'Jul-13-23',
  '04:15PM',
  "Hollywood actors strike could be a 'near-term benefit' for streamers: Analyst"],
 ['AAPL',
  'Jul-13-23',
  '01:34PM',
  'Apple Defies Decline As iPhone Market Share Surges Amid Smartphone Slump'],
 ['AAPL',
  'Jul-13-23',
  '11:08AM',
  "Apple's (AAPL) Streaming Service Receives 54 Emmy Nominations"],
 ['AAPL',
  'Jul-13-23',
  '10:31AM',
  'Apples British sales bounce back to record £1.5bn after Covid slump'],
 ['AAPL', 'Jul-13-23', '09:45AM', '2 Stocks to Invest in Virtual Reality'],   ...

##### Perform Sentiment Analysis with Vader
# Instantiate the sentiment intensity analyzer
vader = SentimentIntensityAnalyzer()
# Set column names
columns = ['ticker', 'date', 'time', 'headline']
# Convert the parsed_news list into a DataFrame called 'parsed_and_scored_news'
parsed_and_scored_news = pd.DataFrame(parsed_news, columns=columns)

# Iterate through the headlines and get the polarity scores using vader
scores = parsed_and_scored_news['headline'].apply(vader.polarity_scores).tolist()
# Convert the 'scores' list of dicts into a DataFrame
scores_df = pd.DataFrame(scores)

# Join the DataFrames of the news and the list of dicts
parsed_and_scored_news = parsed_and_scored_news.join(scores_df, rsuffix='_right')
# Convert the date column from string to datetime
parsed_and_scored_news['date'] = pd.to_datetime(parsed_and_scored_news.date).dt.date
parsed_and_scored_news.tail()

Sentiment data

The process involves populating a DataFrame with Parsed news data. Each piece of news is subsequently analyzed to determine whether the sentiment is positive, negative or neutral. A compound score is then computed, which provides an aggregated measure of the sentiment trend.

	ticker	date	time	headline	neg	neu	pos	compound
9995	XOM	2023-06-16	07:40AM	Oil faces a 'serious problem' by 2024 as produ...	0.197	0.803	0.00	-0.5267
9996	XOM	2023-06-16	07:08AM	Inside the race to remake lithium extraction f...	0.000	1.000	0.00	0.0000
9997	XOM	2023-06-16	07:00AM	Investors in Exxon Mobil (NYSE:XOM) have seen ...	0.000	0.820	0.18	0.5106
9998	XOM	2023-06-16	07:00AM	INSIGHT-Inside the race to remake lithium extr...	0.000	1.000	0.00	0.0000
9999	XOM	2023-06-15	07:09PM	Tales of Stock Compounding: Does It Always Work?	0.000	1.000	0.00	0.0000

# Group by each ticker and get the mean of all sentiment scores
mean_scores = parsed_and_scored_news.groupby(['ticker']).mean()
mean_scores

C:\Users\Admin\AppData\Local\Temp\ipykernel_16264\2723772882.py:2: FutureWarning:

The default value of numeric_only in DataFrameGroupBy.mean is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.

	neg	neu	pos	compound
ticker
AAPL	0.03496	0.84100	0.12402	0.134145
ABBV	0.03245	0.84974	0.11781	0.135030
ABT	0.04561	0.86803	0.08636	0.061686
ACN	0.02564	0.86268	0.11168	0.143406
ADBE	0.04316	0.87446	0.08239	0.052778
...	...	...	...	...
VZ	0.02482	0.86735	0.10782	0.141956
WBA	0.09430	0.81435	0.09136	0.019264
WFC	0.06189	0.75951	0.17859	0.161680
WMT	0.03938	0.80769	0.15293	0.152200
XOM	0.04880	0.82672	0.12449	0.118250

100 rows × 4 columns

import numpy as np

# Parsing each stocks' price.
industries = []
sectors = []
prices = []
for ticker in tickers:
    tickerdata = yf.Ticker(ticker)
    try:
        prices.append(tickerdata.info['currentPrice'])
    except KeyError:
        prices.append(np.nan)

    try:
        sectors.append(tickerdata.info['sector'])
    except KeyError:
        sectors.append(np.nan)

    try:
        industries.append(tickerdata.info['industry'])
    except KeyError:
        industries.append(np.nan)

              

# dictionary {'column name': list of values for column} to be converted to dataframe
d = {'Sector': sectors, 'Industry': industries, 'Price': prices}
# create dataframe
df_info = pd.DataFrame(data=d, index=tickers)
df_info

	Sector	Industry	Price
Symbol
AAPL	Technology	Consumer Electronics	190.320
ABBV	Healthcare	Drug Manufacturers—General	133.835
ABT	Healthcare	Medical Devices	106.810
ACN	Technology	Information Technology Services	315.150
ADBE	Technology	Software—Infrastructure	512.940
...	...	...	...
VZ	Communication Services	Telecom Services	34.565
WBA	Healthcare	Pharmaceutical Retailers	29.890
WFC	Financial Services	Banks—Diversified	43.455
WMT	Consumer Defensive	Discount Stores	154.430
XOM	Energy	Oil & Gas Integrated	104.000

100 rows × 3 columns

df = mean_scores.join(df_info)
df = df.rename(columns={"compound": "Sentiment Score", "neg": "Negative", "neu": "Neutral", "pos": "Positive"})
df = df.reset_index()
df

Collect data to dataframe

	ticker	Negative	Neutral	Positive	Sentiment Score	Sector	Industry	Price
0	AAPL	0.04388	0.84774	0.10836	0.086201	Technology	Consumer Electronics	190.320
1	ABBV	0.03000	0.84824	0.12176	0.146174	Healthcare	Drug Manufacturers—General	133.835
2	ABT	0.04847	0.86294	0.08859	0.061468	Healthcare	Medical Devices	106.810
3	ACN	0.02564	0.86268	0.11168	0.143406	Technology	Information Technology Services	315.150
4	ADBE	0.04145	0.87230	0.08625	0.064446	Technology	Software—Infrastructure	512.940
...	...	...	...	...	...	...	...	...
95	VZ	0.02637	0.87088	0.10275	0.125678	Communication Services	Telecom Services	34.565
96	WBA	0.09430	0.81435	0.09136	0.019264	Healthcare	Pharmaceutical Retailers	29.890
97	WFC	0.07355	0.77450	0.15193	0.104411	Financial Services	Banks—Diversified	43.455
98	WMT	0.04117	0.80796	0.15086	0.147280	Consumer Defensive	Discount Stores	154.430
99	XOM	0.05038	0.80983	0.13980	0.137952	Energy	Oil & Gas Integrated	104.000

100 rows × 8 columns

Treemap

fig = px.treemap(df, path=[px.Constant("Sectors"), 'Sector', 'Industry', 'ticker'],
                  color='Sentiment Score', hover_data=['Price', 'Negative', 'Neutral', 'Positive', 'Sentiment Score'],
                  color_continuous_scale=['#FF0000', "#000000", '#00FF00'],
                  color_continuous_midpoint=0)

fig.data[0].customdata = df[['Price', 'Negative', 'Neutral', 'Positive', 'Sentiment Score']].round(3) # round to 3 decimal places
fig.data[0].texttemplate = "%{label}<br>%{customdata[4]}"

fig.update_traces(textposition="middle center")
fig.update_layout(margin = dict(t=30, l=10, r=10, b=10), font_size=20)

plotly.offline.plot(fig, filename='stock_sentiment.html') # this writes the plot into a html file and opens it

Conclusion

As previously mentioned, you can refer to the hyperlink above to review the sentiment analysis conducted for each stock within the S&P 500. The majority of the anaylzed news sentiments lean towards positivity. This can aid in understanding market trends. However, it is important to note that that this could also lead investors to make misguided decisions.

News Sentiment Analysis

News Sentiment Analysis

Utilizing python to analyze

Importing libraries and data

Sentiment data

Collect data to dataframe

Treemap

Conclusion

You may also enjoy

Algorithmic Trading

DCF Valuation & Analysis (TNK)

U.S. Recession in 2024?

DCF Valuation (AAPL)