
News Sentiment Analysis

In this article, I build and analyze an interactive news-sentiment treemap for the stocks of the S&P 500.

The primary objective is to gauge the prevailing news sentiment for each individual stock.

Let’s dig into what the collective tone of recent headlines implies for each of these stocks.

You can explore the interactive treemap at this link

(Interactive treemap of S&P 500 news sentiment, colored from negative to positive)

Upon analysis, it appears that the majority of stocks within the S&P 500 are associated with positive news sentiment.

Analyzing with Python

Importing libraries and data

The methodology involves scraping news headlines from Finviz, pulling stock data from Yahoo Finance, manipulating the data with pandas, and visualizing the results with matplotlib and Plotly. The final stage uses the Natural Language Toolkit (NLTK) VADER analyzer, a lexicon- and rule-based sentiment model, to score each headline.

# libraries for webscraping, parsing and getting stock data
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import yfinance as yf

# for plotting and data manipulation
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import plotly
import plotly.express as px

# NLTK VADER for sentiment analysis
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

import requests as r
import io
data_link = 'https://www.dropbox.com/s/6kh2u0qotscww5n/component.csv?raw=1'

response = r.get(data_link)
content = io.StringIO(response.text)
data = pd.read_csv(content)

stocks = data['Symbol']
sector = data['Sector']

tickers = stocks

import time
Scrape the Date, Time and News Headlines Data

finwiz_url = 'https://finviz.com/quote.ashx?t='
news_tables = {}

for ticker in tickers:
    # print(ticker)
    url = finwiz_url + ticker
    req = Request(url=url,headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0'}) 
    response = urlopen(req)    
    # Parse the contents of the page with BeautifulSoup
    html = BeautifulSoup(response, 'html.parser')
    # Find 'news-table' in the Soup and load it into 'news_table'
    news_table = html.find(id='news-table')
    # Add the table to our dictionary
    news_tables[ticker] = news_table

    time.sleep(0.5)
    
news_table    
<table border="0" cellpadding="1" cellspacing="0" class="fullview-news-outer" id="news-table" width="100%">
<tr class="cursor-pointer" onclick="trackAndOpenNews(event, 'Motley Fool', 'https://finance.yahoo.com/m/d43d6e0d-bc30-3edf-a242-54ff19ab00cd/exxonmobil-is-spending-%244.9.html');">
<td align="right" width="130">
            Jul-14-23 07:15AM
        </td>
<td align="left">
<div class="news-link-container">
<div class="news-link-left">
<a class="tab-link-news" href="https://finance.yahoo.com/m/d43d6e0d-bc30-3edf-a242-54ff19ab00cd/exxonmobil-is-spending-%244.9.html" rel="nofollow" target="_blank">ExxonMobil Is Spending $4.9 Billion to Accelerate Its Lower-Carbon Energy Ambitions</a>
</div>
<div class="news-link-right">
<span>(Motley Fool)</span></div></div></td></tr>
<tr class="cursor-pointer" onclick="trackAndOpenNews(event, 'The Wall Street Journal', 'https://finance.yahoo.com/m/232999ff-2ca0-3cc5-b59d-fac8c75cc10b/exxon-buys-pipeline-operator%2C.html');">
<td align="right" width="130">
            Jul-13-23 09:18PM
        </td>
<td align="left">
<div class="news-link-container">
<div class="news-link-left">
<a class="tab-link-news" href="https://finance.yahoo.com/m/232999ff-2ca0-3cc5-b59d-fac8c75cc10b/exxon-buys-pipeline-operator%2C.html" rel="nofollow" target="_blank">Exxon Buys Pipeline Operator, Making Big Bet on Carbon</a>
</div>
<div class="news-link-right">
<span>(The Wall Street Journal)</span></div></div></td></tr>
<tr class="cursor-pointer" onclick="trackAndOpenNews(event, 'Investopedia', 'https://finance.yahoo.com/m/9865fa15-e7f0-3f3e-b4ee-55f258aa64c8/exxon-acquires-denbury-for.html');">
<td align="right" width="130">
            04:30PM
        </td> ...
Parse the Date, Time and News Headlines into a Python List

parsed_news = []
# Iterate through the news tables, one per ticker
for ticker, news_table in news_tables.items():
    date = None
    # Iterate through all tr tags in 'news_table'
    for x in news_table.findAll('tr'):
        # Skip rows that do not contain a headline link
        if x.a is None:
            continue
        # Read the headline text from the 'a' tag
        text = x.a.get_text()
        date_scrape = x.td.text.split()
        # If the cell holds only a time, reuse the date from the previous row
        if len(date_scrape) == 1:
            time = date_scrape[0]
        # Otherwise load 'date' as the 1st element and 'time' as the 2nd
        else:
            date = date_scrape[0]
            time = date_scrape[1]
        # Append ticker, date, time and headline as a list to 'parsed_news'
        parsed_news.append([ticker, date, time, text])

parsed_news
[['AAPL',
  'Jul-14-23',
  '08:00AM',
  'Warren Buffetts Favorite Dividend Stocks  Should You Invest?'],
 ['AAPL',
  'Jul-14-23',
  '07:30AM',
  '2 No-Brainer Growth Stocks Up 46% and 63% to Buy Before the Next Bull Market'],
 ['AAPL',
  'Jul-14-23',
  '06:20AM',
  'Amazon has become one the most boring stories in tech: Morning Brief'],
 ['AAPL',
  'Jul-14-23',
  '06:05AM',
  '2 Under-the-Radar Gaming Stocks You Can Buy and Hold for the Next Decade'],
 ['AAPL',
  'Jul-14-23',
  '05:50AM',
  '3 Best Buffett Stocks to Buy for the Long Haul'],
 ['AAPL',
  'Jul-14-23',
  '05:35AM',
  'Got $3,000? 2 Tech Stocks to Buy and Hold for the Long Term'],
 ['AAPL',
  'Jul-14-23',
  '05:26AM',
  'Is This Top Tech Stock About to Become the Next $3 Trillion Company?'],
 ['AAPL', 'Jul-13-23', '07:14PM', 'The 2023 Unhedged stock draft'],
 ['AAPL',
  'Jul-13-23',
  '05:55PM',
  '3 Stocks to Buy for Artificial Intelligence (AI) Exposure'],
 ['AAPL',
  'Jul-13-23',
  '04:29PM',
  "Which Companies Could Buy a Stake in Disney's ESPN?"],
 ['AAPL',
  'Jul-13-23',
  '04:15PM',
  "Hollywood actors strike could be a 'near-term benefit' for streamers: Analyst"],
 ['AAPL',
  'Jul-13-23',
  '01:34PM',
  'Apple Defies Decline As iPhone Market Share Surges Amid Smartphone Slump'],
 ['AAPL',
  'Jul-13-23',
  '11:08AM',
  "Apple's (AAPL) Streaming Service Receives 54 Emmy Nominations"],
 ['AAPL',
  'Jul-13-23',
  '10:31AM',
  'Apples British sales bounce back to record £1.5bn after Covid slump'],
 ['AAPL', 'Jul-13-23', '09:45AM', '2 Stocks to Invest in Virtual Reality'],   ...
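The date handling above relies on carrying the last seen date forward whenever a row contains only a time. A self-contained sketch of that logic, using hypothetical sample rows:

```python
def parse_rows(rows):
    """Each row is (date_time_text, headline); rows that contain only a
    time inherit the date from the most recent dated row."""
    parsed, date = [], None
    for cell_text, headline in rows:
        parts = cell_text.split()
        if len(parts) == 1:
            time = parts[0]  # time only: keep the previous date
        else:
            date, time = parts[0], parts[1]
        parsed.append((date, time, headline))
    return parsed

# Hypothetical rows mimicking the Finviz table structure
sample = [("Jul-14-23 07:15AM", "Headline A"), ("04:30PM", "Headline B")]
print(parse_rows(sample))
```

The second row has no date of its own, so it inherits "Jul-14-23" from the first.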
Perform Sentiment Analysis with VADER

# Instantiate the sentiment intensity analyzer
vader = SentimentIntensityAnalyzer()
# Set column names
columns = ['ticker', 'date', 'time', 'headline']
# Convert the parsed_news list into a DataFrame called 'parsed_and_scored_news'
parsed_and_scored_news = pd.DataFrame(parsed_news, columns=columns)

# Iterate through the headlines and get the polarity scores using vader
scores = parsed_and_scored_news['headline'].apply(vader.polarity_scores).tolist()
# Convert the 'scores' list of dicts into a DataFrame
scores_df = pd.DataFrame(scores)

# Join the DataFrames of the news and the list of dicts
parsed_and_scored_news = parsed_and_scored_news.join(scores_df, rsuffix='_right')
# Convert the date column from string to datetime
parsed_and_scored_news['date'] = pd.to_datetime(parsed_and_scored_news.date).dt.date
parsed_and_scored_news.tail()

Sentiment data

The parsed news is loaded into a DataFrame, and each headline is scored for negative, neutral and positive content. A compound score is also computed, which condenses these into a single aggregate measure of sentiment.

ticker date time headline neg neu pos compound
9995 XOM 2023-06-16 07:40AM Oil faces a 'serious problem' by 2024 as produ... 0.197 0.803 0.00 -0.5267
9996 XOM 2023-06-16 07:08AM Inside the race to remake lithium extraction f... 0.000 1.000 0.00 0.0000
9997 XOM 2023-06-16 07:00AM Investors in Exxon Mobil (NYSE:XOM) have seen ... 0.000 0.820 0.18 0.5106
9998 XOM 2023-06-16 07:00AM INSIGHT-Inside the race to remake lithium extr... 0.000 1.000 0.00 0.0000
9999 XOM 2023-06-15 07:09PM Tales of Stock Compounding: Does It Always Work? 0.000 1.000 0.00 0.0000
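The compound score ranges from -1 (most negative) to +1 (most positive). A common convention, though not one this article's code applies, is to bucket it with a ±0.05 threshold; a hypothetical helper:

```python
def classify(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# e.g. the XOM "serious problem" headline above scored -0.5267
print(classify(-0.5267))  # negative
```

Under this convention the first and third XOM rows in the table read as negative and positive respectively, while the zero-compound rows are neutral.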
# Group by each ticker and get the mean of all sentiment scores
mean_scores = parsed_and_scored_news.groupby(['ticker']).mean(numeric_only=True)
mean_scores
mean_scores
neg neu pos compound
ticker
AAPL 0.03496 0.84100 0.12402 0.134145
ABBV 0.03245 0.84974 0.11781 0.135030
ABT 0.04561 0.86803 0.08636 0.061686
ACN 0.02564 0.86268 0.11168 0.143406
ADBE 0.04316 0.87446 0.08239 0.052778
... ... ... ... ...
VZ 0.02482 0.86735 0.10782 0.141956
WBA 0.09430 0.81435 0.09136 0.019264
WFC 0.06189 0.75951 0.17859 0.161680
WMT 0.03938 0.80769 0.15293 0.152200
XOM 0.04880 0.82672 0.12449 0.118250

100 rows × 4 columns
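With the per-ticker means in hand, ranking the most positive and most negative tickers is a one-liner. A sketch using three of the compound means from the table above as illustrative data:

```python
import pandas as pd

# Illustrative subset of the per-ticker compound means shown above
mean_scores = pd.DataFrame(
    {"compound": [0.134145, 0.019264, 0.161680]},
    index=pd.Index(["AAPL", "WBA", "WFC"], name="ticker"),
)

# Tickers with the highest and lowest average sentiment
most_positive = mean_scores["compound"].nlargest(1)
most_negative = mean_scores["compound"].nsmallest(1)
print(most_positive.index[0], most_negative.index[0])
```

On this subset, WFC has the most positive average sentiment and WBA the least.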

import numpy as np

# Parse each stock's sector, industry and current price from Yahoo Finance.
industries = []
sectors = []
prices = []
for ticker in tickers:
    tickerdata = yf.Ticker(ticker)
    try:
        prices.append(tickerdata.info['currentPrice'])
    except KeyError:
        prices.append(np.nan)

    try:
        sectors.append(tickerdata.info['sector'])
    except KeyError:
        sectors.append(np.nan)

    try:
        industries.append(tickerdata.info['industry'])
    except KeyError:
        industries.append(np.nan)

              
# dictionary {'column name': list of values for column} to be converted to dataframe
d = {'Sector': sectors, 'Industry': industries, 'Price': prices}
# create dataframe
df_info = pd.DataFrame(data=d, index=tickers)
df_info
Sector Industry Price
Symbol
AAPL Technology Consumer Electronics 190.320
ABBV Healthcare Drug Manufacturers—General 133.835
ABT Healthcare Medical Devices 106.810
ACN Technology Information Technology Services 315.150
ADBE Technology Software—Infrastructure 512.940
... ... ... ...
VZ Communication Services Telecom Services 34.565
WBA Healthcare Pharmaceutical Retailers 29.890
WFC Financial Services Banks—Diversified 43.455
WMT Consumer Defensive Discount Stores 154.430
XOM Energy Oil & Gas Integrated 104.000

100 rows × 3 columns

df = mean_scores.join(df_info)
df = df.rename(columns={"compound": "Sentiment Score", "neg": "Negative", "neu": "Neutral", "pos": "Positive"})
df = df.reset_index()
df

Collecting the combined data into a DataFrame

ticker Negative Neutral Positive Sentiment Score Sector Industry Price
0 AAPL 0.04388 0.84774 0.10836 0.086201 Technology Consumer Electronics 190.320
1 ABBV 0.03000 0.84824 0.12176 0.146174 Healthcare Drug Manufacturers—General 133.835
2 ABT 0.04847 0.86294 0.08859 0.061468 Healthcare Medical Devices 106.810
3 ACN 0.02564 0.86268 0.11168 0.143406 Technology Information Technology Services 315.150
4 ADBE 0.04145 0.87230 0.08625 0.064446 Technology Software—Infrastructure 512.940
... ... ... ... ... ... ... ... ...
95 VZ 0.02637 0.87088 0.10275 0.125678 Communication Services Telecom Services 34.565
96 WBA 0.09430 0.81435 0.09136 0.019264 Healthcare Pharmaceutical Retailers 29.890
97 WFC 0.07355 0.77450 0.15193 0.104411 Financial Services Banks—Diversified 43.455
98 WMT 0.04117 0.80796 0.15086 0.147280 Consumer Defensive Discount Stores 154.430
99 XOM 0.05038 0.80983 0.13980 0.137952 Energy Oil & Gas Integrated 104.000

100 rows × 8 columns

Treemap

fig = px.treemap(df, path=[px.Constant("Sectors"), 'Sector', 'Industry', 'ticker'],
                  color='Sentiment Score', hover_data=['Price', 'Negative', 'Neutral', 'Positive', 'Sentiment Score'],
                  color_continuous_scale=['#FF0000', "#000000", '#00FF00'],
                  color_continuous_midpoint=0)
fig.data[0].customdata = df[['Price', 'Negative', 'Neutral', 'Positive', 'Sentiment Score']].round(3) # round to 3 decimal places
fig.data[0].texttemplate = "%{label}<br>%{customdata[4]}"
fig.update_traces(textposition="middle center")
fig.update_layout(margin = dict(t=30, l=10, r=10, b=10), font_size=20)

plotly.offline.plot(fig, filename='stock_sentiment.html') # this writes the plot into a html file and opens it

Conclusion

As previously mentioned, you can refer to the hyperlink above to review the sentiment analysis conducted for each stock within the S&P 500. The majority of the analyzed news sentiment leans positive, which can aid in understanding market trends. However, it is important to note that headline sentiment alone could also lead investors to make misguided decisions.
