How to use Benzinga’s Conference Call Transcript WebSocket for Real-Time Sentiment Prediction

Learn how to use Benzinga's Conference Call Transcript WebSocket to extract real-time earnings call data and perform sentiment analysis for actionable market insights using Python.

In today’s fast-paced markets, staying updated with the latest information is crucial for traders and investors. Benzinga’s conference call transcript WebSocket API provides access to real-time earnings call data, allowing users to monitor a company’s statements as they happen. This API streams transcript chunks, summaries, and other details such as the company’s ticker symbol, making it an essential tool for analysts and investors looking to gain immediate insights into a company’s strategic direction and performance.

In this guide, we’ll walk through the following steps:

  • Setting up the WebSocket connection
  • Extracting and organizing data using Python
  • Performing sentiment analysis on the extracted transcript data

Understanding Benzinga’s WebSocket for Conference Call Transcripts

The WebSocket for conference call transcripts is a real-time data stream that delivers each incoming message as a JSON object. The WebSocket URL is as follows:

wss://api.benzinga.com/api/v1/earnings-call-transcripts/stream?token=<YOUR_API_KEY>

Each message carries one portion of the transcript in real time, along with key metadata. Here’s a breakdown of the main fields you’ll receive in each message:

  • call_id: A unique identifier for the conference call.
  • transcript_chunk: A segment of the conference call transcript.
  • security: An object containing company-related information, such as:
    • ticker: The company’s stock ticker symbol.
    • exchange: The stock exchange where the company is listed.
    • company_name: The full name of the company.
    • cik: Central Index Key, a unique identifier for companies.
  • time: The time index (in seconds) representing when the transcript chunk was spoken.
  • type: Specifies the type of data, typically “transcript_chunk”.

Each message represents a small part of the transcript, allowing users to continuously receive and analyze information as it becomes available.
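To make the field list concrete, here is a minimal sketch that flattens one message into a single record. The sample payload is invented for illustration, and the data → content nesting is an assumption taken from the handler code later in this guide, so check it against a real message before relying on it. The helper name flatten_message is hypothetical.

```python
import json

# Hypothetical sample message; every value here is invented for illustration.
SAMPLE_MESSAGE = json.dumps({
    "data": {
        "content": {
            "call_id": "example-call-001",
            "transcript_chunk": "Thank you for joining us today.",
            "security": {
                "ticker": "XYZ",
                "exchange": "NYSE",
                "company_name": "Example Corp",
                "cik": "0000000000",
            },
            "time": 12.5,
            "type": "transcript_chunk",
        }
    }
})


def flatten_message(raw):
    """Return a flat dict of the fields listed above, or None if the shape differs."""
    try:
        content = json.loads(raw)["data"]["content"]
        security = content["security"]
        return {
            "call_id": content["call_id"],
            "transcript_chunk": content["transcript_chunk"],
            "ticker": security["ticker"],
            "exchange": security.get("exchange"),
            "company_name": security["company_name"],
            "time": content["time"],
        }
    except (KeyError, TypeError, json.JSONDecodeError):
        # Skip anything that does not match the assumed message shape.
        return None


record = flatten_message(SAMPLE_MESSAGE)
print(record["ticker"], record["time"], record["transcript_chunk"])
```

Returning None for unexpected messages keeps a long-running stream from crashing on a single malformed or non-transcript payload.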

Setting Up the WebSocket and Extracting Data Using Python

To work with the WebSocket, we’ll use Python’s websocket-client library to establish a connection and handle incoming messages. Each message will be parsed to extract the relevant information.

Install Required Libraries

First, install the necessary libraries:

pip install websocket-client
pip install pandas

Establishing the WebSocket Connection

The following Python code sets up the WebSocket connection and listens for incoming messages from Benzinga’s conference call transcript WebSocket. Each message received from the WebSocket is parsed, and relevant fields are extracted and stored in a list. To control the duration of the WebSocket connection, we use a timer to automatically close the connection after 10 minutes, ensuring that the script runs for a limited period. This is particularly useful for collecting a quick sample of data without maintaining a continuous connection.

import json
from threading import Thread, Timer

import pandas as pd
import websocket

url = 'wss://api.benzinga.com/api/v1/earnings-call-transcripts/stream?token=<YOUR_API_KEY>'

data = []

# WebSocket event handlers
def on_message(ws, message):
    # Each message arrives as a JSON string; the transcript fields sit under data -> content
    parsed_data = json.loads(message)
    content = parsed_data['data']['content']
    transcript_chunk = content['transcript_chunk']
    security = content['security']

    # Extract relevant fields and append to data list
    data.append({
        'call_id': content['call_id'],
        'transcript_chunk': transcript_chunk,
        'ticker': security['ticker'],
        'company_name': security['company_name'],
        'time': content['time']
    })

    print(f"Received transcript chunk: {transcript_chunk}")

def on_error(ws, error):
    print(f"Error: {error}")

def on_close(ws, close_status_code, close_msg):
    print("WebSocket connection closed")

def on_open(ws):
    print("WebSocket connection opened, receiving data...")

# Function to stop the WebSocket connection after 10 mins
def stop_websocket():
    print("Stopping WebSocket connection after 10 mins...")
    ws.close()

ws = websocket.WebSocketApp(url, on_open=on_open, on_message=on_message,
                            on_error=on_error, on_close=on_close)

# Run the WebSocket client in a background thread
websocket_thread = Thread(target=ws.run_forever)
websocket_thread.start()

# Schedule the connection to close after 600 seconds (10 minutes)
stop_timer = Timer(600, stop_websocket)
stop_timer.start()

# Wait for the connection to close before processing the collected data
websocket_thread.join()
stop_timer.cancel()  # no effect if the timer has already fired

In this code, the WebSocket connection is established using Python’s websocket-client library. We define event handler functions (on_message, on_error, on_close, and on_open) to manage the behavior of the WebSocket. 

The on_message function is called every time a new message is received; it extracts key information (such as call_id, transcript_chunk, ticker, and time) and appends this data to a list called data. 

The on_open function confirms that the connection is active, while on_error and on_close manage any errors or the closure of the connection, respectively. 

To automatically stop the WebSocket after a fixed period, we use Python’s Timer to schedule the stop_websocket function to close the WebSocket after 10 minutes. 

This setup ensures that the WebSocket runs for a limited duration, collecting data for analysis without continuous connectivity.
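The timed shutdown can be tried without a live connection. The sketch below is a stand-in, not Benzinga-specific code: worker plays the role of ws.run_forever, stop_worker plays the role of ws.close, and the delay is shortened to a fraction of a second. All names here are hypothetical.

```python
import threading
import time

stop_event = threading.Event()


def worker():
    """Stand-in for ws.run_forever(): loops until asked to stop."""
    while not stop_event.is_set():
        time.sleep(0.05)


def stop_worker():
    """Stand-in for ws.close(): signals the worker to finish."""
    stop_event.set()


worker_thread = threading.Thread(target=worker)
worker_thread.start()

# Short delay for the demo; the guide uses 600 seconds.
stop_timer = threading.Timer(0.2, stop_worker)
stop_timer.start()

# Block until the timer has fired and the worker has exited.
worker_thread.join()
print("worker stopped:", not worker_thread.is_alive())
```

Joining the worker thread before moving on matters in a plain script: without it, any code that reads the collected data would run before messages have arrived.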

Processing the Extracted Data

Once the data has been collected in the data list, we can easily convert it into a Pandas DataFrame for further analysis and manipulation. The DataFrame structure allows us to explore the data by accessing specific fields, such as the transcript_chunk or ticker, for deeper insights.

# Convert data to DataFrame
df = pd.DataFrame(data)
df.head()
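Because chunks arrive one at a time, it can help to stitch them back into one running transcript per call. The following is a minimal sketch on invented sample records that mirror the shape of the data list; it assumes the time field orders chunks correctly within a call.

```python
import pandas as pd

# Hypothetical records in the same shape as the `data` list built above.
sample = [
    {"call_id": "a", "ticker": "XYZ", "time": 20.0, "transcript_chunk": "up ten percent."},
    {"call_id": "a", "ticker": "XYZ", "time": 10.0, "transcript_chunk": "Revenue was"},
    {"call_id": "b", "ticker": "ABC", "time": 5.0, "transcript_chunk": "Good morning."},
]
sample_df = pd.DataFrame(sample)

# Order chunks by time within each call, then join them into one string per call.
full_text = (
    sample_df.sort_values(["call_id", "time"])
    .groupby(["call_id", "ticker"])["transcript_chunk"]
    .agg(lambda chunks: " ".join(chunks))
    .reset_index(name="transcript")
)
print(full_text)
```

The same chain can be applied to df once real data has been collected.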

Further Analysis: Performing Sentiment Analysis on Transcript Chunks

Sentiment analysis estimates the emotional tone behind textual data. By scoring the sentiment of each transcript chunk, we can gauge how positive or negative the company’s statements are over the course of a call.

Setting Up Sentiment Analysis with VADER

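We’ll use the VADER sentiment analysis tool from the nltk library. VADER is a lexicon- and rule-based model that was tuned on social media text, so it serves here as a quick general-purpose baseline rather than a finance-specific model. Install nltk with: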

pip install nltk

Now, initialize the sentiment analyzer and apply it to each transcript_chunk in the DataFrame. We keep the compound score, which ranges from -1 (most negative) to +1 (most positive).

from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk

nltk.download('vader_lexicon')

sid = SentimentIntensityAnalyzer()

df['sentiment'] = df['transcript_chunk'].apply(lambda x: sid.polarity_scores(x)['compound'])

df[['transcript_chunk', 'sentiment']].head()
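Compound scores are often easier to read as labels. The sketch below uses the cutoffs of ±0.05 that are conventionally suggested for VADER; treat the threshold as a tunable assumption rather than a fixed rule. The function name label_sentiment and the sample scores are hypothetical.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Invented scores for illustration.
scores = [0.62, -0.31, 0.0, 0.04]
labels = [label_sentiment(s) for s in scores]
print(labels)
```

On the real data this could be applied as df['sentiment'].apply(label_sentiment).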

Analyzing Sentiment by Ticker

Since the data contains transcript chunks from multiple companies, we can calculate the average sentiment for each ticker to get an overall impression of the tone associated with each company. This allows us to identify which companies have predominantly positive or negative sentiments based on the conference call discussions.

# Group by ticker and calculate the average sentiment score
average_sentiment = df.groupby('ticker')['sentiment'].mean().reset_index()
average_sentiment = average_sentiment.sort_values(by='sentiment', ascending=False)

# Display the average sentiment scores by ticker
print("Average Sentiment by Ticker:")
print(average_sentiment)

The average_sentiment DataFrame provides a sorted list of tickers with their average sentiment scores, helping us to quickly identify companies with generally positive or negative sentiment in their conference call transcript chunks.
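An average based on one or two chunks is much less reliable than one based on dozens, so it can be useful to report the chunk count next to the mean. Here is a minimal sketch on invented scores; the column names mean_sentiment and chunks are our own choices.

```python
import pandas as pd

# Hypothetical scored chunks; values are invented for illustration.
scored = pd.DataFrame({
    "ticker": ["XYZ", "XYZ", "XYZ", "ABC"],
    "sentiment": [0.5, 0.1, -0.3, 0.9],
})

# Mean sentiment and number of chunks per ticker, highest mean first.
summary = (
    scored.groupby("ticker")["sentiment"]
    .agg(mean_sentiment="mean", chunks="count")
    .reset_index()
    .sort_values("mean_sentiment", ascending=False)
)
print(summary)
```

In this sample the top-ranked ticker rests on a single chunk, which is exactly the kind of case the count column helps flag.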

Identifying the Most Positive and Negative Statements

To gain more specific insights, we can identify the individual transcript chunks with the highest and lowest sentiment scores. These excerpts often represent key moments in the conference calls where particularly positive or negative information was shared.

# Find the top 5 most positive statements
most_positive = df.nlargest(5, 'sentiment')[['ticker', 'company_name', 'transcript_chunk', 'sentiment']]
print("Most Positive Statements:")
print(most_positive)

# Find the top 5 most negative statements
most_negative = df.nsmallest(5, 'sentiment')[['ticker', 'company_name', 'transcript_chunk', 'sentiment']]
print("Most Negative Statements:")
print(most_negative)

This code extracts the top 5 positive and negative statements, allowing us to explore which specific remarks might have had the greatest impact on sentiment. These high- and low-scoring chunks can reveal moments of particular optimism or concern.

Conclusion

In this guide, we demonstrated how to connect to Benzinga’s WebSocket for conference call transcripts, extract real-time data, and perform sentiment analysis. By leveraging these tools, investors and analysts can gain valuable insights into company statements as they are made, identify shifts in sentiment, and make data-driven decisions.

This approach can be further expanded to include custom sentiment models, keyword analysis, and comparisons across multiple earnings calls. With Benzinga’s WebSocket, users have a powerful resource for gauging the tone of company executives in real time.
