Creating a Stock Prediction App in Python

Ajk
6 min read · Apr 18, 2024

Developed by Ayush Kulkarni

This project combines Flask, a Python web framework, with Matplotlib, scikit-learn, and Yahoo Finance. I used a Linear Regression model from scikit-learn to predict stock closing prices, choosing it because it is simple and quick to train. Yahoo Finance gave me an up-to-date dataset, which I split into the training set and test set for the model. The dataset is composed of features such as “Close Price”, “Open Price”, “Volume”, “High”, and “Low”; I also needed to add a “Date” feature so that I could keep track of the days. Here’s a look at a little bit of the raw data from one of the companies, Google.

Raw Data from Yahoo Finance for GOOG

And below is the code I used to produce this. I used the datetime module in Python in order to make the time period for the stock more flexible for users.

from datetime import date
import pandas as pd
import yfinance as yf

howmanyyears = int(input("How many years? > ")) # <-- Getting user input for years
today = date.today()
END_DATE = today.isoformat()
START_DATE = (pd.Timestamp(today) - pd.DateOffset(years=howmanyyears)).date().isoformat() # <-- Also works when today is Feb 29

whichstock = input("Which stock? > ") # <-- Getting user input for stock name
data = yf.download(whichstock, start=START_DATE, end=END_DATE)

data.reset_index(inplace=True)
data['Date'] = pd.to_datetime(data.Date) # <-- Inserting the 'Date' Feature

# Outputting the first 15 rows of data
print(data.head(15))
print(f"Data: {data.shape}")

Additionally, I added two extra features: the Exponential Moving Average (EMA) of the closing price over 50 days and over 200 days. Comparing the two let me see whether the stock was trending Bearish or Bullish during a given period and show users more trends about their stock.

data['EMA-50'] = data['Close'].ewm(span=50, adjust=False).mean()
data['EMA-200'] = data['Close'].ewm(span=200, adjust=False).mean()
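To make it clearer what ewm(span=..., adjust=False) computes, here is a small sketch that reproduces the same recursive formula by hand. The helper name ema_by_hand and the prices are made up for illustration; they are not part of the original project.

```python
import pandas as pd

def ema_by_hand(prices, span):
    """Recursive EMA, following the formula pandas uses for ewm(span=span, adjust=False)."""
    alpha = 2 / (span + 1)   # smoothing factor implied by the span
    ema = [prices[0]]        # the first EMA value is simply the first price
    for price in prices[1:]:
        # Each new value blends today's price with yesterday's EMA
        ema.append(alpha * price + (1 - alpha) * ema[-1])
    return ema

# Made-up closing prices, for illustration only
closes = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0])

by_hand = ema_by_hand(closes.tolist(), span=3)
by_pandas = closes.ewm(span=3, adjust=False).mean().tolist()

for manual, library in zip(by_hand, by_pandas):
    print(f"{manual:.4f}  {library:.4f}")
```

A larger span gives a smaller alpha, so the 200-day EMA reacts to new prices more slowly than the 50-day EMA.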

Before making the Regression Model, I wanted to first visualize some of this data. I plotted the High vs Low prices along with a graph of the daily closing price with the Exponential Moving Average for 50 and 200 days.

GOOG High vs Low plotted using matplotlib
GOOG Exponential Moving Average plotted using matplotlib

Here is the code used to generate these graphs.

import matplotlib.pyplot as plt

# High vs Low Graph
plt.figure(figsize=(8, 4))
plt.plot(data['Low'], label="Low", color="indianred")
plt.plot(data['High'], label="High", color="mediumseagreen")
plt.ylabel('Price (in USD)')
plt.xlabel("Time")
plt.title(f"High vs Low of {whichstock}")
plt.legend()
plt.tight_layout()

# Exponential Moving Average Graph
plt.figure(figsize=(8, 4))
plt.plot(data['EMA-50'], label="EMA for 50 days")
plt.plot(data['EMA-200'], label="EMA for 200 days")
plt.plot(data['Close'], label="Close") # <-- Same column the EMAs were computed from
plt.title(f'Exponential Moving Average for {whichstock}')
plt.ylabel('Price (in USD)')
plt.xlabel("Time")
plt.legend()
plt.tight_layout()

plt.show() # <-- Display both figures

After getting a feel for the dataset, it was time to dive into the Linear Regression Model. The end goal for this project was to predict the Closing Price for a given stock, so I made it the target, y. That left me with all the other features as the inputs, x.

x = data[['Open', 'High', 'Low', 'Volume', 'EMA-50', 'EMA-200']]
y = data['Close']

Next, scikit-learn’s train_test_split function let me separate the data into 80% for training and the remaining 20% for testing.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

Now it was time to fit the Linear Regression Model and predict the closing prices for the test set.

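from sklearn.linear_model import LinearRegression

lr_model = LinearRegression()
lr_model.fit(X_train, y_train) # <-- Learn from the training rows
pred = lr_model.predict(X_test) # <-- Predicted closing prices for the test rows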

Once this was done, I wanted to visualize how accurate my model was, so I created a graph of the model’s values vs the actual values.

Real vs Predicted Values for Linear Regression Model
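A graph like this can be drawn with matplotlib in a few lines. The sketch below is only an illustration: the actual and predicted arrays are randomly generated stand-ins for y_test and pred, and the Agg backend is chosen so that it also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line to open a window instead
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data: in the project these would be y_test and pred
rng = np.random.default_rng(0)
actual = np.linspace(100, 150, 50) + rng.normal(0, 2, 50)
predicted = actual + rng.normal(0, 1, 50)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(actual, label="Actual", color="mediumseagreen")
ax.plot(predicted, label="Predicted", color="indianred", linestyle="--")
ax.set_xlabel("Test sample")
ax.set_ylabel("Price (in USD)")
ax.set_title("Real vs Predicted Values")
ax.legend()
fig.tight_layout()
```

The closer the two lines sit on top of each other, the smaller the prediction error on those samples.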

Furthermore, I printed out the Real vs Predicted prices for the stock on randomly selected days. This let me see explicitly how close or far apart the numbers were and made it clearer whether my model had worked as intended.

Actual Prices vs Predicted Prices
d=pd.DataFrame({'Actual_Price': y_test, 'Predicted_Price': pred})
print(d.head(10))
print(d.describe())

After finishing the model, all that was left was to look at how the closing price was predicted from the different features. The one that stood out to me the most was Volume vs Closing Price. I feel that the model worked really well and was able to predict most values with little error.

Now time for some statistics. In order to fully see how well my model had performed, I had to look at the r² score, the mean absolute error, and the mean squared error. Here’s a look at each of these values.

Linear Regression Model Results
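For reference, all three numbers can be computed with functions from sklearn.metrics. This is a minimal sketch on randomly generated data rather than the real stock data, so the printed values will not match the results shown above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic features and target, for illustration only
rng = np.random.default_rng(42)
X = rng.uniform(90, 110, size=(200, 3))
y = X @ np.array([0.2, 0.5, 0.3]) + rng.normal(0, 0.5, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)              # 1.0 would be a perfect fit
mae = mean_absolute_error(y_test, pred)  # average absolute miss, in price units
mse = mean_squared_error(y_test, pred)   # average squared miss, punishes big errors

print(f"r2:  {r2:.4f}")
print(f"MAE: {mae:.4f}")
print(f"MSE: {mse:.4f}")
```

The mean absolute error is the easiest of the three to read, because it is in the same units as the price itself.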

My project didn’t stop here, however. I had already learned Flask, one of Python’s web frameworks, so I went on to build the model into an app with it. Additionally, I used Bootstrap for styling and collected user input through a POST request to the backend. Here is a look at the website, its functionality, and the link if you want to check it out: https://a75ca6e3-93ff-47b5-bfa2-dccd5ca198a2-00-1fyfzxes7xh6e.spock.replit.dev/

Website Home Page
Website Stock Page

I won’t get into the details of the backend, but here is a quick glimpse of the code.

Backend of my Stock Trends App developed on replit.com
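For readers who have not used Flask, here is a rough sketch of how a route can read form input from a POST request. It is not the app’s actual backend: the route, the form field names stock and years, and the response text are all hypothetical.

```python
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)

# Hypothetical form; the field names "stock" and "years" are assumptions
FORM_HTML = """
<form method="post">
  <input name="stock" placeholder="Ticker, e.g. GOOG">
  <input name="years" placeholder="Years of history">
  <button type="submit">Analyze</button>
</form>
"""

@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # Values submitted by the form arrive in request.form
        symbol = request.form.get("stock", "").strip().upper()
        years = request.form.get("years", "").strip()
        if not symbol or not years.isdigit():
            return "Please enter a ticker symbol and a whole number of years.", 400
        # A real app would download the data and run the model here
        return f"Would analyze {escape(symbol)} over {years} year(s)."
    return FORM_HTML
```

Calling app.run() would start Flask’s development server so that the form can be tried in a browser.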

Looking back, there are a few changes I would make to my model. One is the train-test split: instead of a random 80–20 split, I would train only on a certain time range in the past and test on the dates after it, so that the model is fully on its own when predicting future prices.
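A time-based split along those lines could look like the sketch below. The data is synthetic and the single Open feature is a simplification; the point is only that every training row comes before every test row.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic daily prices, for illustration only
rng = np.random.default_rng(1)
dates = pd.date_range("2023-01-02", periods=100, freq="B")
close = 100 + np.cumsum(rng.normal(0, 1, 100))
data = pd.DataFrame({
    "Date": dates,
    "Open": close + rng.normal(0, 0.5, 100),
    "Close": close,
})

# Sort by date, then cut once: the first 80% is the past, the last 20% is the "future"
data = data.sort_values("Date").reset_index(drop=True)
split_at = int(len(data) * 0.8)
train, test = data.iloc[:split_at], data.iloc[split_at:]

model = LinearRegression().fit(train[["Open"]], train["Close"])
pred = model.predict(test[["Open"]])

print(f"Train: {train['Date'].min().date()} to {train['Date'].max().date()}")
print(f"Test:  {test['Date'].min().date()} to {test['Date'].max().date()}")
```

Passing shuffle=False to train_test_split gives a similar cut on data that is already sorted by date.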

Additionally, using Exponential Moving Averages for 50 days and 200 days allowed me to visualize when there would be a Death Cross or Golden Cross. A Death Cross is when a short-term moving average crosses below a long-term moving average. A Golden Cross occurs when a short-term moving average crosses above a long-term moving average. In yellow is an example of a Death Cross and in red is an example of a Golden Cross found in Google’s stock.
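The crosses can also be found in code instead of by eye. Here is a small sketch; the function name find_crosses and the sample numbers are made up for illustration.

```python
import pandas as pd

def find_crosses(short_ma: pd.Series, long_ma: pd.Series) -> pd.Series:
    """Label the rows where the short-term average crosses the long-term one."""
    # 1 where the short-term average is above the long-term one, else 0
    state = (short_ma > long_ma).astype(int)
    # +1 means it just moved above (golden), -1 means it just dropped below (death)
    change = state.diff()
    return change.map({1.0: "golden", -1.0: "death"}).dropna()

# Made-up moving averages, for illustration only
short_ma = pd.Series([1.0, 2.0, 4.0, 5.0, 3.0, 1.0])
long_ma = pd.Series([3.0, 3.0, 3.0, 3.0, 3.0, 3.0])

crosses = find_crosses(short_ma, long_ma)
print(crosses)
```

With the columns from this project, the call would be find_crosses(data['EMA-50'], data['EMA-200']).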

Throughout this entire project I was constantly learning new techniques of data analysis and model creation, such as refining datasets to get rid of outliers or null values, creating a Regression Model to make predictions, and using matplotlib to visualize the data. Overall, I feel that I’m taking away a lot of new skills from this amazing TRAIN AI course that will very much benefit me in the future.
