Error in Perceptron fit #156

Closed
FrankC01 opened this issue Oct 16, 2021 · 7 comments

@FrankC01

OS: Big Sur
Python (venv): 3.9.7
numpy: 1.21.2
pandas: 1.3.3

When executing fit in Chapter 2:

self.w_[0] += update
ValueError: setting an array element with a sequence
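
For context, NumPy raises this error whenever a single array slot is assigned something that is itself array-like. A minimal sketch, independent of the book's code (the names `w` and `update` are illustrative stand-ins, not taken from the traceback):

```python
import numpy as np

w = np.zeros(3)                 # stand-in for the weight vector self.w_
update = np.array([0.2, 0.2])   # an array where a scalar is expected

try:
    w[0] += update              # one slot cannot hold a two-element array
except ValueError as exc:
    message = str(exc)
    print(message)
```

So the traceback suggests that `update` was an array rather than a single number when the failing line ran.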
@rasbt
Owner

rasbt commented Oct 16, 2021

Hi Frank,
could you share the whole code you are using for chapter 2? Maybe there is a typo somewhere in how "update" is calculated. For reference, the relevant line is this: https://github.com/rasbt/python-machine-learning-book-3rd-edition/blob/master/ch02/ch02.py#L137

@FrankC01
Author

Hand typed from the book:

Perceptron

import numpy as np


class Perceptron(object):
    def __init__(self, eta=0.01, n_iter=50, random_state=1) -> None:
        self.eta = eta
        self.n_iter = n_iter
        self.random_state = random_state

    def fit(self, X, y):
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.errors_ = []

        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                update = self.eta * (target - self.predict(xi))
                self.w_[1:] += update * xi
                self.w_[0] += update
                errors += int(update != 0.0)
            self.errors_.append(errors)
        return self

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        return np.where(self.net_input(X) >= 0.0, 1, -1)

Driver

#!/usr/bin/env python3
# -*- coding: utf-8; py-indent-offset:4 -*-

import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from perceptron import Perceptron


def perceptron():
    s = os.path.join('https://archive.ics.uci.edu', 'ml',
                     'machine-learning-databases', 'iris', 'iris.data')
    print(f"Url : {s}")
    df = pd.read_csv(s, header=None, encoding='utf-8')
    # print(f"{df.tail()}")
    # y is the class label (column 4) of the first 100 rows
    y = df.iloc[0:100, 4].values
    # print(f"{y}")
    y = np.where(y == 'Iris-setosa', -1, 1)
    # print(f"{y}")
    # X df is the lengths of sepal and petal (Cols 0 and 2)
    X = df.iloc[0:100, [0, 2]]
    #
    plt.scatter(X.iloc[:50, 0], X.iloc[:50, 1],
                color='red', marker='o', label='setosa')
    plt.scatter(X.iloc[50:100, 0], X.iloc[50:100, 1],
                color='blue', marker='x', label='versicolor')
    plt.xlabel('sepal length [cm]')
    plt.ylabel('petal length [cm]')
    plt.legend(loc='upper left')
    # plt.show()

    # Do some ML!!!
    ppn = Perceptron(eta=0.1, n_iter=10)
    ppn.fit(X, y)
    plt.plot(range(1, len(ppn.errors_) + 1), ppn.errors_, marker='o')
    plt.xlabel('Epochs')
    plt.ylabel('Number of updates')
    plt.show()


if __name__ == '__main__':
    perceptron()

@rasbt
Owner

rasbt commented Oct 16, 2021

That's because X is a pandas DataFrame rather than a NumPy array, and that creates the mismatch. If you change

ppn.fit(X, y)

to

ppn.fit(X.values, y)

it should work.
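
To illustrate the mismatch: iterating over a DataFrame yields its column labels, not its rows, so `zip(X, y)` in `fit` pairs the labels 0 and 2 with the first two targets. A small sketch with made-up numbers (only the column labels match the driver above):

```python
import numpy as np
import pandas as pd

# Three made-up rows with the same column labels (0 and 2) as in the driver.
X = pd.DataFrame({0: [5.1, 4.9, 7.0], 2: [1.4, 1.4, 4.7]})
y = np.array([-1, -1, 1])

# Iterating a DataFrame walks over its column labels, not its rows.
from_frame = [xi for xi, _ in zip(X, y)]

# Iterating the underlying NumPy array walks over the rows.
from_array = [xi for xi, _ in zip(X.values, y)]

print(from_frame)   # column labels
print(from_array)   # one feature vector per sample
```

With `xi` being the scalar label 0, `net_input` returns a two-element array, so `update` is an array too and the assignment to `self.w_[0]` fails.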

@rasbt rasbt closed this as completed Oct 16, 2021
@FrankC01
Author

That did it, but the book and the example do not show that.

@rasbt
Owner

rasbt commented Oct 16, 2021

In the book, it was done earlier when preparing the X numpy array:

[Image: screenshot of the book's code where X is prepared as a NumPy array]

You could also do that in your code above, but then you would have to adjust the scatter calls, which use .iloc, so I suggested changing just this part of your code:

ppn.fit(X.values, y)
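
For completeness, the alternative of converting once when X is created could look roughly like the sketch below. The tiny DataFrame is a made-up stand-in for the Iris data, and the slice boundaries are shortened to match it; with the real data they would be `:50` and `50:100`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the Iris DataFrame (columns 0-4, label in column 4).
df = pd.DataFrame({
    0: [5.1, 4.9, 7.0, 6.4],
    1: [3.5, 3.0, 3.2, 3.2],
    2: [1.4, 1.4, 4.7, 4.5],
    3: [0.2, 0.2, 1.4, 1.5],
    4: ['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor'],
})

y = np.where(df.iloc[:, 4].values == 'Iris-setosa', -1, 1)

# Convert to a NumPy array once, when X is created.
X = df.iloc[:, [0, 2]].values

# NumPy arrays have no .iloc; use positional slicing instead.
setosa_x, setosa_y = X[:2, 0], X[:2, 1]
versicolor_x, versicolor_y = X[2:, 0], X[2:, 1]
```

These slices can then be passed to `plt.scatter` in place of the `X.iloc[...]` expressions, and `ppn.fit(X, y)` works unchanged.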

@FrankC01
Author

FrankC01 commented Oct 16, 2021

Ahh, correct... I did not have X = df.iloc[0:100, [0,2]].values but X = df.iloc[0:100, [0,2]], and because of that I was getting errors on the scatter, which I fixed with iloc.

Clearly I'm a n00b at pandas, numpy and ML :)... what better way to spend a rainy day!

Thanks!

@rasbt
Owner

rasbt commented Oct 16, 2021

No worries, these things are really easy to overlook :). Glad it was an easy fix though!
