Sample use in Octave:
X = [m rows of training data, where each row is a vector of n input values]
y = [a vector of m correct labels, each an integer from 1 to num_labels]
num_labels = the number of different labels to find in the data

lambda = 0.1;
m = size(X, 1);
n = size(X, 2);
class_thetas = zeros(num_labels, n + 1);
X = [ones(m, 1) X];                 % add a column of ones (the bias input)
options = optimset('MaxIter', 50);
guess = zeros(n + 1, 1);
for k = 1:num_labels
    % each pass trains one binary "is it class k?" classifier
    [theta] = fmincg(@(t)(cost(t, X, (y == k), lambda)), guess, options);
    class_thetas(k, :) = theta';
end
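The one-vs-all trick is the (y == k) term: it recodes the multi-class label vector as a binary target for classifier k. A small illustration with hypothetical labels:

yk = [2; 1; 3; 2];
(yk == 2)           % ans = [1; 0; 1; 0] -- "is this example class 2?"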
At this point, we have a set of thetas, one row per label. To use them, given XX, a new matrix of test data whose rows are n input values (with a column of ones prepended, just as was done for X):
[confidence, label] = max(sigmoid( XX * class_thetas'), [], 2);
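Here label is the index of the most confident one-vs-all classifier, and confidence is that classifier's sigmoid output. To see the whole pipeline run, here is a hypothetical end-to-end sketch on synthetic data (three Gaussian blobs, one per label). Note that fmincg is not a core Octave function (it is typically supplied alongside course materials), so this sketch substitutes the built-in fminunc; it assumes the cost function defined below and a sigmoid helper are on the load path.

% hypothetical demo: three Gaussian blobs, one per label
m = 150; n = 2; num_labels = 3; lambda = 0.1;
X = [randn(50, 2) + [ 2  2];
     randn(50, 2) + [-2  2];
     randn(50, 2) + [ 0 -2]];
y = [ones(50, 1); 2 * ones(50, 1); 3 * ones(50, 1)];

class_thetas = zeros(num_labels, n + 1);
X = [ones(m, 1) X];
% 'GradObj' tells fminunc to use the gradient S that cost() returns
options = optimset('MaxIter', 50, 'GradObj', 'on');
guess = zeros(n + 1, 1);
for k = 1:num_labels
    theta = fminunc(@(t)(cost(t, X, (y == k), lambda)), guess, options);
    class_thetas(k, :) = theta';
end

[confidence, label] = max(sigmoid(X * class_thetas'), [], 2);
printf('training accuracy: %.1f%%\n', 100 * mean(label == y));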
This example uses the standard logistic cost function with regularization.
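For reference, the two quantities the function computes are (restated here, 1-indexed to match the Octave code):

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - (1 - y^{(i)})\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=2}^{n+1}\theta_j^2$$

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad (j \ge 2)$$

where $h_\theta(x) = \mathrm{sigmoid}(\theta^\top x)$ and the bias parameter $\theta_1$ is excluded from regularization.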
function [J, S] = cost(theta, X, y, lambda)
    m = length(y);
    hyp = sigmoid(X * theta);   % make a guess: the sigmoid of our training data times our current parameters
    costs = -y' * log(hyp) - (1 - y)' * log(1 - hyp);   % cost with sigmoid function
    J = sum(costs) / m + (lambda * sum(theta(2:end).^2) / (2 * m));   % mean cost + regularization
    err = hyp - y;   % actual error
    % Note this happens to be the derivative of our cost function.
    % theta(1), the bias parameter, is excluded from regularization.
    S = (X' * err) ./ m + (lambda .* [0; theta(2:end)] ./ m);   % slope of the error + regularization
end
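The code assumes a sigmoid helper is already on the path; it is not defined in this section, but a minimal definition would be:

function g = sigmoid(z)
    % element-wise logistic function, 1 / (1 + e^-z)
    g = 1 ./ (1 + exp(-z));
end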