Sample use in Octave:
X = [m rows of training data, where each row is a vector of n input values]
y = [a vector of m correct labels, each an integer between 1 and num_labels]
num_labels = the number of possible different labels to find in the data
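For instance, a made-up toy setup with three labels might look like:
X = [1 2; 3 4; 5 6; 7 8]; % m = 4 examples, n = 2 input values each
y = [1; 2; 3; 1];         % one label per row of X, in the range 1..num_labels
num_labels = 3;
The one-vs-all training then proceeds as follows: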
lambda = 0.1;                             % regularization strength
m = size(X, 1);                           % number of training examples
n = size(X, 2);                           % number of input values per example
class_thetas = zeros(num_labels, n + 1);  % one row of parameters per label
X = [ones(m, 1) X];                       % add a bias column of ones
options = optimset('MaxIter', 50);        % cap the optimizer at 50 iterations
guess = zeros(n + 1, 1);                  % initial parameter vector for each classifier
for k = 1:num_labels
  % Train a binary classifier that separates label k from all the others;
  % (y == k) turns the label vector into a 0/1 target for this class.
  [theta] = fmincg(@(t)(cost(t, X, (y == k), lambda)), guess, options);
  class_thetas(k,:) = theta';
end
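The loop assumes a sigmoid helper is on the path (note that fmincg is not a built-in Octave function and must be supplied separately; Octave's built-in fminunc accepts the same handle and options if fmincg is unavailable). A minimal sketch of the standard sigmoid:
function g = sigmoid(z)
  % Logistic function, applied element-wise: maps any real value into (0, 1).
  g = 1 ./ (1 + exp(-z));
end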
At this point, we have a set of thetas to classify each label. To use them, given XX, a new matrix of test data with rows of n input values, prepend the same bias column of ones and pick the label whose classifier reports the highest confidence:
XX = [ones(size(XX, 1), 1) XX]; % same bias column as in training
[confidence, label] = max(sigmoid(XX * class_thetas'), [], 2);
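As a quick sanity check, if the true labels for XX happen to be known (say in a vector yy, hypothetical here), accuracy is just the fraction of matching predictions:
accuracy = mean(double(label == yy)) * 100; % percent of test rows classified correctly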
This example uses the standard logistic cost function with regularization:
function [J, S] = cost(theta, X, y, lambda)
  m = length(y);
  hyp = sigmoid(X * theta); % make a guess based on the sigmoid of our training data times our current parameters
  costs = -y' * log(hyp) - (1 - y)' * log(1 - hyp); % cost with sigmoid function
  J = sum(costs) / m + (lambda * sum(theta(2:end).^2) / (2 * m)); % mean cost + regularization
  err = hyp - y; % actual error
  % Note this happens to be the derivative of our cost function.
  S = (X' * err) ./ m + (lambda .* [0; theta(2:end)] ./ m); % slope of the error + regularization (the bias term theta(1) is not regularized)
end
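One quick way to verify the function: at theta = 0 the hypothesis is 0.5 for every example, so the unregularized cost is -log(0.5) ≈ 0.6931 regardless of the data. The values below are made up purely for illustration:
theta0 = zeros(3, 1);
Xs = [1 0.5 -1.0; 1 -0.2 0.3]; % two rows, bias column already included
ys = [1; 0];
[J, S] = cost(theta0, Xs, ys, 0.1); % J is 0.6931...; regularization adds nothing at theta = 0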