Logistic Regression is a very popular method in machine learning, used to make predictions. (Its outputs should be read as general estimates rather than exact probabilities.)

Basically, logistic regression (LR) builds on three main parts: regression, linear regression and the logistic function.

**Regression**

Regression means estimating the unknown parameters of a given function.

For example, in $y = ax + b$ the unknown parameters are $a$ and $b$. If we have a large set of $(x, y)$ pairs, we can estimate $a$ and $b$ approximately, for instance by drawing the line that best fits the points.
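As a minimal sketch of this idea, the Python below estimates $a$ and $b$ with ordinary least squares, the closed-form way of "drawing the best line". The function name `fit_line` and the sample points are illustrative assumptions, not taken from the original article.

```python
def fit_line(points):
    """Estimate a and b in y = a*x + b by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope: covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    a = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y)
    b = mean_y - a * mean_x
    return a, b


# Made-up noisy samples scattered around y = 2x + 1
data = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.1)]
a, b = fit_line(data)
print(f"a = {a:.2f}, b = {b:.2f}")
```

With this data the estimates land close to the "true" values $a = 2$, $b = 1$; more points and less noise bring them closer.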

In the real world, however, we rarely know the function in advance (who knows where the data came from?). So we assume a structure for the function by analysing a large dataset. Whether we need linear or non-linear regression depends on the form of that function.

**Linear Regression**

With linear regression we can easily solve the problem of fitting $y = ax + b$ described above.

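If a function has more than one variable, such as $y = a_1x_1 + a_2x_2 + \cdots + a_nx_n + b$, we can still leave it to linear regression and use it as a tool to estimate all the parameters.

But keep in mind that the assumed form of the function needs to be the right one.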
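For several variables, one standard approach (not necessarily the author's) is to solve the least-squares problem through the normal equations $(X^\top X)\theta = X^\top y$. The sketch below does this in plain Python; the helper names `solve` and `fit_linear` and the data are hypothetical, and a real project would normally use a numerical library rather than hand-written elimination.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_linear(X, y):
    """Least-squares fit of y = b + a1*x1 + ... + an*xn via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # leading 1 lets the intercept b be learned
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)  # [b, a1, ..., an]


# Made-up, noise-free data generated from y = 1 + 2*x1 - 3*x2
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in X]
b, a1, a2 = fit_linear(X, y)
print(b, a1, a2)
```

Because the data were generated exactly from an assumed linear form, the fit recovers $b = 1$, $a_1 = 2$, $a_2 = -3$ up to rounding; if the assumed form were wrong, no choice of parameters would fit well, which is the point of the warning above.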

**Logistic Function**

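We often need values to lie in the range 0 to 1; squashing them into that range is what we call normalisation here. The logistic function $\sigma(z) = \frac{1}{1 + e^{-z}}$ does exactly this.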
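The logistic function is short to write directly. In this sketch (the name `sigmoid` is our choice) the code branches on the sign of `z` so that `math.exp` is only ever called on a non-positive argument and cannot overflow.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z into (0, 1) in exact arithmetic."""
    # Branch on the sign so math.exp only sees non-positive arguments
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


for z in (-10, -1, 0, 1, 10):
    print(z, round(sigmoid(z), 4))
```

Large negative inputs come out near 0, large positive inputs near 1, and `sigmoid(0)` is exactly 0.5.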

*Logistic regression is a linear regression whose output is normalised by a logistic function.*
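Put together, a prediction is a linear combination of the features passed through the logistic function. In this sketch `predict_proba` is a hypothetical name, and the weights and bias are made-up values chosen by hand; in practice they are learned from data.

```python
import math


def sigmoid(z):
    # Simple form; can overflow for very negative z (below about -709)
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Logistic regression output: a linear combination squashed by the logistic function."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))  # linear-regression part
    return sigmoid(z)                                     # normalise into (0, 1)


# Hypothetical parameters: z = 0.5 + 2*1.0 - 1*3.0 = -0.5
p = predict_proba([2.0, -1.0], 0.5, [1.0, 3.0])
print(round(p, 4))
```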

To some extent, the logistic function can help us suppress noise.

**When will we use Logistic Regression?**

1) Predicting probabilities, or classification

Not every ML method can predict probabilities directly; an SVM, for example, outputs class decisions rather than probabilities. For classification, a threshold may be needed to turn a predicted probability into a class.

2) Linear problems only

When the features and the target have a linear relationship (more precisely, when the log-odds of the target are linear in the features), logistic regression can be useful.

3) Dependence among features

The features do not need to be conditionally independent, but each feature's contribution is computed independently. No posterior probabilities are needed, and LR will not combine features into new ones on its own.
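The three points above can be illustrated in one small sketch. The 0.5 threshold, the hand-picked parameters `w1`, `w2`, `b` and the helper names are all hypothetical. It shows (1) turning a probability into a label with a threshold, (2) that with threshold 0.5 the class boundary is the straight line $w_1x_1 + w_2x_2 + b = 0$, and (3) that feature combinations such as $x_1x_2$ must be added by hand.

```python
import math
from itertools import combinations


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Point 1: a threshold turns a probability into a class label (0.5 is a common default).
def classify(prob, threshold=0.5):
    return 1 if prob >= threshold else 0


# Point 2: hypothetical model z = w1*x1 + w2*x2 + b. Mathematically sigmoid(z) >= 0.5
# exactly when z >= 0, so with threshold 0.5 the boundary is the straight line z = 0.
w1, w2, b = 1.0, -2.0, 0.5


def predict(x1, x2, threshold=0.5):
    return classify(sigmoid(w1 * x1 + w2 * x2 + b), threshold)


# Point 3: LR only weights each input linearly, so interactions must be built by hand.
def add_pairwise_products(x):
    return list(x) + [xi * xj for xi, xj in combinations(x, 2)]


print(classify(0.83), classify(0.83, threshold=0.9))  # 1 0
print(predict(2.0, 0.5), predict(0.0, 1.0))           # 1 0
print(add_pairwise_products([2.0, 3.0, 5.0]))         # [2.0, 3.0, 5.0, 6.0, 10.0, 15.0]
```

Raising the threshold makes the classifier more reluctant to predict class 1; where to put it depends on the relative cost of the two kinds of error.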

**Logistic Regression Methods**

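*Cost Function*: all the points $(x_i, y_i)$ should lie as close as possible to the final line, so we define a cost function that measures the error and look for its minimum. For fitting a line, this is *the Least Squares Method*. For logistic regression itself, the usual cost is the log loss (cross-entropy), because squared error on top of the logistic function is not convex in the parameters.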
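Both costs mentioned above are short to write down. This is a sketch with hypothetical function names: `squared_error_cost` is the mean squared residual for a line $y = ax + b$, and `log_loss` is the cross-entropy commonly minimised for logistic regression (probabilities are clipped so that `math.log` never receives 0).

```python
import math


def squared_error_cost(a, b, points):
    """Least-squares cost for the line y = a*x + b: mean of squared residuals."""
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)


def log_loss(probs, labels, eps=1e-12):
    """Cross-entropy cost for predicted probabilities against 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


print(squared_error_cost(2, 1, [(0, 1), (1, 3), (2, 5.5)]))
print(log_loss([0.9, 0.2], [1, 0]), log_loss([0.6, 0.4], [1, 0]))
```

In both cases a smaller value means a better fit, and confident correct predictions cost less than hesitant ones under the log loss.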

The cost function only defines what a good fit is. How do we actually find the parameter values? We use gradient descent.

*Gradient Descent*

It is a greedy method. Here we give a simple version; for those who don't want to study it in more detail, this is enough. The main idea: first pick a line at random, then calculate the error and adjust the parameters to reduce it. With a small enough step, the error becomes smaller after each round. Keep adjusting until the error is very small, and take the current line as the best fit.
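A minimal sketch of batch gradient descent for logistic regression, assuming the log-loss cost rather than least squares. The learning rate, number of epochs, helper names and toy data are arbitrary illustrative choices, and it starts from all-zero parameters instead of a random line (a common choice for logistic regression).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log loss; returns (weights, bias)."""
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features  # starting "line": all parameters zero
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Error of the current model on this sample: predicted prob minus label
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient so the cost goes down
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Made-up 1-D data: the label is 1 when x is large
X = [[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
preds = [1 if sigmoid(b + w[0] * x[0]) >= 0.5 else 0 for x in X]
print(w, b, preds)
```

Each pass over the data is one "round" from the description above: compute the error, then nudge every parameter in the direction that lowers the cost.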

From: http://blog.csdn.net/memory513773348/article/details/16967211