Semi-definite programming (SDP) generalizes linear programming. One standard definition of SDP is the following:

$$\max_{X \in \mathbb{R}^{n \times n}} \; \langle C, X \rangle$$

such that $\langle A_k, X \rangle = b_k$ for $k = 1, \dots, m$ and $X \succeq 0$.

Here $C, A_1, \dots, A_m, X$ are all $n \times n$ matrices and the operation $\langle A, B \rangle = \sum_{ij} A_{ij} B_{ij}$ corresponds to entry-wise multiplication. An equivalent formulation of SDP writes $X$ as the Gram matrix of vectors $v_1, \dots, v_n \in \mathbb{R}^n$, i.e. $X_{ij} = \langle v_i, v_j \rangle$:

$$\max_{v_1, \dots, v_n \in \mathbb{R}^n} \; \sum_{ij} C_{ij} \langle v_i, v_j \rangle$$

such that $\sum_{ij} (A_k)_{ij} \langle v_i, v_j \rangle = b_k$ for $k = 1, \dots, m$. So we can either view it as *linear* programming with matrices and a funny PSD constraint, or as *quadratic* programming where the variables are vectors. Note that PSD matrices form a convex cone in $\mathbb{R}^{n \times n}$. If we simply treat SDP as a generic convex optimization problem, the number of variables scales as $n^2$, which is not efficient.
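As a quick sanity check on these two views, here is a small NumPy sketch (NumPy assumed; the matrices are random illustrative data) verifying that the entry-wise inner product equals $\mathrm{tr}(A^\top X)$, that a Gram matrix $VV^\top$ is PSD, and that nonnegative combinations of PSD matrices stay in the cone:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Entry-wise inner product <A, X> = sum_ij A_ij X_ij equals trace(A^T X).
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))
lhs = (A * X).sum()           # entry-wise multiplication, then sum
rhs = np.trace(A.T @ X)
print(np.isclose(lhs, rhs))   # True

# The "quadratic" view: X = V V^T is the Gram matrix of the rows of V,
# and every Gram matrix is PSD (all eigenvalues nonnegative).
V = rng.standard_normal((n, n))
X_psd = V @ V.T
print(np.linalg.eigvalsh(X_psd).min() >= -1e-9)  # True

# PSD matrices form a convex cone: nonnegative combinations stay PSD.
W = rng.standard_normal((n, n))
Y = 0.3 * X_psd + 1.7 * (W @ W.T)
print(np.linalg.eigvalsh(Y).min() >= -1e-9)      # True
```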

**MAX-CUT.** Given a graph with weight $w_{ij}$ on edge $e_{ij}$, the goal is to partition the vertices into two sets so that the total weight of the crossing edges is maximized. This is an NP-hard problem and can be formulated as an integer program. Assign a variable $x_i \in \{-1, +1\}$ to each vertex $i$ such that $x_i = 1$ if $i$ belongs to the left partition and $x_i = -1$ otherwise. Then MAX-CUT is equivalent to:

$$\max_{x} \; \frac{1}{2} \sum_{(i,j) \in E} w_{ij} (1 - x_i x_j)$$

such that $x_i \in \{-1, +1\}$ for all $i$.
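The integer program can be checked directly on tiny graphs by exhaustive search (a sketch; the 4-cycle example graph is my own illustrative choice):

```python
import itertools

def max_cut_brute_force(n, weights):
    """Exhaustively search all +/-1 assignments for the maximum cut.

    weights: dict mapping edge (i, j) with i < j to weight w_ij.
    Returns the best cut value and one optimal assignment.
    """
    best_val, best_x = 0.0, None
    for signs in itertools.product([-1, 1], repeat=n):
        # An edge crosses the cut exactly when x_i * x_j == -1,
        # so w_ij * (1 - x_i x_j) / 2 counts its weight once.
        val = sum(w * (1 - signs[i] * signs[j]) / 2
                  for (i, j), w in weights.items())
        if val > best_val:
            best_val, best_x = val, signs
    return best_val, best_x

# 4-cycle with unit weights: alternating signs cut all four edges.
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0}
val, x = max_cut_brute_force(4, edges)
print(val)  # 4.0
```

Of course this takes $2^n$ time, which is exactly why the relaxation below is interesting.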

As a relaxation, let each $x_i$ become a vector $v_i$ in $\mathbb{R}^n$, and solve the following SDP:

$$\max_{v_1, \dots, v_n} \; \frac{1}{2} \sum_{(i,j) \in E} w_{ij} (1 - \langle v_i, v_j \rangle)$$

such that $\langle v_i, v_i \rangle = 1$ for all $i$. The solutions are vectors on the unit sphere. We think of the vector $v_i$ as representing the node $i$ in some $n$-dimensional feature space and $\langle v_i, v_j \rangle$ as measuring the similarity between the two points. *Is there a way to interpret these features?*
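To see what a feasible point of this relaxation looks like, here is a small NumPy sketch (the vectors are random, not an optimal SDP solution): rows of `V` are unit vectors, their Gram matrix satisfies the SDP constraints, and the relaxed objective only needs inner products.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# One unit vector per vertex (random here, not an optimizer of the SDP).
V = rng.standard_normal((n, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)

# Gram matrix X_ij = <v_i, v_j>: the diagonal is 1 and X is PSD, so it
# is feasible for the SDP relaxation of MAX-CUT.
X = V @ V.T
print(np.allclose(np.diag(X), 1.0))          # True
print(np.linalg.eigvalsh(X).min() >= -1e-9)  # True

# Relaxed objective on an illustrative weighted graph (4-cycle + chord).
edges = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 1.0, (0, 2): 0.5}
obj = sum(w * (1 - X[i, j]) / 2 for (i, j), w in edges.items())
print(0.0 <= obj <= sum(edges.values()))     # True: each term lies in [0, w_ij]
```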

To derive a hard partition, we randomly select a hyperplane through the origin and put all the vectors on one side of the hyperplane into one partition. The neat thing is that the probability of $v_i$ and $v_j$ being separated by the hyperplane is $\theta_{ij}/\pi$, where $\theta_{ij} = \arccos \langle v_i, v_j \rangle$, and this is quite close to the edge's contribution $(1 - \langle v_i, v_j \rangle)/2$ to the SDP objective: in fact $\theta/\pi \geq 0.878 \cdot (1 - \cos\theta)/2$ for every $\theta$. So in expectation the randomized rounding recovers at least a $0.878$ fraction of the SDP optimum, which is itself an upper bound on the MAX-CUT value. Truly an amazing algorithm.
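The rounding step and the $\theta/\pi$ separation probability can be checked empirically with a minimal NumPy sketch (the two test vectors and the angle are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two unit vectors at angle theta = 2*pi/3, so <u, v> = -1/2.
theta = 2 * np.pi / 3
u = np.array([1.0, 0.0])
v = np.array([np.cos(theta), np.sin(theta)])
V = np.vstack([u, v])

# Each row of G is the normal of a random hyperplane through the origin;
# a vector's side of the hyperplane is the sign of its inner product
# with that normal.
trials = 200_000
G = rng.standard_normal((trials, 2))
sides = np.sign(G @ V.T)                  # trials x 2 matrix of +/-1

# Fraction of hyperplanes that separate u from v: should be ~ theta/pi.
sep_freq = np.mean(sides[:, 0] != sides[:, 1])
print(abs(sep_freq - theta / np.pi) < 0.01)   # True
```

In the full algorithm the same single hyperplane is applied to all $n$ vectors at once, and summing $\theta_{ij}/\pi$ over the edges gives the $0.878$ guarantee.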