Wednesday, July 29, 2015

I will give up this blogger...and move to CSDN blogger for sake of their code snippets functionality. This is my new blogger

Thursday, June 18, 2015

Seriously, what is a SDE ?

The author of  Homebrew failed Google's interview, because he even can't inverse a binary tree on white board. I tried the question yesterday morning, straight forward recursion with a little bit trick in the order of processing. But I am far from a SDE, I didn't even write a software before, let alone something like Homebrew.

This is ironic.

I am taking summer intern in a young tech company called ShutterStock, this is a very dynamic company, the team I am working with all use window .NET framework and C#. Those people are very good at framework thing and very fluently use very good third party tools. All in all, they are productive and skillful. However, I remembered one evening I was doing my Machine learning assignment at office, and my colleagues saw it and asked what is it. I said I am coding a SVM(supported vector machine). None of them even heard it. They claimed that it is a academic thing.

I am not good at algorithm questions for now, but I do admire the ability of analyzing and solving them. On one hand, it could only base on how much practice and how many times you have done it, on other hand, at the same time you are forming your style of thinking and analyzing. I believe no one is born to be a master in algorithm problem solving, it takes effort, some people learn fast, and some go furthest. But what finally you gonna get out of it? An offer? A hobby? ...

But the truth is that, every CS people need to go through this, and master it to some extent.
I am taking a summer night course and full time intern, wouldn't have any time sitting down and solving leetcode problems till July. However, I do promise a series of posting on classic methods in August if not earlier.

Saturday, April 25, 2015

[LeetCode] Decode Ways

A message containing letters from A-Z is being encoded to numbers using the following mapping:
'A' -> 1
'B' -> 2
...
'Z' -> 26
Given an encoded message containing digits, determine the total number of ways to decode it.
For example,
Given encoded message "12", it could be decoded as "AB" (1 2) or "L" (12).
The number of ways decoding "12" is 2.
Hide Tags
 Dynamic Programming String









My friend was asked this problem in his last round interview with Facebook, he got the offer finally. I didn't do this problem before I heard from him, I was wondering if I could do this onsite without knowing it beforehand.

Till now, after one month's exercise in algorithm problems, I have got some basic perception about some classic methods. For this problem, after I reading it, things came up in my mind is DFS + BackTracking. This is a right point to start this question, then we can directly move to DP without implementing a recursive solution. 

I spent quite a while to figure out the relation between subproblems and problems, it turned out that the reason it was so hard is that I didn't really understand what this problem is asking.  I want to first see a very similar problem, and then say how this problem is derived from it.

You are climbing a stair case. It takes n steps to reach to the top.
Each time you can either climb 1 or 2 steps. In how many distinct ways can you climb to the top?
Hide Tags
 Dynamic Programming





At each step you can climb one or two stairs, how many distinct ways to climb to the top?
Let's see the relation between subproblem and problem
DP[i]  = DP[i-1] + DP[i-2]
              |                    |____ (This part only exists if DP[i-2] exist)
              |_______________(This part only exists if DP[i-1] exist)
The relation is quite easy to understand. You can get to ith stair from (i-1) th stair or from (i-2)th stair if i >2. So the number of ways you reach ith stair will be sum of number of ways to reach (i-1)th and number of ways to reach (i-2)th stair 
But when you do this the first time, this relation may not be so easy to think of. Why? If a method is quite simple, but hard to think of, very likely because it relies very much on the 'direction' of your thoughts. And you focus too much on the detail of the problem but never had a chance to take a high overview to get a direction to move forward. 
I think this is the philosophy behind Dynamic Programming, you need to have a general sense of how things/problems are 'growing'/building up, before you move to details/conditions on it.
For this problem, the relation is simple, and conditions are few, without analyzing it in right order, we can still get the solution. But when problems become complicate as 'Decide Ways',
if you jump into details without figuring out a overall direction, you gonna get lost the complex details forever. 





Let's move back to 'Decode ways'
The relation is :
DP[i] = DP[i-1]   +   DP[i-2]
               \                  \_____________(if   str[i-2] exists and 10<= int(str[i-1] + str[i]))<=26  )
                  \__________________(If str[i-1] exists and str[i] != '0' )
It is exactly same relationship, but with more conditions. 
I previous analysis, I mentioned many times that first find the 'direction' before analyzing the detail, the direction is the relation, you can see, though details become more complicate but direction is as simple as before. And pause for a second, think about how many possible 'directions' for 1 dimension DP and for 2 dimension DP could there be ?

Though DP is the best way to solve such problems, recursion always is the most thrilling way.
Let's first try to do it in recursion.



class Solution:
    # @param {string} s
    # @return {integer}
    num = 0
    def numDecodings(self, s):
        self.helper(s)
        return Solution.num
        
    def helper(self,s):
        if len(s) == 0:
            Solution.num+=1
            return
        for i in xrange(2):
            if i+1 <= len(s) and self.isValid(s[:i+1]):
                self.helper(s[i+1:])
        
    def isValid(self,s):
        if int(s) == 0:
            return False
        if 0 < int(s) <= 26:
            return True
        

I should really pause and think about this recursion.  This is ...impressive. These several lines give you so clear the idea of DFS and backtracking.
In recursion, how to return and what to return is always the most interesting part.  In this summer I will write a  subject on recursion, this one must be one of many I want mention.
Then Let's look at DP, which is more efficient and more understandable. 
class Solution:
    # @param {string} s
    # @return {integer}
    def numDecodings(self, s):
        if not s:
            return 0

        dp = [0]*(len(s)+1)
        dp[0] = 1
        for i in xrange(len(s)):
            if self.isValid(s[i]):
                dp[i+1] = dp[i-1+1]
            if i>0 and self.isValid(s[i-1:i+1]):
                dp[i+1] += dp[i-2+1]
        return dp[-1]

    def isValid(self,s):
        if int(s) == 0:
            return False
        if len(s) == 1 and 0<int(s) <10:
            return True
        if len(s) == 2 and 10 <= int(s) <= 26:
            return True


There is really not much to say about DP solution, it is worthy noticing that one entry is prefixed to DP array. This kind of manipulation is needed to prevent indexing out of range while keep thing consistent. Because of this, DP[i] actually corresponds to S[i-1]




Saturday, April 18, 2015

[LeetCode] Generate Parentheses

Given n pairs of parentheses, write a function to generate all combinations of well-formed parentheses.
For example, given n = 3, a solution set is:
"((()))", "(()())", "(())()", "()(())", "()()()"
Hide Tags
 Backtracking String







When I read online solutions about this problem, people like to say this is a typical recursion problem.
Err..I agree, but only knowing this is a recursion problem, is not enough to solve it.
 Interesting, the tag given by leetcode is Backtracking. I first learned about the term "backtracking" like 3-4 months ago, I searched it online, didn't really get it, but kind of having the perception that this is a way to avoid 'hopeless' directions when recursively solve subproblems.
Let's take a while to look at the word "backtracking", actually as long as it is tracking, you go back... there is no such thing as forward tracking... so "backtracking" is the same as saying "tracking". What do you need when you track something? Yes, you need history records. Something records what have been done from the past till now.  Why this would help?  It saves you from digging deeper into wrong direction as long as you can tell it already fails whatever the requirements are. I think the previous sentence already indicates something interesting, yes, by saying digging deeper, we are referring to DFS style. So backtracking help you facilitate your DFS process to be more efficient by skipping over wrong branches.
I think it would be worthy to pick up some key words out of the above graph 
requirements, skip 

Let's first see a primitive DFS code for a more primitive version of this problem,
give n =3, return all possible combinations of parentheses.(no need to be right-formatted) 

    def generateParenthesis(n):
        res = []
        self.helper(n,0,0,res,[])
        return res
        
    def helper(self,n,num_left,num_right,res,s):
        if num_left+num_right == 2*n :
            s = "".join(s)
            res.append(s)
            return
        
        if num_left < n:
            s.append('(')
            self.helper(n,num_left+1,num_right,res,copy.copy(s))
            s.pop()
        if num_right < n:
            s.append(')')
            self.helper(n,num_left,num_right+1,res,copy.copy(s))
        return


 The above code should be clear.
 Instead of first talking about what would be new for returning right-formatted combinations, let's directly jump to the code 






    def generateParenthesis(n):
        res = []
        self.helper(n,0,0,res,[])
        return res
        
    def helper(self,n,num_left,num_right,res,s):
        if num_left+num_right == 2*n :
            s = "".join(s)
            res.append(s)
            return
        
        if num_left < n:
            s.append('(')
            self.helper(n,num_left+1,num_right,res,copy.copy(s))
            s.pop()
        if num_left > num_right:
            s.append(')')
            self.helper(n,num_left,num_right+1,res,copy.copy(s))
        return


Since we are required to return right-formatted combinations, we have requirement to fulfill now. If we go blindly to have all possible combinations(notice that to be possible, there usually are also some restrictions, here is that we only have as many as n '(' and ')' to assign), most of them would not be valid. This indicates, by only trying to get valid combination, we could save a lot of computation.(Notice that we are not going to filter valid ones out of all possible ones, we are only to get those valid ones, there would be nothing at last as the all possible ones) 

Here comes the idea of backtracking, based on requirements, we skip not-valid ones as soon as they turn out to be invalid.

I will stop here, I think the idea of backtracking should reveal itself by comparing above two versions of code. One practical suggestion would be, before backtracking, first write the primitive recursion version, which will help you to think how to track/ what to track.






Friday, April 17, 2015

[LeetCode]Word Break...Wait, it's more than Word Break...

Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.
For example, given
s = "leetcode",
dict = ["leet", "code"].
Return true because "leetcode" can be segmented as "leet code".

I have a friend who interviewed with FaceBook today, and was asked for this question.  
For me , till now I am still a 'young' coder with only 8 months of computer science experience and only a little time spent on LeetCode or CodeChallenge .  I am still kind of forming my perception of what does it mean by solving A problem through coding.
Even without the tag, I should feel this problem is a typical DP problem.  I will talk about this later.
For me, at this stage, I still feel recursion way of coding is more attractive and elegant. 
When first saw this problem, thing came out in my mind is DFS.  What is DFS? Maybe rigorously I should not mess the conception of DFS with recursion, but DFS always indicates recursion. Recursion's style is going deeper, deeper, till deepest and then turn back. 
Recursion's logic is you are repeating some process on subproblems.  
To understand which way is vertical, it is helpful to compare with which way is horizontal 
Firstly, when I try to summarize this question, I came up something like:
Depth First :
  l     eetcode  
   I     e     etcode      
    I      e    e     tcode    
     ...
    Breadth First
     l     eetcode     le etcode    lee  tcode   leet code ...

     For deep first search, I want cut into as many pieces as possible, till all pieces consist of one letter.
     For breadth first, I want first examine all one cut cases, then all two cuts cases, so on so forth. 

Till now, I am so lost. 
After thinking for quite a while, the above illustration is right.
If I reorganize it ,
    Breadth First(lvl 1)
  l     eetcode  ->    le etcode ->   lee  tcode ->  leet code ...
    Depth First
  l     eetcode 
     Breadth First (lvl 2)
  I     e     etcode      -> l    ee    tcode -> l   eet    code ....
    Depth First
   l    e      etcode
     Breadth First (lvl 3)
   I     e     e    tcode   -> l     e    et    code -> l     e    etc    ode ...       
     ...

     Now it is clear, if we want to recursively solve this problem, this is a typical DFS. I have done some recursive coding on topics of Binary Tree and Binary Search Tree. In the recursion function, I will have a part like first do something on left node, then on right node. Think about this process, once we call the function itself on left node, we are now in a DFS style.  In each function, your Breadth is 2, because you first iterate on left node, then right node. But you never move to the right node, until the function on left node returns. When will it return? When finally the last left node meets the return condition specifying by the recursion function. 
       However, in this breakword problem, instead of having breadth equal 2, we have  len(s) as our breadth. The thing we repeating doing recursively is that we check and cut the input string, and all those possible cuts are indicated by the Breadth. We will call recursive function on our every cut first before moving to next possible cut, so this is DFS.
         See the primitive DFS code:

def wordBreak(s,wordDict):
        if s in wordDict :
            return True
        elif s not in wordDict and len(s) == 1:
            return False
        else:
            for bp in xrange(1,len(s)-1): ##Breadth
                if (wordBreak(copy.copy(s[:bp]),wordDict) and wordBreak(copy.copy(s[bp:]),wordDict)):##Depth
                    return True
            return False

I don't want to discuss how to decide the return condition here, only focus on what exactly is a DFS, for now I should know for sure, there must be a return condition.
This function must be very time costing. If it is a long sting, for every possible substring, and for the substring of substring... we iterated over their length. Imagine how many loops there could be! It will grow exponentially  ! The number of loops are n^n !   How do we get the power n here? For one loop, it costs O(n), and each level has a loop, and the depth is n, so there are n level which means n loops! Think about why the depth is n.

The code is elegant in appearance, but its performance is ... ugly. Think about it, do we really need to go through 'each breadth' so completely? Is there any condition can help us skip some 'hopeless' 'breadth', so we can move quickly to next 'breadth'? 
        See the improved DFS code:


def wordBreak2(s,wordDict):

        if s in wordDict :
            return True
        elif s not in wordDict and len(s) == 1:
            return False
        else:
            for bp in xrange(1,len(s)):
                if s[:bp] not in wordDict:
                    continue##we skip this 'breadth' without wasting time digging deeper
                else:
                    if wordBreak2(s[bp:],wordDict): ##we digger deeper
                        return True ##once we get one instance, we can return, otherwise move further to next 'breadth'
            return False


Long for short, this improved version is much much faster on average,  but we could adversely design cases which make this version also end up with time complexity n^n. Think about it!

It seems I have strayed away too far... I was thinking about writing DP in this post.
....
In order to avoid worst n^n cases, we need to use DP. What is DP and why DP?
I have been starting to think about this concept recently thought I first leaned it like 4 months ago in my algorithm class. 
DP is dynamic programming...if a problem could be divided into subproblems, then this may indicate DP.
Why DP, DP is time safe. You will never get exponential time complexity which takes forever. 
Always, DP and recursion indicate each other. They all rely on same relation between subproblems and problems. 
So, what is the relation here taken use by the recursive function ? Say the key line in primitive version:

if (wordBreak(copy.copy(s[:bp]),wordDict) and wordBreak(copy.copy(s[bp:]),wordDict)):##Depth
                    return True
The relationship should be clear, cut the sting into two pieces, if each of them satisfies the requirement, the string satisfies. The subproblem is on the substring. We solve the subproblems to solve the problem. 
Let's now think about how to build up the solution from subproblems. Notice subproblem also has its subproblems, what would be the subbest-problem. ..
It turns out it relates to the return condition in the recursion function.
 

        if s in wordDict :
            return True
        elif s not in wordDict and len(s) == 1:
            return False

First look at elif part, we can realize a letter is the subbest-problem...
What is the if part? think about it and we will see how to use it in DP.
(It turns out that the if part is actually part of relationship we described before but at that time, omitted this possible relation. Because this is not a relation between subproblems and problems but instead a self-relation. The problem could possibly be solved without referring to any subproblem, or from another perspective, a string could be divided into a null sting and itself!)

From the subbest-problem, we will build up towards the problem. Say i and j are the index of the string. for i, j in [0,len(s)-1], we want to know if [i,j] satisfies the requirement. 

    def wordBreak( s, wordDict):
        l = len(s)
        dp = [[False]*l for i in xrange(l)]
        for n in xrange(0,l):
            for i in xrange(0,l):
                j = i+n
                if j > l-1:
                    break
                if s[i:j+1] in wordDict:
                    dp[i][j] = True
                else:
                    for k in xrange(i,j):
                        if dp[i][k]  and dp[k+1][j] :
                            dp[i][j] = True
                            break
        return dp[0][l-1]
        

    
I am tied ... but excited to observe so much correlation between recursion solution and DP solution. Let's summarize quickly and I am going to have a complete(hopefully) and in depth DP post later this summer.

Conceptually, let's think about the way recursion do things. Notice it call itself, and all local variables are on stack(I know this is not true for Java or python ) 
, ok, even though you have them on heap (say java, except buildin type, all on heap), when out of the scope, stack variable gonna vanish and those on heap will be garbage collected, actually no matter where the variable are located, outside the function inside which it is defined, you can't refer to it unless you return it.Whatever  the point is that you can't retrieve them any more.
 
But look at DP, it is more foresight. Since recursion computes the things again and again, why not record them and retrieve them whenever needed in the further.(I know it is not convincing to just say recursion computes same thing many times, think about those subproblems, they are overlap very much!)

The more interesting point I want to make is that DP is actually exactly a more smart and better-planning version of recursion. In Recursion, we have a return condition to stop it, in DP we have base problem to start it. In Recursion, we use the relation to divide problem into sub-problems, in DP we use the same relation to combine sub-problems  into problems. So the point should be clear, Recursion is TOP-DOWN, and DP is BOTTOM-UP. 

That's all for this question. Days ago I saw a very nice way of building up a 2-D dp matrix, plus the way which this problem use, I know three style of building the matrix up now. But no matter how fancy the matrix is built up, it is decided by the relation!  
 







Thursday, April 16, 2015

[LeetCode] Gas Station

There are N gas stations along a circular route, where the amount of gas at station i is gas[i].
You have a car with an unlimited gas tank and it costs cost[i] of gas to travel from station i to its next station (i+1). You begin the journey with an empty tank at one of the gas stations.
Return the starting gas station's index if you can travel around the circuit once, otherwise return -1.
Note:
The solution is guaranteed to be unique.
Tags: Greedy
Thoughts...

假设 有n个station,那么第一个坐标为 0(同样也为n,因为lis是首尾相接的环), 第n个坐标为 n-1.  
这类题目是纯思路类型的题目。
关于这个题解题思路网上已经有很多总结,核心就是一句话 : 如果从Station_i 出发后,最远只到达了Station_j , 那么我们可以确定,起点一定不在 [Station_1, Station_j ] 这个闭区间。 所以可以从Station_(j+1) 作为起点从新开始排查。

如果这道题不是唯一起点呢? 思路还应当是大同小异的。 以上法则依然试用,但是我们只能获得第一个符合要求的起点,对于剩下的合理的起点, 在第一个合理起点后的第一个点和终点前的一个点之间,重复搜索第一个起点的方法,直至排查至第Station_(n-1) 为止。算法复杂度worst case (全是合理起点)是O(n^2)。

这个题目更值得思考的地方是LeetCode上给的 tag -- Greedy
我个人也处在对Greedy的理解中,这类问题形势很多,如果简单的概括为 :Make locally optimal solution in hope of reaching global optimization。 恐怕指导意义不大。

试用Greedy的一个必要条件是 : You need to make choices.

就此题而言,choice是针对于选择唯一起点。就目前我的理解而言,每当一个潜在起点被否定后,如何选择下一个起点呢? 固然,根据我们对于规律的总结,我们知道该如何选择。 然而从Greedy的角度,是否也可以暗示同样的选择呢?

PYTHON CODE

def canCompleteCircuit(self, gas, cost):
        gas_left = 0
        start_station = 0
        i = -1
        if sum(gas) - sum(cost) < 0:
            return -1
        else:
            for g,c in zip(gas,cost):
                i+=1
                gas_left += (g-c)
                if gas_left < 0:
                    start_station = i+1
                    gas_left = 0
                    
        return start_station