A mini Google search engine built from scratch, including a crawler, an indexer, Google-style PageRank, and a web interface. It passed 85% (17 of 20) of randomly chosen test keywords and supports PDF, image, bigram, and trigram search.
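The description does not say which PageRank variant the project used. The Python sketch below shows the standard power-iteration formulation with the commonly cited damping factor of 0.85. It assumes the link graph fits in memory as a dict mapping each page to its outlinks. The function name `pagerank` and its parameters are illustrative and are not taken from the project.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=200):
    """Power-iteration PageRank over a dict mapping page -> list of outlinked pages.

    Illustrative sketch only; not the project's actual implementation.
    """
    # Collect every page that appears either as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution

    for _ in range(max_iter):
        # Rank held by dangling pages (no outlinks) is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped rank equally among its outlinks.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the ranks have stabilized
            break
    return rank
```

For example, `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]})` ranks `c` highest, because three pages link to it. It ranks `d` lowest, because nothing links to it and it keeps only the teleport share of 0.15 / 4. A production crawler index would typically store the graph on disk and use sparse matrix operations rather than Python dicts.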
Yu-Ho was responsible for building the crawler and the web application. Built on a semi-distributed design, the crawler fetched 270K+ pages in 3 days. A task re-execution mechanism gave it good fault tolerance, and a dynamically sized cluster allowed it to scale.
Yu-Ho reduced the average waiting time from 9 s to 4 s by adding a most-recently-used cache for database searches and by downloading in batches.
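The description does not detail the cache. One plausible reading of "most-recently-used cache" is a cache that keeps the results of the most recently used queries and evicts the least recently used one when full, which is LRU eviction. The Python sketch below assumes that reading. The class `SearchCache`, the injected `search_fn` backend, and the `capacity` parameter are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, List


class SearchCache:
    """Caches results of recently used queries; evicts the least recently used entry.

    Hypothetical sketch, assuming LRU eviction; not the project's actual code.
    """

    def __init__(self, search_fn: Callable[[str], List[str]], capacity: int = 1000):
        self._search_fn = search_fn  # hypothetical backend query, e.g. a database lookup
        self._capacity = capacity
        self._entries = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def search(self, query: str) -> List[str]:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        results = self._search_fn(key)  # only cache misses reach the database
        self._entries[key] = results
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used query
        return results
```

With `capacity=2`, searching "a", "b", "a", "c", "b" triggers four backend calls. The second "a" is a hit, adding "c" evicts "b", and the final "b" therefore misses again. Repeated popular queries skip the database entirely, which is where this kind of cache saves waiting time.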