The main problem this project addresses is the very large amount of computing resources needed to crawl the web successfully. Most web crawlers in use today rely on server farms to meet their needs, which puts the field out of reach of ordinary developers. My goal is to reduce the resources that web crawling requires by using a distributed system.
The distributed system will handle both the web crawling and the data processing, while a single database server stores the data. The project will also provide a search facility based on page details and image tags, with the aim of delivering better image search.
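As a rough illustration of this architecture, the sketch below has several workers crawl and process pages in parallel, writes the results to one shared database, and searches images by page title and image tag. It is a minimal sketch under stated assumptions, not the project's actual design: threads stand in for distributed nodes, an in-memory dictionary (`PAGES`) stands in for the web, SQLite stands in for the database server, the `alt` attribute is treated as the image tag, and all names (`crawl`, `build_index`, `search_images`) are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser

# Assumption: a tiny in-memory "web" (URL -> HTML). A real crawler would fetch over HTTP.
PAGES = {
    "http://example.test/cats": (
        "<html><head><title>Cat pictures</title></head>"
        '<body><img src="/a.jpg" alt="tabby cat"></body></html>'
    ),
    "http://example.test/dogs": (
        "<html><head><title>Dog pictures</title></head>"
        '<body><img src="/b.jpg" alt="golden retriever">'
        '<img src="/c.jpg" alt="sleepy cat and dog"></body></html>'
    ),
}


class ImageExtractor(HTMLParser):
    """Collects the page title and the (src, alt) pair of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "img":
            attributes = dict(attrs)
            self.images.append((attributes.get("src") or "", attributes.get("alt") or ""))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(url):
    """Worker task: process one page and return one row per image found."""
    parser = ImageExtractor()
    parser.feed(PAGES[url])
    return [(url, parser.title.strip(), src, alt) for src, alt in parser.images]


def build_index(urls, workers=2):
    """Crawl pages in parallel, then store every row in a single database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE images (page_url TEXT, page_title TEXT, src TEXT, alt TEXT)")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Workers only parse; all writes happen here, on the thread that owns the connection.
        for rows in pool.map(crawl, urls):
            db.executemany("INSERT INTO images VALUES (?, ?, ?, ?)", rows)
    return db


def search_images(db, term):
    """Return image sources whose tag (alt text) or page title contains the term."""
    pattern = f"%{term.lower()}%"
    cursor = db.execute(
        "SELECT src FROM images "
        "WHERE lower(alt) LIKE ? OR lower(page_title) LIKE ? ORDER BY src",
        (pattern, pattern),
    )
    return [row[0] for row in cursor]
```

With this sample data, `search_images(build_index(list(PAGES)), "dog")` matches `/b.jpg` through its page title alone and `/c.jpg` through both its title and its tag, which is the kind of combined page-detail and tag lookup described above.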