I'll just give you the idea of how the process should work, at least you'll learn something too.
Assuming you have a database where you stored the URLs you want to crawl (rough Python sketches of each step follow after Step 3):
Step 1) Download all the web pages and save them locally.
Step 2) Once everything is downloaded, load the regular expression table and parse the downloaded files. This is normally the part that really takes time. Most popular programming languages have regular expression functions. Every time you finish parsing a page, save the result as a temporary text file.
Step 3) Once all the parsing is done, you can start uploading the results into your database. You can also add extra parsing before uploading to the database, but that is optional.
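
Here is a rough Python sketch of Step 1 (Python just for illustration, the same idea works in VBA or anything else). It assumes the URL list sits in a SQLite file called crawler.db with a table urls(id, url); those names are placeholders, adjust them to your own database.

```python
# Step 1 sketch: download every URL from the database and save the raw HTML locally.
import sqlite3
import urllib.request
from pathlib import Path

DOWNLOAD_DIR = Path("pages")
DOWNLOAD_DIR.mkdir(exist_ok=True)

conn = sqlite3.connect("crawler.db")                  # hypothetical database file
rows = conn.execute("SELECT id, url FROM urls").fetchall()
conn.close()

for page_id, url in rows:
    target = DOWNLOAD_DIR / f"{page_id}.html"
    if target.exists():                               # skip pages already downloaded
        continue
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            target.write_bytes(resp.read())           # save the raw HTML to disk
    except Exception as exc:                          # log the failure, keep crawling
        print(f"failed {url}: {exc}")
```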
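
A sketch of Step 2 along the same lines. The two patterns in the regex table are made-up examples; the real table would come from your own configuration or database, and each parsed page ends up as a tab-separated temporary text file.

```python
# Step 2 sketch: run the regex table over every saved page and dump the matches
# into one temporary text file per page.
import re
from pathlib import Path

DOWNLOAD_DIR = Path("pages")
PARSED_DIR = Path("parsed")
PARSED_DIR.mkdir(exist_ok=True)

# field name -> compiled pattern (placeholder patterns only)
REGEX_TABLE = {
    "title": re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}

for page in DOWNLOAD_DIR.glob("*.html"):
    html = page.read_text(encoding="utf-8", errors="ignore")
    lines = []
    for field, pattern in REGEX_TABLE.items():
        for match in pattern.findall(html):
            lines.append(f"{field}\t{match.strip()}")
    # temporary tab-separated text file, one per downloaded page
    (PARSED_DIR / f"{page.stem}.txt").write_text("\n".join(lines), encoding="utf-8")
```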
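
And a sketch of Step 3: reading the temporary files back and inserting them into a hypothetical results table. This is also the place where the optional extra parsing could happen before the insert.

```python
# Step 3 sketch: load the temporary text files and upload them to the database.
import sqlite3
from pathlib import Path

PARSED_DIR = Path("parsed")

conn = sqlite3.connect("crawler.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS results (page_id TEXT, field TEXT, value TEXT)"
)

for txt in PARSED_DIR.glob("*.txt"):
    for line in txt.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        field, _, value = line.partition("\t")
        # optional extra cleanup/parsing could go here before the insert
        conn.execute(
            "INSERT INTO results (page_id, field, value) VALUES (?, ?, ?)",
            (txt.stem, field, value),
        )

conn.commit()
conn.close()
```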
The accuracy of these steps is around 60%-80%. You still have to proofread the resulting data, since not all websites are built the same way and a change of design can break your regular expression table.
This is similar to the web crawler used at Innodata, except that they don't do it in levels; they just combine the 3 steps into one process. Bad, very bad...
I made my first web crawler while I was still at Innodata; I used VBA from Excel... It runs the process by level, avoiding the heavy processing load on the application.
I can only give you the idea of how it should be done, since the procedure is pretty straightforward and most of the functions are readily available on the net. All it needs is your fingers on the Google search field and you're off to go...