Mar 19

Index.dat or Reading your temporary internet files via code

I recently picked the amusing art of spidering back up. I had played with it some years ago and decided to revisit it with the intention of improving my methods. One thing that always bothered me about spidering, at least the way I did it, was downloading pictures for the pages being spidered even though they might already be cached in the Temporary Internet Files folder.

So I finally decided to brave the temporary internet files folder on my computer and write a little code to go with it.  Here is what I found.

First of all, I found this site, which saved me a ton of time learning about the index.dat file itself. For those of you who don’t already know, index.dat is the file that stores all the extra information about, and the folder structure of, the Content.IE5 folder.

Second, I found that with a little effort, turning that page into a .NET library was not too difficult. I have linked the result here for your amusement.

The basic procedure for using the aptly named TemporaryInternetFilesParser (I never claimed to be clever) is to include the library in your project and then do:

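' Parse index.dat and look up a single cached URL
Dim tempInetFiles As New TemporaryInternetFilesParser()
Dim urlEntry As URLEntry = tempInetFiles.urlList("url of the image or file you are looking for in the cache")
' Folder is relative to Content.IE5; FileName is the cached file's name on disk
Console.WriteLine(urlEntry.Folder)
Console.WriteLine(urlEntry.FileName)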

Now you can happily use your Temporary Internet Files for your own purposes. One more thing: urlEntry.Folder is the folder reference inside the Content.IE5 folder, not an absolute path. You can find the Content.IE5 folder by doing:

Dim tempInetFolder As String = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.InternetCache), "Content.IE5")
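
Putting the two pieces together, a rough sketch of turning a cache entry into a full path might look something like the following. BuildCachePath is a hypothetical helper of my own naming, not part of the library, and the folder and file name values are placeholders standing in for urlEntry.Folder and urlEntry.FileName. I am also assuming the cached file sits directly inside the folder the entry names.

```vb
Imports System
Imports System.IO

Module CachePathSketch
    ' Hypothetical helper (not part of TemporaryInternetFilesParser):
    ' joins the Content.IE5 root with an entry's relative folder and file name.
    Function BuildCachePath(cacheRoot As String, folder As String, fileName As String) As String
        Return Path.Combine(cacheRoot, folder, fileName)
    End Function

    Sub Main()
        Dim cacheRoot As String = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.InternetCache),
            "Content.IE5")

        ' Placeholder values; in real use these would come from
        ' urlEntry.Folder and urlEntry.FileName.
        Dim fullPath As String = BuildCachePath(cacheRoot, "ABCD1234", "logo[1].gif")

        ' Only hit the network when the cached copy is missing.
        If File.Exists(fullPath) Then
            Console.WriteLine("Cached copy found: " & fullPath)
        Else
            Console.WriteLine("Not in cache, download it instead: " & fullPath)
        End If
    End Sub
End Module
```

A spider could run a check like this before each image request and skip the download whenever the file is already on disk.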

TemporaryInternetFilesParser
TemporaryInternetFilesParser Source
