Google
 
Site navigation: [ Home | Theory | Java | Moodle courses | Resource wiki | About ]

Direct access file organization

7.1.6
Direct access file organization

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

270450 could not be stored in location 0 so it was placed in the next available location.

 

 

 

 

Hashing

Hashing is a search method that is quite different from linear or binary search. In an open hash table the key values are stored in locations determined by a hash function.

For example, if the keys are integers and there are 10 spaces in the table we could use a function such as:

pos = key % 10; // % is the modulo function in Java

To determine at which position in the table it is stored. Consider the following table:

element

Hash Table

0

 

1

 

2

 

3

 

4

 

5

 

6

 

7

 

8

 

9

 

If the key 140750 is to be stored it would be stored at location 140750 % 10 or 0. Similarly 061077 would be stored at location 7 and so on:

element

Hash Table

0

140750

1

2

3

4

5

6

7

061077

8

9

A problem can arise if the location in which the key should be inserted is already occupied. In this case the next available location is used. Therefore if we should wish to insert 270450 into the table we find that location 0 is already occupied (there is a collision). Therefore this key would be stored in location 1:

element

Hash Table

0

140750

1

270450

2

 

3

 

4

 

5

 

6

 

7

061077

8

 

9

030679

Fill in the table for the following keys: 261178, 121282, 090780, 010187. Note that the table can be treated as circular, thus if the end is reached (location 9) we wrap back around to location 0. As the table becomes fuller, the insertion (and recovery) process can degenerate quickly to almost linear search. 

In practical applications the problem of collisions leading to increased search time can be tackled by two methods. The first is to use a hashing function that gives as wide a spread of values as possible.

For example, you might square the keys and take the middle digit(s) as the insertion position. The hash function is entirely arbitrary.

In addition, you might vary the linear search interval, one in the above example, to be a different value, 3 say. Then, when there is a collision, you could look at positions 0, 3, 6, 9, 2, 5 etc. Of course, careful selection of this interval is needed, if the table is of size 10 and the interval is 2, only half the positions will ever be used.

related: [ Topic 7 home | previous: fully-indexed files ]

Modulo gives the remainder after integer division:

3 % 10 = 3

123 % 10 = 3

6 % 2 = 0

and so on.


 
The site is partly financed by advertising revenue, partly by online teaching activities and partly by donations. If you or your organisation feel these resouces have been useful to you, please consider a donation, $9.95 is suggested. Please report any issues with the site, such as broken links, via the feedback page, thanks.

Questions or problems related to this web site should be addressed to Richard Jones who asserts his right to be identified as the author and owner of these materials - unless otherwise indicated. Please feel free to use the material presented here and to create links to it for non-commercial purposes; an acknowledgement of the source is required by the Creative Commons licence. Use of materials from this site is conditional upon your having read the additional terms of use on the about page and the Creative Commons Licence. View privacy policy.

Creative Commons License


This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. © 2001 - 2009 Richard Jones, PO BOX 246, Cambridge, New Zealand;
This page was last modified: May 31, 2009