Introduction to file system
File
A file is a collection of logically related records about some object or entity.
A record's values can be data of any form.
For example, each student's record holds values for Roll No., Name and Class.
Files are used to arrange data.
For example: files of a bank's customers, files of a department, files of stock records, etc.
Files are recorded on secondary storage such as magnetic disks, magnetic tapes and optical disks.
Types of file
Physical file
A physical file contains the actual data that is stored.
It also stores a description of how the data is to be represented.
Logical file
A logical file does not contain data.
It contains a description of records that are found in one or more physical files.
A logical file is a view or representation of one or more physical files.
Special character file
When a file is created, some special characters are inserted into it. For example, Control + Z, which has ASCII value 26, marks the end of a file.
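A minimal sketch of how a reader can honour such a marker, assuming the file's bytes are already in memory and that byte value 26 (Ctrl+Z, the ASCII SUB character) ends the data, a convention used by some older systems such as CP/M and MS-DOS. The names `EOF_MARKER` and `logical_length` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII 26 (SUB), typed as Ctrl+Z, treated here as an end-of-file marker. */
#define EOF_MARKER 26

/* Return how many bytes of buf[0..len) come before the first EOF marker.
   If no marker is present, the whole buffer counts as file data. */
size_t logical_length(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == EOF_MARKER)
            return i;  /* data ends just before the marker */
    }
    return len;
}
```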
Types of files according to record type
- Fixed length record file
- Variable length record file
Fixed length record file
Every record in this file has the same size (in bytes).
Whatever values a record holds, in a fixed length record file every record is assigned a memory block of the same size.
For example, if each record is assigned 30 bytes, the records are stored as shown below.
Advantage
Records are stored at a fixed distance from one another, so a particular record can be found quickly.
Disadvantage
Memory is wasted when a record's data is small compared with its assigned memory block.
This unused space increases the size of the file.
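Because every record has the same size, record number i starts at byte offset i * record size, so a program can seek straight to it without reading earlier records. A minimal sketch, assuming 30-byte records (matching the example above) kept in a temporary file; `RECORD_SIZE`, `write_record` and `read_record` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE 30  /* every record occupies exactly 30 bytes */

/* Write one record at position index, padded with '\0' to RECORD_SIZE. */
int write_record(FILE *fp, long index, const char *text)
{
    char buf[RECORD_SIZE] = {0};
    assert(strlen(text) < RECORD_SIZE);  /* leave room for a terminator */
    strcpy(buf, text);
    if (fseek(fp, index * RECORD_SIZE, SEEK_SET) != 0)
        return -1;
    return fwrite(buf, RECORD_SIZE, 1, fp) == 1 ? 0 : -1;
}

/* Read record number index by jumping straight to index * RECORD_SIZE. */
int read_record(FILE *fp, long index, char out[RECORD_SIZE])
{
    if (fseek(fp, index * RECORD_SIZE, SEEK_SET) != 0)
        return -1;
    return fread(out, RECORD_SIZE, 1, fp) == 1 ? 0 : -1;
}
```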
Variable length record file
Records in this file can have different sizes (in bytes), so the memory blocks assigned to them also vary in size.
Memory is used according to the size of each record's values.
Advantage
Memory is used efficiently for storing records.
Each record occupies exactly as much memory as its data requires.
Because the file takes less memory, it can be moved, saved or transferred from one location to another quickly.
Disadvantage
Accessing a record is slower than in a fixed length record file because of the varying record sizes.
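The slower access follows from the layout: record positions cannot be computed, so reaching record i means stepping over every earlier record. A minimal sketch, assuming (hypothetically) that each record is stored as a one-byte length followed by that many data bytes; `find_record` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: [len][len bytes][len][len bytes]...
   Returns a pointer to the data of record number index and stores its
   length in *out_len, or returns NULL if there is no such record. */
const unsigned char *find_record(const unsigned char *file, size_t file_len,
                                 size_t index, size_t *out_len)
{
    size_t pos = 0;
    assert(out_len != NULL);
    /* Walk over every earlier record: each step reads one length byte. */
    for (size_t i = 0; pos < file_len; i++) {
        size_t len = file[pos];
        if (pos + 1 + len > file_len)
            return NULL;  /* truncated record */
        if (i == index) {
            *out_len = len;
            return file + pos + 1;
        }
        pos += 1 + len;
    }
    return NULL;
}
```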
File organization
File organization refers to the logical relationships among various records that constitute the file, particularly with respect to the means of identification and access to any specific record.
In short, file organization is the order in which the records of a file are stored.
Types of file organization
Sequential file organization
Sequential file organization is the simplest method.
In this method, records are stored one after the other in a sequential manner.
It can be implemented in two ways: the pile file method (records are stored in the order they are inserted) and the sorted file method. This organization is fast and efficient for large amounts of data, but a sorted file is inefficient in that sorting the records takes time and space.
Sorted file method
As the name suggests, whenever a new record has to be inserted, it is always inserted so that the file stays in sorted (ascending or descending) order.
Records may be sorted on the primary key or on any other key.
Insertion of new record
Suppose there is a pre-existing sorted sequence of four records: R1, R3, R7 and R8.
If a new record R2 has to be inserted, it is first placed at the end of the file, and then the sequence is sorted again.
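The two steps described above (append, then re-sort) can be sketched in C, assuming records held in an in-memory array ordered on an integer key; `struct record`, `compare_keys` and `insert_sorted` are hypothetical names, and `qsort` is the standard C library sort.

```c
#include <assert.h>
#include <stdlib.h>

/* A record of the sorted file; only the key matters for ordering here. */
struct record {
    int key;  /* e.g. the number in R1, R2, ... */
};

static int compare_keys(const void *a, const void *b)
{
    int ka = ((const struct record *)a)->key;
    int kb = ((const struct record *)b)->key;
    return (ka > kb) - (ka < kb);  /* avoids overflow of ka - kb */
}

/* Append rec at the end of file[0..*count) and re-sort in ascending key
   order. The caller must guarantee room for one more record. */
void insert_sorted(struct record *file, size_t *count, size_t capacity,
                   struct record rec)
{
    assert(*count < capacity);
    file[*count] = rec;  /* step 1: insert at the end of the file */
    (*count)++;
    qsort(file, *count, sizeof *file, compare_keys);  /* step 2: sort */
}
```

Re-sorting on every insertion is what makes the sorted file method costly; shifting the later records one place to open a gap avoids a full sort, but the cost still grows with the size of the file.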
Pros and Cons of Sequential File Organization
Pros
Fast and efficient for large amounts of data.
Simple design.
Files can easily be stored on magnetic tape, a cheaper storage medium.
Cons
Time is wasted because we cannot jump directly to a required record; records must be read in sequence.
The sorted file method is inefficient because sorting the records takes time and space.
Heap file organization
Heap File Organization works with data blocks.
In this method, records are inserted at the end of the file, into the data blocks. No sorting or ordering is required. If a data block is full, the new record is stored in some other block; this block need not be the very next data block, but can be any block in memory.
It is the responsibility of the DBMS to store and manage the new records.
Insertion of new records
Suppose we have five records in the heap, R1, R5, R6, R4 and R3, and a new record R2 has to be inserted. Since the last data block (data block 3) is full, R2 will be inserted into any data block selected by the DBMS, let's say data block 1.
To search, delete or update data in heap file organization, we must traverse the data from the beginning of the file until we reach the requested record. Thus, if the database is very large, searching, deleting or updating a record takes a lot of time.
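A minimal sketch of heap insertion and search, assuming three data blocks holding two records each (sizes chosen only for illustration). `heap_insert` stands in for the DBMS by picking the first block with free space; `heap_search` shows the scan from the first block that makes lookups slow in a large file. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BLOCKS 3
#define BLOCK_CAPACITY 2  /* records per data block (assumed) */

struct heap_file {
    int keys[NUM_BLOCKS][BLOCK_CAPACITY];
    size_t used[NUM_BLOCKS];  /* records currently in each block */
};

/* Insert key into some block with free space. This sketch picks the first
   such block; a real DBMS may choose differently. Returns the block number
   or -1 if every block is full. */
int heap_insert(struct heap_file *hf, int key)
{
    for (int b = 0; b < NUM_BLOCKS; b++) {
        if (hf->used[b] < BLOCK_CAPACITY) {
            hf->keys[b][hf->used[b]++] = key;
            return b;
        }
    }
    return -1;
}

/* Unordered file: a search must scan from the first block onwards. */
bool heap_search(const struct heap_file *hf, int key, int *block_out)
{
    assert(block_out != NULL);
    for (int b = 0; b < NUM_BLOCKS; b++)
        for (size_t i = 0; i < hf->used[b]; i++)
            if (hf->keys[b][i] == key) {
                *block_out = b;
                return true;
            }
    return false;
}
```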
Pros and Cons of Heap File Organization
Pros
Fetching and retrieving records is faster than in sequential organization, but only for small databases.
When a large amount of data needs to be loaded into the database at once, this method of file organization is best suited.
Cons
Problem of unused memory blocks.
Inefficient for larger databases.
Hash File Organization
Hash file organization computes a hash function on some fields of the records. The hash function's output determines the disk block in which a record is placed.
When a record has to be retrieved using the hash key columns, the address is generated from the key, and the whole record is retrieved using that address.
In the same way, when a new record has to be inserted, the address is generated using the hash key and the record is inserted directly. The same process is applied for delete and update.
In this method, there is no need to search or sort the entire file. Records are spread across memory in an order that appears random, because their placement depends only on the hash value.
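A minimal sketch of the idea, assuming a file of four disk blocks with room for three records each and the hash function key mod 4; these sizes and the names `block_of`, `hashed_insert` and `hashed_search` are hypothetical. Both operations compute the block address from the key and touch only that block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BUCKETS 4      /* disk blocks reserved for the file (assumed) */
#define BUCKET_CAPACITY 3  /* records per block (assumed) */

struct hashed_file {
    int keys[NUM_BUCKETS][BUCKET_CAPACITY];
    size_t used[NUM_BUCKETS];
};

/* Hash function on the key field: its result is the block address. */
static size_t block_of(int key)
{
    assert(key >= 0);
    return (size_t)key % NUM_BUCKETS;
}

/* Insert directly into the block chosen by the hash; no scan, no sorting.
   Returns false if that block is already full (overflow not handled). */
bool hashed_insert(struct hashed_file *hf, int key)
{
    size_t b = block_of(key);
    if (hf->used[b] == BUCKET_CAPACITY)
        return false;
    hf->keys[b][hf->used[b]++] = key;
    return true;
}

/* Retrieval recomputes the same address and looks only in that block. */
bool hashed_search(const struct hashed_file *hf, int key)
{
    size_t b = block_of(key);
    for (size_t i = 0; i < hf->used[b]; i++)
        if (hf->keys[b][i] == key)
            return true;
    return false;
}
```

This sketch simply rejects an insert when the target block is full; a real hashed file needs an overflow strategy for that case.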
Indexed sequential access method (ISAM)
ISAM method is an advanced sequential file organization.
In this method, records are stored in the file using the primary key. An index value is generated for each primary key and mapped with the record. This index contains the address of the record in the file.
If any record has to be retrieved based on its index value, then the address of the data block is fetched and the record is retrieved from the memory.
Pros of ISAM
Since the index holds the address of each record's data block, searching for a record in a large database is quick and easy.
This method supports range retrieval and partial retrieval of records. Since the index is based on the primary key values, we can retrieve the data for a given range of values. In the same way, a partial value can also be searched easily, for example, student names starting with 'JA'.
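A minimal sketch of range retrieval over an ISAM-style index, assuming the index is held in memory as an array of (key, address) pairs sorted by primary key; `struct index_entry`, `lower_bound` and `range_lookup` are hypothetical names. A binary search finds the first entry in the range, and the matching entries are then read in order.

```c
#include <assert.h>
#include <stddef.h>

/* One index entry: a primary-key value and the address of its record. */
struct index_entry {
    int key;
    long address;  /* e.g. byte offset of the record in the data file */
};

/* First position whose key is >= target (binary search on sorted index). */
static size_t lower_bound(const struct index_entry *idx, size_t n, int target)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx[mid].key < target)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Copy the addresses of all records with lo_key <= key <= hi_key into out.
   Returns how many were found (at most max_out). */
size_t range_lookup(const struct index_entry *idx, size_t n,
                    int lo_key, int hi_key, long *out, size_t max_out)
{
    size_t found = 0;
    assert(out != NULL || max_out == 0);
    for (size_t i = lower_bound(idx, n, lo_key);
         i < n && idx[i].key <= hi_key && found < max_out; i++)
        out[found++] = idx[i].address;
    return found;
}
```

A partial-value search such as names starting with 'JA' follows the same pattern over an index sorted on name: all matching entries form one contiguous run that starts at the first entry not less than "JA".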
Cons of ISAM
This method requires extra space in the disk to store the index value.
When new records are inserted, these files have to be reconstructed to maintain the sequence.
When a record is deleted, the space it used needs to be released; otherwise, database performance degrades.
Introduction to file organization
A file organization is a method of arranging the records in a file.
Files are stored on secondary storage devices. A file can be accessed or modified in different ways to perform basic operations on the records it holds.
For example, records may be sorted in ascending order of employee name.
But if we want to process records in increasing order of salary, sorting them by name is not a good file organization.
They should be sorted on salary instead.
Here we deal with logical and physical files and the different file organization techniques.
Logical and physical files
What is a File
In any information system, we deal with data.
This data has to be arranged in a proper way to accept, process and communicate operations and results.
For arranging the data, we need files.
A manual file stores all the information relating to a particular activity.
For example: inventory activities in an inventory file, payroll activities in a payroll file and so on.
The basic unit of information for computer and manual files is a record.
Collections of related data items form a record.
For example, each employee's record will contain data items such as Employee number, Employee name, Basic pay, Allowances, Deductions, Gross pay and Net pay.
A set of logically related records forms a file.
File structure
To understand file structure, one must understand its hierarchy; the terms are explained below.
Character or byte
A bit is the smallest unit of data representation (value of a bit may be 0 or 1).
Eight bits make a byte which can represent a character code or a special symbol in a character code.
1 character = 1 byte.
Data Item
One or more characters combined may form a data item.
It is used to describe an attribute of an object or entity.
For example: student_no, student_name, age, etc. are data items.
A data item is also referred to as a field.
However, there is a slight difference between data item and field.
A field is a physical space on a magnetic disc whereas a data item is the data stored in the field.
Record
The data items related to an object or entity are grouped into a record.
Record can also be defined as a set of logically related fields.
There are two types of records:
- Fixed length.
- Variable length.
In a fixed length record, every occurrence of the record must have each of the fields present, and a given field must be the same length from record to record.
This means every record in the file has the same, fixed length.
In a variable length record, every occurrence of a record need not have each of the fields present, and a given field need not be the same length from record to record.
This means records in the file can differ in length.
File
A file is a set of logically related records. Almost all information stored in a computer must be in a file. There are many different types of files: data files, text files, program files, directory files, and so on.
Logical and physical files
Files can be viewed as logical files and physical files.
A logical file is a file viewed in terms of the data items its records contain and the processing operations that may be performed on the file.
The user of the file will normally adopt such a view. A physical file is a file viewed in terms of how the data is stored on a storage device and how the processing operations are made possible.
In short, as the figure shows, files can be considered to have a multilevel structure.
A file consists of records, and records consist of data items (fields).
Data items may contain elementary items.
For example, if Date is a data item, then its elementary items are month, day and year.
The physical files are stored in secondary storage devices.
The operating system makes a connection between logical and physical files for the application program.
Application programs read or write the bytes from physical files that are stored on secondary storage like a disk.
Fields and Record Structure in File
Data is usually stored in the form of records.
Each record consists of a collection of related data values or items, where each value is formed of one or more bytes and corresponds to a particular field of the record.
Records usually describe entities and their attributes.
For example, an EMPLOYEE record represents an employee entity, and each field value in the record specifies some attribute of that employee, such as NAME, BIRTH-DATE, SALARY, etc.
A collection of field names and their corresponding data types constitutes a record type or record format definition.
The data type associated with each field specifies the type of values the field can take.
The data type of a field is usually one of the standard data types used in programming.
These include numeric (integer, long integer or floating point), character strings (fixed-length or varying), Boolean, and sometimes specially coded date and time data types.
The number of bytes required for each data type is fixed for a given computer system. For example, an integer may require 4 bytes, a long integer 8 bytes, a real number 4 bytes, a Boolean 1 byte, a date 4 bytes, and a fixed-length string of k characters k bytes.
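A record format definition can be represented as a table of field names and sizes, from which the record length and each field's starting offset follow. A minimal sketch, assuming a hypothetical EMPLOYEE record with EMP_NO (integer), NAME (30-character fixed-length string), BIRTH_DATE (date) and SALARY (real), using the byte counts listed above; `struct field_def`, the `SIZE_*` constants and `record_length` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Byte counts from the text; real sizes depend on the computer system. */
#define SIZE_INT  4
#define SIZE_LONG 8
#define SIZE_REAL 4
#define SIZE_BOOL 1
#define SIZE_DATE 4

/* One entry of a record format definition. */
struct field_def {
    const char *name;
    size_t size;  /* bytes occupied by the field */
};

/* Return the total record length; if offsets is not NULL, also store the
   offset at which each field starts. */
size_t record_length(const struct field_def *fields, size_t n, size_t *offsets)
{
    size_t total = 0;
    assert(fields != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (offsets != NULL)
            offsets[i] = total;  /* field i begins where the previous ended */
        total += fields[i].size;
    }
    return total;
}
```

This computes the packed on-disk length; a C `struct` with the same fields may be larger, because compilers can insert padding for alignment.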
BLOB
In recent database applications, the need may arise for data items that consist of large unstructured objects, which represent images, digitized video or audio streams, or free text.
These are referred to as BLOBs ( Binary Large Objects).
Normally, a BLOB data item is stored separately from its record in a pool of disk blocks and a pointer to the BLOB is included in the record.
There are four common methods of separating fields when they are stored in a file:
- Force the fields into a predictable length.
- Begin each field with a length indicator.
- Place a delimiter at the end of each field to separate it from the next field.
- Use a "keyword = value" expression to identify each field and its contents.
For example, in C:
struct Person {
    char last[10];   /* last name */
    char first[10];  /* first name */
    char addr[15];   /* street address */
    char city[15];   /* city */
    char zip[6];     /* postal code, stored as characters */
};
In this example, each field is a character array that can store a string value of some maximum size.
This is a fixed-size field structure: a Person record occupies 10 + 10 + 15 + 15 + 6 = 56 bytes.
Another way to make it possible to find the end of a field is to store the field length just ahead of the field, as shown in the figure.
If the fields are not too long, then it is possible to store length in a single byte at the start of each field.
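A minimal sketch of length-indicator fields, assuming one length byte per field (so no field may exceed 255 bytes); `put_field` and `get_field` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Append field to buf at *pos as [1-byte length][bytes].
   Returns 0 on success, -1 if the field is too long or buf is too small. */
int put_field(unsigned char *buf, size_t cap, size_t *pos, const char *field)
{
    size_t len = strlen(field);
    if (len > 255 || *pos + 1 + len > cap)
        return -1;  /* the length must fit in a single byte */
    buf[(*pos)++] = (unsigned char)len;
    memcpy(buf + *pos, field, len);
    *pos += len;
    return 0;
}

/* Read the field starting at *pos into out (NUL-terminated) and advance
   *pos past it. Returns 0 on success, -1 at the end or on error. */
int get_field(const unsigned char *buf, size_t end, size_t *pos,
              char *out, size_t out_cap)
{
    size_t len;
    assert(out_cap > 0);
    if (*pos >= end)
        return -1;
    len = buf[*pos];
    if (*pos + 1 + len > end || len >= out_cap)
        return -1;
    memcpy(out, buf + *pos + 1, len);
    out[len] = '\0';
    *pos += 1 + len;
    return 0;
}
```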
Placing a delimiter at the end of each field is another way to separate the fields.
We can use white space characters (blank, newline, tab) as delimiters because they provide a clean separation between fields, as shown in the figure.
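A minimal sketch of splitting a record on white-space delimiters, assuming the record is a NUL-terminated string; `split_fields` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Split a record into fields separated by runs of white space (blank, tab,
   newline). Each field's start and length go to starts[] and lens[].
   Returns the number of fields found (at most max_fields). */
size_t split_fields(const char *rec, const char **starts, size_t *lens,
                    size_t max_fields)
{
    size_t n = 0;
    assert(rec != NULL);
    while (*rec != '\0' && n < max_fields) {
        while (isspace((unsigned char)*rec))
            rec++;  /* skip delimiter characters */
        if (*rec == '\0')
            break;
        starts[n] = rec;
        while (*rec != '\0' && !isspace((unsigned char)*rec))
            rec++;  /* the field runs until the next delimiter */
        lens[n] = (size_t)(rec - starts[n]);
        n++;
    }
    return n;
}
```

One drawback of white space as the delimiter is that no field value may itself contain a blank, tab or newline (a city such as "New York" would split into two fields), which is why a character that never appears in the data is often chosen instead.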
The figure shows a structure in which each field provides information about itself (keyword = value). Such self-describing structures can be very useful tools for organizing files in many applications.
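A minimal sketch of reading a self-describing record, assuming (hypothetically) that each field is written as `keyword=value` and terminated by `|`; `find_value` is an illustrative name.

```c
#include <assert.h>
#include <string.h>

/* Look up keyword in a record such as "last=Ames|first=Mary|city=Stillwater|".
   Copies the value into out; returns 0 if found, else -1. */
int find_value(const char *rec, const char *keyword, char *out, size_t cap)
{
    size_t klen = strlen(keyword);
    assert(cap > 0);
    while (*rec != '\0') {
        const char *end = strchr(rec, '|');  /* end of this field */
        if (end == NULL)
            end = rec + strlen(rec);
        if (strncmp(rec, keyword, klen) == 0 && rec[klen] == '=') {
            size_t vlen = (size_t)(end - (rec + klen + 1));
            if (vlen >= cap)
                return -1;
            memcpy(out, rec + klen + 1, vlen);
            out[vlen] = '\0';
            return 0;
        }
        rec = (*end == '|') ? end + 1 : end;  /* move to the next field */
    }
    return -1;
}
```

Because every field names itself, fields can appear in any order or be omitted, at the cost of storing the keywords in every record.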
Record Types
We have already introduced two types of record.
A file is a collection of records. Usually, all records in a file are of the same record type.
The file is said to be made up of fixed-length records if every record is equal in size (in bytes).
The file is made up of variable-length records if different records in the file differ in size (in bytes).
Reasons for having variable length records in a file
The file records belong to one record type, but one or more of the fields may have multiple values for individual records; such a field is called a repeating field. A group of values for the repeating field is called a repeating group.
The file records belong to one record type, but one or more of the fields are optional.
The file contains records of heterogeneous record types. This will happen if related records of heterogeneous types are placed together on disk blocks.
For example, the Sales_Report records of a particular Product may be placed following the Product's record.
In this section, we present a simplified analysis of three basic file organizations:
files sorted on some field, files hashed on some field, and indexed files.
Our objective is to emphasize the importance of choosing an appropriate file organization.
Sequential Files
We can physically arrange the records of a file on disk based on the values of one of their fields - called the ordering field.
This leads to an ordered or sequential file. If the ordering field is also a key field of the file (a field guaranteed to have a unique value in each record), then the field is also called the ordering key for the file.
For example, an EMPLOYEE file can be ordered with NAME as the ordering key field (assuming that no two employees have the same name).
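Because the records are physically ordered on the ordering key, a record can be located by binary search instead of a full scan. A minimal sketch, assuming fixed-length records sorted on an integer employee number (used instead of NAME for simplicity) and stored in a temporary file; `struct emp` and `ordered_search` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Fixed-length record of an ordered file; key is the ordering key. */
struct emp {
    int key;
    char name[20];
};

/* Binary search over n records stored in key order in fp. Each probe seeks
   to one record. On success copies the record to *out and returns its
   position; otherwise returns -1. */
long ordered_search(FILE *fp, long n, int key, struct emp *out)
{
    long lo = 0, hi = n - 1;
    assert(out != NULL);
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        if (fseek(fp, mid * (long)sizeof *out, SEEK_SET) != 0 ||
            fread(out, sizeof *out, 1, fp) != 1)
            return -1;
        if (out->key == key)
            return mid;
        if (out->key < key)
            lo = mid + 1;  /* key lies in the upper half */
        else
            hi = mid - 1;  /* key lies in the lower half */
    }
    return -1;
}
```

Each probe costs one seek and read, so the number of reads grows on the order of log2(n) rather than n.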
Ordered records have some advantages over unordered files, as follows.
Reading the records in order of the ordering field values is extremely efficient, since no sorting is required.
Finding the next record after the current one, in order of the ordering field, usually requires no additional block accesses, because the next record is in the same block as the current one (unless the current record is the last one in the block).
A search condition based on the value of the ordering key field gives faster access when the binary search technique is used.
Indexed files
An index is a data structure that organizes data records on disk to optimize certain file operations.
An index allows us to search for and retrieve records efficiently.
Using an index, we can find the desired data entry and then use it to obtain the data records. A data entry with search key value k contains enough information to locate the data records with search key value k.
To create and maintain an indexed file, the system creates a data file and an index file. The data file contains the actual contents (data) of the records, and the index file contains the index entries. One field in each record, the primary key, identifies the record uniquely.
The files are organized as follows:
The data file is stored in the order of the primary key values.
The index file contains two fields:
- the key value
- the pointer to data record.
Thus, one record in the index file consists of a key value and a pointer to the corresponding data record.
The pointer points to the first entry within the range of data records.
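One reading of the last point is a sparse index: one entry per group (block) of data records, pointing to the first record of that group. A minimal sketch, assuming full blocks of three records laid out in a flat array; `struct sparse_entry`, `NO_BLOCK`, `find_block` and `lookup` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RECORDS_PER_BLOCK 3    /* records per data block (assumed) */
#define NO_BLOCK ((size_t)-1)  /* the key precedes every block */

/* Sparse index entry: the first key stored in a block and its number. */
struct sparse_entry {
    int first_key;
    size_t block;
};

/* Binary search for the last entry whose first_key <= key; that block is
   the only one that can contain the key. */
size_t find_block(const struct sparse_entry *idx, size_t n, int key)
{
    size_t lo = 0, hi = n;  /* lo ends at the first entry with first_key > key */
    assert(idx != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (idx[mid].first_key <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NO_BLOCK : idx[lo - 1].block;
}

/* Consult the index, then scan just one block of the data file. */
bool lookup(const int *data, const struct sparse_entry *idx, size_t n, int key)
{
    size_t b = find_block(idx, n, key);
    if (b == NO_BLOCK)
        return false;
    for (size_t i = 0; i < RECORDS_PER_BLOCK; i++)
        if (data[b * RECORDS_PER_BLOCK + i] == key)
            return true;
    return false;
}
```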
Advantages
Data can be accessed directly and quickly.
Data is maintained centrally and kept up to date.
Primary and secondary indexes can be used to search the data.
Disadvantages
Inserting new index values between two existing values is difficult.
If the index becomes too large, searching becomes slow.
Maintaining and consulting the index adds processing overhead, which lowers efficiency.
Hardware required for these systems is expensive, as data is stored on disk.
The file is updated directly (in place).
Backups should be taken regularly.
Hashed files
In hashed files, the record number itself becomes an equivalent of the key value or primary key.
The term hash indicates splitting a key into pieces. Hash file organization provides very fast access to records under certain search conditions. This organization is usually called a hash file or direct file.
The idea behind hashing is to provide a function h, called a hash function or randomizing function, that is applied to the hash field value of a record and yields the address of the disk block in which the record is stored.
A search for the record within the block can be carried out in the main memory buffer.
Internal Hashing
For internal files, hashing is typically implemented through an array of records. If the array index ranges from 0 to M - 1, then we have M slots whose addresses correspond to the array indexes.
We choose a hash function that transforms the hash field value into an integer between 0 and M - 1.
One common hash function is h(K) = K mod M, which returns the remainder of an integer hash field value K after division by M; this value is then used as the record address.
Non-integer hash field values can be transformed into integers before the mod function is applied.
For example, let
N = number of records in the file
K = key that uniquely identifies a record in the file
Hash function H(K) = K mod M
If K is 9875, N is 58 and M is 99, then
H(K) = 9875 mod 99 = 74
With M = 2, the keys 7 and 5 both map to the same address:
H(7) = 7 mod 2 = 1
H(5) = 5 mod 2 = 1
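The arithmetic above can be checked with a minimal sketch of h(K) = K mod M. For non-integer hash fields it also assumes one simple transformation, summing the character codes before applying mod; that choice, and the names `hash_mod` and `hash_string`, are illustrative rather than taken from the text.

```c
#include <assert.h>

/* h(K) = K mod M for a non-negative integer hash field value K. */
unsigned long hash_mod(unsigned long k, unsigned long m)
{
    assert(m > 0);
    return k % m;
}

/* Assumed transformation for a non-integer field such as a name:
   add up its character codes, then apply the same mod function. */
unsigned long hash_string(const char *s, unsigned long m)
{
    unsigned long sum = 0;
    while (*s != '\0')
        sum += (unsigned char)*s++;
    return hash_mod(sum, m);
}
```

Plain character summing is weak: anagrams such as "AB" and "BA" always hash to the same address, so practical schemes usually weight each character by its position.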
A collision occurs when the hash field value of a new record being inserted hashes to an address that already contains a different record.
In this situation, we must insert the new record in some other position, since its hash address is occupied.
The process of finding another position is called collision resolution. There are numerous methods for collision resolution, including the following:
Open addressing
Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in sequence until a vacant (empty) position is found.
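A minimal sketch of open addressing with linear probing, assuming a table of 7 slots (`TABLE_SIZE` plays the role of M), non-negative integer keys, and -1 as the empty marker; `probe_insert` and `probe_search` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define TABLE_SIZE 7  /* M: slots with addresses 0..M-1 (assumed size) */
#define EMPTY (-1)    /* marker for a vacant slot; keys are >= 0 */

/* Start at h(key) = key mod M and check the following slots in sequence,
   wrapping around, until a vacant one is found. Returns the slot used,
   or -1 if the table is full. */
int probe_insert(int table[TABLE_SIZE], int key)
{
    assert(key >= 0);
    int start = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (start + i) % TABLE_SIZE;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}

/* A search follows the same probe sequence and stops at a vacant slot. */
bool probe_search(const int table[TABLE_SIZE], int key)
{
    assert(key >= 0);
    int start = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int slot = (start + i) % TABLE_SIZE;
        if (table[slot] == EMPTY)
            return false;
        if (table[slot] == key)
            return true;
    }
    return false;
}
```

Deleting a record cannot simply empty its slot, because a later search would stop there too early; implementations typically mark such slots as deleted instead.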
Difference between File system and DBMS
File system
- A file system is software that manages and organizes the files in a storage medium within a computer.
- Redundant data can be present in a file system.
- It doesn't provide backup and recovery of data if it is lost.
- There is no efficient query processing in a file system.
- There is less data consistency in a file system.
- It is less complex than a DBMS.
- File systems provide less security than a DBMS.
- It is less expensive than a DBMS.
DBMS
- A DBMS is software for managing a database.
- A DBMS controls and minimizes redundant data.
- It provides backup and recovery of data even if it is lost.
- Efficient query processing is available in a DBMS.
- There is more data consistency because of the process of normalization.
- It is more complex to handle than a file system.
- A DBMS has more security mechanisms than a file system.
- It has a comparatively higher cost than a file system.