
File organization


Introduction to file system


File


A file is a collection of logically related records that describe some object.
A record's values can be any kind of data.
For example, each student's record holds values for Roll no, Name and Class.

Files are used to arrange data.
For example: files of a bank's customers, files of a department, files of stock records, etc.
Files are recorded on secondary storage such as magnetic disks, magnetic tapes and optical disks.

Types of file 


Physical file


A physical file contains the actual data that is stored.
It also stores a description of how the data is to be represented.

Logical file


A logical file does not contain data.
It contains a description of records that are found in one or more physical files.
A logical file is a view or representation of one or more physical files.

Special character file


When a file is created, some special characters are inserted into it.
For example, Control+Z, which has ASCII value 26, marks the end of a file.

Types of files according to record type


  • Fixed length record file 
  • Variable length record file 

Fixed length record file


Every record in this file has the same size (in bytes).
Whatever values a record holds, the fixed-length record file assigns it a memory block of the same size.

For example, if each record is assigned 30 bytes, the records are stored in consecutive 30-byte blocks, as shown in the figure.

Advantage


Records are stored at a fixed distance from one another, so a particular record can be found quickly: record i starts at offset i × record size.

Disadvantage


Memory is wasted when a record's data is smaller than the memory block assigned to it.
This unused space increases the size of the file.
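A minimal sketch in C of why fixed-length records allow fast access, assuming 30-byte records as in the example above and a temporary file created with the standard `tmpfile()`. The names `RECORD_SIZE`, `write_record` and `read_record` are hypothetical. Record i always begins at byte offset i × 30, so a single `fseek` reaches it without reading any earlier record; the zero-filled tail of each slot is the wasted space described under Disadvantage.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE 30  /* every record occupies exactly 30 bytes */

/* Write a record into slot i; bytes not used by the data are zero-filled. */
int write_record(FILE *f, long i, const char *data)
{
    char buf[RECORD_SIZE] = {0};
    size_t len = strlen(data);
    assert(len <= RECORD_SIZE);                    /* data must fit in one slot */
    memcpy(buf, data, len);
    if (fseek(f, i * RECORD_SIZE, SEEK_SET) != 0)  /* offset = i * record size */
        return -1;
    return fwrite(buf, 1, RECORD_SIZE, f) == RECORD_SIZE ? 0 : -1;
}

/* Read record i with a single seek: its offset is computed, not searched for. */
int read_record(FILE *f, long i, char out[RECORD_SIZE + 1])
{
    if (fseek(f, i * RECORD_SIZE, SEEK_SET) != 0)
        return -1;
    if (fread(out, 1, RECORD_SIZE, f) != RECORD_SIZE)
        return -1;
    out[RECORD_SIZE] = '\0';   /* terminate even a full 30-byte record */
    return 0;
}
```

For example, after writing three records, `read_record(f, 2, buf)` seeks straight to byte 60; records 0 and 1 are never read.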
   

Variable length record file


Records in this file have variable sizes (in bytes), so the memory blocks assigned to file records also vary in size.
Different records in the file have different sizes.
Each record uses memory blocks according to the size of its values.

Advantage


Memory is used efficiently for storing records.
Each record occupies exactly as much memory as its size requires.
Because they use less memory, such files can be moved, saved or transferred from one location to another quickly.

Disadvantage


Accessing a record is slower than in a fixed-length record file, because record sizes vary and a record's position cannot be computed directly from its number.
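A minimal sketch in C, assuming each record is stored as a one-byte length followed by its data bytes (a common convention for marking variable-length records; the text does not fix a format). The helper names are hypothetical. Each record takes only 1 + length bytes, but finding record n means stepping over records 0 to n - 1 first, which is the slower access described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append a record as [length byte][data bytes] at offset `used`.
   The caller must ensure `file` has room for 1 + strlen(data) more bytes.
   Returns the new number of used bytes. */
size_t append_record(unsigned char *file, size_t used, const char *data)
{
    size_t len = strlen(data);
    assert(len <= 255);                  /* length must fit in one byte */
    file[used] = (unsigned char)len;
    memcpy(file + used + 1, data, len);
    return used + 1 + len;               /* record takes only 1 + len bytes */
}

/* Find record n by skipping the n records in front of it.
   Copies it into out (NUL-terminated) and returns its length, or -1. */
int find_record(const unsigned char *file, size_t used, int n, char *out)
{
    size_t pos = 0;
    for (int i = 0; pos < used; i++) {
        size_t len = file[pos];
        if (i == n) {
            memcpy(out, file + pos + 1, len);
            out[len] = '\0';
            return (int)len;
        }
        pos += 1 + len;                  /* jump over this record */
    }
    return -1;                           /* fewer than n + 1 records */
}
```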
   

File organization


File organization refers to the logical relationships among various records that constitute the file, particularly with respect to the means of identification and access to any specific record. 
In short, storing the records of a file in a certain order is called file organization.

Types of file organization


Sequential file organization


Sequential file organization is the easiest method.
In this method, records are stored one after the other in a sequential manner.

This method can be implemented in two ways: the pile file method, where records are stored in the order in which they are inserted, and the sorted file method. It is fast and efficient for huge amounts of data. The sorted file method, however, is inefficient because sorting the records takes time and space.

Sorted file method


As the name suggests, in this method a new record is always inserted so that the file stays sorted in ascending or descending order.

Records may be sorted on the primary key or on any other key.


Insertion of new record


Let us assume that there is a preexisting sorted sequence of four records: R1, R3, R7 and R8.

Suppose a new record R2 has to be inserted into the sequence. It is first inserted at the end of the file, and then the sequence is sorted again.
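The insertion step just described (append R2 at the end, then sort again) can be sketched in C with the standard `qsort`, assuming each record is represented only by an integer key such as its record number; this representation and the function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Compare two integer record keys for ascending order. */
static int cmp_key(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Sorted-file insertion as described: append at the end, then re-sort.
   keys must have room for one more element; returns the new record count. */
size_t sorted_insert(int *keys, size_t count, int new_key)
{
    assert(keys != NULL);
    keys[count] = new_key;                             /* step 1: append */
    qsort(keys, count + 1, sizeof keys[0], cmp_key);   /* step 2: re-sort */
    return count + 1;
}
```

Starting from the keys 1, 3, 7, 8 and inserting 2 yields 1, 2, 3, 7, 8. Re-sorting the whole file on every insertion is what makes the sorted file method costly in time and space.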
 

Pros and Cons of Sequential File Organization


Pros


Fast and efficient method for huge amounts of data.
Simple design.
Files can easily be stored on magnetic tapes, i.e. a cheaper storage mechanism.


Cons


Time is wasted because we cannot jump directly to a required record; we have to move through the records sequentially.
The sorted file method is inefficient because sorting the records takes time and space.

Heap file organization


Heap File Organization works with data blocks.

In this method, records are inserted at the end of the file, into the data blocks. No sorting or ordering is required. If a data block is full, the new record is stored in some other block; this block need not be the very next data block, but can be any block in the memory.
It is the responsibility of the DBMS to store and manage the new records.

Insertion of new records


Suppose we have five records in the heap, R1, R5, R6, R4 and R3, and a new record R2 has to be inserted. Since the last data block, i.e. data block 3, is full, R2 will be inserted into any data block selected by the DBMS, let's say data block 1.

If we want to search, delete or update data in heap file organization, we have to traverse the data from the beginning of the file until we find the requested record. Thus, if the database is very large, searching, deleting or updating a record takes a lot of time.

Pros and Cons of Heap File Organization


Pros


Fetching and retrieving records is faster than in sequential file organization, but only for small databases.

When a huge amount of data needs to be loaded into the database at once, this method of file organization is best suited.

Cons


Problem of unused memory blocks.
Inefficient for larger databases.

Hash File Organization


Hash file organization computes a hash function on some fields of the records. The output of the hash function determines the location of the disk block where a record is placed.

When a record has to be retrieved using the hash key columns, the address is generated, and the whole record is retrieved using that address.

In the same way, when a new record has to be inserted, then the address is generated using the hash key and record is directly inserted. The same process is applied in the case of delete and update.

In this method, no effort is spent on searching or sorting the entire file. Each record is stored at an effectively random position in memory, determined by its hash value.
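A minimal sketch of static hashing in C, assuming M = 4 buckets standing in for disk blocks, 3 records per bucket, non-negative integer keys, and the hash function h(K) = K mod M that the text introduces under Internal Hashing. Names and sizes are hypothetical, and bucket overflow is not handled. Insert, search and delete all compute the same address and look at that one bucket only.

```c
#include <assert.h>
#include <stdbool.h>

#define M          4   /* number of buckets (disk blocks) */
#define BUCKET_CAP 3   /* records per bucket */

struct bucket { int keys[BUCKET_CAP]; int used; };
static struct bucket table[M];

/* Hash function: bucket (block) address for a non-negative key. */
static int h(int key) { assert(key >= 0); return key % M; }

/* Insert directly into the computed bucket; false if that bucket is full. */
bool hash_insert(int key)
{
    struct bucket *b = &table[h(key)];
    if (b->used == BUCKET_CAP) return false;   /* overflow not handled here */
    b->keys[b->used++] = key;
    return true;
}

/* Search only the one bucket the key hashes to. */
bool hash_find(int key)
{
    const struct bucket *b = &table[h(key)];
    for (int i = 0; i < b->used; i++)
        if (b->keys[i] == key) return true;
    return false;
}

/* Delete by moving the bucket's last record into the freed slot. */
bool hash_delete(int key)
{
    struct bucket *b = &table[h(key)];
    for (int i = 0; i < b->used; i++)
        if (b->keys[i] == key) {
            b->keys[i] = b->keys[--b->used];
            return true;
        }
    return false;
}
```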

Indexed sequential access method (ISAM)


ISAM is an advanced form of sequential file organization.

 In this method, records are stored in the file using the primary key. An index value is generated for each primary key and mapped with the record. This index contains the address of the record in the file.
 
If any record has to be retrieved based on its index value, then the address of the data block is fetched and the record is retrieved from the memory.

Pros of ISAM


Because each record has the address of its data block, searching for a record in a huge database is quick and easy.

This method supports range retrieval and partial retrieval of records. Since the index is based on the primary key values, we can retrieve the data for a given range of values. In the same way, a partial value can also be searched easily, e.g. student names starting with 'JA'.
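A minimal sketch of the ISAM idea in C, assuming an in-memory index of (primary key, block address) entries kept sorted by key; the struct and function names are hypothetical. A point lookup binary-searches the index with the standard `bsearch`. A range query finds the first key ≥ the lower bound and then reads index entries in key order, which is why range retrieval is cheap.

```c
#include <assert.h>
#include <stdlib.h>

/* One index entry: a primary key and the address of the block holding it. */
struct index_entry { int key; long block_addr; };

static int cmp_entry(const void *a, const void *b)
{
    int x = ((const struct index_entry *)a)->key;
    int y = ((const struct index_entry *)b)->key;
    return (x > y) - (x < y);
}

/* Point lookup: returns the block address for key, or -1 if absent. */
long isam_lookup(const struct index_entry *idx, size_t n, int key)
{
    assert(idx != NULL || n == 0);
    struct index_entry probe = { key, 0 };
    const struct index_entry *e = bsearch(&probe, idx, n, sizeof *idx, cmp_entry);
    return e ? e->block_addr : -1;
}

/* Range retrieval: store block addresses of all keys in [lo, hi] in out.
   Binary-search for the first key >= lo, then scan forward in key order. */
size_t isam_range(const struct index_entry *idx, size_t n, int lo, int hi,
                  long *out)
{
    size_t left = 0, right = n;            /* search within [left, right) */
    while (left < right) {
        size_t mid = left + (right - left) / 2;
        if (idx[mid].key < lo) left = mid + 1; else right = mid;
    }
    size_t count = 0;
    for (size_t i = left; i < n && idx[i].key <= hi; i++)
        out[count++] = idx[i].block_addr;
    return count;
}
```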

Cons of ISAM


This method requires extra space in the disk to store the index value.

When the new records are inserted, then these files have to be reconstructed to maintain the sequence.

When the record is deleted, then the space used by it needs to be released. Otherwise, the performance of the database will slow down.

Introduction to file organization


A file organization is a method of arranging the records in a file.

The file is stored on a secondary storage device. A file can be accessed or modified in different ways in order to perform some basic operations on the records available in the file.

For example, the records may be sorted in ascending order of employee name.
But if we want the records in increasing order of salary, sorting them by name is not a good file organization.
They should be sorted on salary.
Here we deal with logical and physical files and the different types of file organization techniques.

Logical and physical files


What is File


In any information system, we deal with data. 

This data has to be arranged in a proper way to accept, process and communicate operations and results. 

For arranging the data, we need files. 
A manual file stores all the information relating to a particular activity.

For example: inventory activities in an inventory file, payroll activities in a payroll file and so on.

The basic unit of information for computer and manual files is a record.
Collections of related data items form a record.

For example, each employee's record will contain data items such as Employee number, Employee name, Basic pay, Allowances, Deductions, Gross pay and Net pay.

A set of logically related records forms a file.


File structure


To understand file structure, one must understand the hierarchy of terms explained below.

Character or byte


A bit is the smallest unit of data representation (value of a bit may be 0 or 1).
Eight bits make a byte which can represent a character code or a special symbol in a character code.
1 character = 1 byte (in single-byte character codes such as ASCII).

Data Item


One or more characters combined may form a data item.
It is used to describe an attribute of an object or entity.
For example: student_no, student_name, age, etc. are data items.
A data item is also referred to as a field. 
However, there is a slight difference between data item and field.
A field is a physical space on a magnetic disk, whereas a data item is the data stored in the field.

Record


The data items related to an object or entity are grouped into a record.

Record can also be defined as a set of logically related fields.

There are two types of records


  • Fixed length.
  • Variable length.

In a fixed length record, every occurrence of the record must have each of the fields present, and a given field must be the same length from record to record.

This means each occurrence of a record in a file is the same or of a fixed length.

In Variable length record, every occurrence of a record need not have each of the fields present and a given field need not be the same length from record to record.

This means that occurrences of a record in a file need not be the same length.

File


A file is a set of logically related records. Almost all information stored in a computer must be in a file. There are many different types of files: data files, text files, program files, directory files, and so on.

Logical and physical files


Files can be viewed as logical files and physical files.
A logical file is a file viewed in terms of the data items its records contain and the processing operations that may be performed on the file.

The user of the file will normally adopt such a view. A physical file is a file viewed in terms of how the data is stored on a storage device and how the processing operations are made possible.
In short, files can be considered to have a multilevel structure, as the next figure shows.

From the figure, we see that a file consists of records, and records consist of data items (fields).

Data items may contain elementary items.
For example, if Date is a data item, then its elementary items are month, day and year.
The physical files are stored in secondary storage devices. 

The operating system makes a connection between logical and physical files for the application program.

Application programs read or write bytes from physical files that are stored on secondary storage such as a disk.

Fields and Record Structure in File


Data is usually stored in the form of records.
Each record consists of a collection of related data values or items, where each value is formed of one or more bytes and corresponds to a particular field of the record. 

Record usually describes entities and their attributes. 

For example, an EMPLOYEE record represents an employee entity and each field value in the record specifies some attribute of that employee such as NAME, BIRTH-DATE, SALARY, etc.

A collection of field names and their corresponding data types constitutes a record type or record format definition.

A data type, associated with each field, specifies the type of values the field can take.
The data type of a field is usually one of the standard data types used in programming.
These include numeric (integer, long integer or floating point), string of characters (fixed-length or varying), Boolean, and sometimes specially coded date and time data types.

The number of bytes required for each data type is fixed for a given computer system: an integer may require 4 bytes, a long integer 8 bytes, a real number 4 bytes, a Boolean 1 byte, a date 4 bytes, and a fixed-length string of k characters k bytes.
 

BLOB


In recent database applications, the need may arise for data items that consist of large unstructured objects, which represent images, digitized video or audio streams, or free text.

These are referred to as BLOBs ( Binary Large Objects).
Normally, a BLOB data item is stored separately from its record in a pool of disk blocks and a pointer to the BLOB is included in the record. 

There are four common methods of organizing the fields of a record in a file:


  • Force the fields into a predictable length. 
  • Begin each field with a length indicator.
  • Place a delimiter at the end of each field to separate it from the next field.
  • Use a "keyword = value" expression to identify each field and its contents.

For example, in C Programming


struct Person {
    char last[10];   /* last name */
    char first[10];  /* first name */
    char addr[15];   /* address */
    char city[15];
    char zip[6];     /* ZIP code stored as characters */
};

In this example, each field is a character array that can store string value of some maximum size. 

This is a fixed-size field structure: the structure Person occupies 10 + 10 + 15 + 15 + 6 = 56 bytes.
Another way is to store the length of each field just ahead of the field, so that a program can count to the end of the field, as shown in the figure.

If the fields are not too long, then it is possible to store length in a single byte at the start of each field.
Using a delimiter is another way to separate the fields.

We can use white-space characters (blank, newline, tab) as delimiters because they provide a clean separation between fields, as shown in the figure.

The figure shows a structure in which each field provides information about itself. Such self-describing structures can be very useful tools for organizing files in many applications.
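The three variable-size methods from the list above (length indicator, delimiter, "keyword = value") can be compared with a short C sketch that packs the same two Person fields, last name and city, into a buffer. The function names, the '|' delimiter and the one-byte length indicator are assumptions for illustration, not formats defined by the text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Length indicator: each field is preceded by a one-byte length.
   Returns the number of bytes written (not NUL-terminated). */
size_t pack_length_prefixed(unsigned char *buf, const char *last, const char *city)
{
    const char *fields[2] = { last, city };
    size_t pos = 0;
    for (int i = 0; i < 2; i++) {
        size_t len = strlen(fields[i]);
        assert(len <= 255);               /* length must fit in one byte */
        buf[pos++] = (unsigned char)len;
        memcpy(buf + pos, fields[i], len);
        pos += len;
    }
    return pos;
}

/* Delimiter: each field ends with '|'. Returns snprintf's character count. */
int pack_delimited(char *buf, size_t size, const char *last, const char *city)
{
    return snprintf(buf, size, "%s|%s|", last, city);
}

/* "keyword = value": each field names itself, so the record is self-describing. */
int pack_keyword(char *buf, size_t size, const char *last, const char *city)
{
    return snprintf(buf, size, "last=%s|city=%s|", last, city);
}
```

For last name "Ames" and city "Stillwater", the length-prefixed and delimited forms both take 16 bytes, while the keyword form takes 26: the self-describing layout costs extra space in exchange for fields that can be identified without knowing their order.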

Record Types


We have already introduced two types of record.

A file is a collection of records. Usually, all records in a file are of the same record type.
The file is said to be made up of fixed-length records, if every record is equal in size (in bytes). 

The file is made up of variable-length records, if different records in the file do not match in terms of size (in bytes). 

Reasons for having variable length records in a file

 
The file records belong to one record type, but one or more of the fields may have multiple values for individual records; such a field is called a repeating field. A group of values for the repeating field is called a repeating group.

The file records belong to one record type, but one or more of the fields are optional.
The file contains records of heterogeneous record types. This will happen if related records of heterogeneous types are placed together on disk blocks. 

For example, the Sales_Report records of a particular Product may be placed following the Product's record.

In this section, we present a simplified analysis of three basic file organizations:
files sorted on some field, files hashed on some field, and indexed file organization.

Our objective is to emphasize the importance of choosing an appropriate file organization. 

Sequential Files


We can physically arrange the records of a file on disk based on the values of one of their fields - called the ordering field.

This leads to an ordered or sequential file. If the ordering field is also a key field of the file (a field guaranteed to have a unique value in each record), then the field is also called the ordering key for the file. For example, a file of employee records can be ordered with NAME as the ordering key field (assuming that employees have distinct names).

Ordered files have some advantages over unordered files, as follows.

An index is a data structure that organizes data records on disks to optimize certain file operations. 

An index allows us to efficiently retrieve all records that satisfy a search condition on the search key fields of the index. Using an index, we can achieve fast searching of data records.

Reading the records in order of the ordering field values becomes extremely efficient, since no sorting is required.

Finding the next record after the current one in order of the ordering field usually requires no additional block accesses, because the next record is in the same block as the current one (unless the current record is the last one in the block).

A search condition based on the value of an ordering key field results in faster access when the binary search technique is used.
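A minimal sketch of binary search on an ordered file in C, assuming the records have been read into an array sorted by an integer ordering key; the record layout and names are hypothetical. Each probe halves the remaining range, so about log2(n) probes are needed instead of up to n for a linear scan. On disk, the same idea is applied to blocks rather than individual records.

```c
#include <assert.h>
#include <stddef.h>

struct emp { int id; int salary; };   /* id is the ordering key */

/* Binary search on the ordering key; returns the record index or -1.
   The number of records examined is stored in *probes. */
int find_by_key(const struct emp *recs, int n, int key, int *probes)
{
    assert(probes != NULL);
    int lo = 0, hi = n - 1;
    *probes = 0;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;     /* avoids overflow of lo + hi */
        (*probes)++;
        if (recs[mid].id == key) return mid;
        if (recs[mid].id < key) lo = mid + 1;   /* key lies in upper half */
        else hi = mid - 1;                      /* key lies in lower half */
    }
    return -1;
}
```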

Using index we can find the desired entry and then use these to obtain data records. A data entry with search key value k contains enough information to locate data records with search key value k.

To create and maintain an indexed file, the computer creates a data file and an index file. The data file contains the actual contents (data) of the records, and the index file contains the index entries. One field identifies each record uniquely.

The files are organized in the following ways:

The data file is stored in the order of the primary key values. 

The index file contains two fields:

  • the key value 
  • the pointer to data record.

One record in the index file thus consists of a key value and a pointer to the corresponding data record.

The pointer points to the first entry within the range of data records.
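An index whose pointer leads to the first entry of a range of data records can be sketched as a sparse index in C. Assumptions (hypothetical layout): the data file is an array of sorted integer keys split into blocks of 4, and the index stores the first key of each block. A lookup picks the last index entry whose key is ≤ the search key, then scans only that block. The index itself is scanned linearly here for brevity; a binary search would also work because it is sorted.

```c
#include <assert.h>

#define BLOCK_SIZE 4   /* data records per block */

/* Build the index: one entry (the first key) per block of the sorted data. */
int build_index(const int *data, int n, int *index_keys)
{
    assert(n >= 0);
    int entries = 0;
    for (int start = 0; start < n; start += BLOCK_SIZE)
        index_keys[entries++] = data[start];
    return entries;
}

/* Choose the block from the index, then scan that block only.
   Returns the record position in data, or -1 if the key is absent. */
int sparse_lookup(const int *data, int n, const int *index_keys, int entries,
                  int key)
{
    int block = -1;
    for (int i = 0; i < entries && index_keys[i] <= key; i++)
        block = i;                       /* last entry with first key <= key */
    if (block < 0) return -1;            /* key is smaller than every key */
    int start = block * BLOCK_SIZE;
    for (int j = start; j < start + BLOCK_SIZE && j < n; j++)
        if (data[j] == key) return j;
    return -1;
}
```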

Advantages


Data can be accessed directly and quickly.
Data is maintained centrally and kept up to date.
Primary and secondary index can be used to search the data.

Disadvantages


If we want to insert new index values between any two existing values, then it becomes difficult. 
If the number of index values becomes too large, searching becomes slow.
The use of an index lowers the efficiency of the computer.
Hardware required for these systems is expensive, as data is stored on disk.
The file is updated directly.
Backups should be taken regularly.

Hashed files


In hashed files, the record number itself becomes an equivalent of the key value or primary key. 

The term hash indicates splitting a key into pieces. Hash file organization provides very fast access to records under certain search conditions. Such a file is usually called a hash file or direct file.

The idea behind hashing is to provide a function h, called a hash function or randomizing function, that is applied to the hash field value of a record and yields the address of the disk block in which the record is stored.
A search for the record within the block can be carried out in the main memory buffer.


Internal Hashing


For internal files, hashing is typically implemented through the use of an array of records. Suppose that the array index range is from 0 to M - 1; then we have M slots whose addresses correspond to the array indexes.

We choose a hash function that transforms the hash field value into an integer between 0 and M - 1.

One common hash function is h(K) = K mod M, which returns the remainder of an integer hash field value K after division by M; this value is then used as the record address.

Non-integer hash field values can be transformed into integers before the mod function is applied.

For example:

  N = number of records in the file
  K = a key value that uniquely identifies a record in the file
  Hash function: H(K) = K mod M

  If K is 9875, N is 58 and M is 99, then
  H(K) = 9875 mod 99 = 74

  With M = 2, the keys 7 and 5 hash to the same address:
  H(7) = 7 mod 2 = 1
  H(5) = 5 mod 2 = 1


A collision occurs when the hash field value of a new record being inserted hashes to an address that already contains a different record.

In this situation, we must insert the new record in some other position, since its hash address is occupied.

The process of finding another position is called collision resolution. There are numerous methods for collision resolution, including the following:

Open addressing


Proceeding from the occupied position specified by the hash address, the program checks the subsequent positions in order until an unused (empty) position is found.
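Open addressing can be sketched in C with linear probing, assuming an array of M = 7 slots, non-negative integer keys, the hash function h(K) = K mod M from above, and -1 marking an empty slot (all names are hypothetical). Insertion starts at h(K) and moves to the next slot, wrapping around to 0, until it finds an empty one; search follows the same probe sequence and stops at an empty slot.

```c
#include <assert.h>

#define M     7
#define EMPTY (-1)   /* marker for an unused slot */

static int slots[M] = { EMPTY, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY, EMPTY };

/* Hash function h(K) = K mod M for non-negative keys. */
static int h(int key) { assert(key >= 0); return key % M; }

/* Insert with linear probing; returns the slot used, or -1 if the table is full. */
int probe_insert(int key)
{
    int home = h(key);
    for (int i = 0; i < M; i++) {
        int pos = (home + i) % M;          /* home, home + 1, ... wrapping */
        if (slots[pos] == EMPTY) { slots[pos] = key; return pos; }
    }
    return -1;
}

/* Search along the same probe sequence; an empty slot ends the search. */
int probe_find(int key)
{
    int home = h(key);
    for (int i = 0; i < M; i++) {
        int pos = (home + i) % M;
        if (slots[pos] == EMPTY) return -1;
        if (slots[pos] == key) return pos;
    }
    return -1;
}
```

For instance, 10 and 17 both hash to slot 3, so 17 lands in slot 4. Deleting records would need a special "deleted" marker so that later searches do not stop early; that is left out of this sketch.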

Difference between File system and DBMS


File system 


  • File system is software that manages and organizes the files in a storage medium within a computer.
  • Redundant data can be present in a file system.
  • It doesn't provide backup and recovery of data if it is lost.
  • There is no efficient query processing in file system.
  • There is less data consistency in file system.
  • It is less complex as compared to DBMS.
  • File systems provide less security in comparison to DBMS.
  • It is less expensive than DBMS.

DBMS


  • DBMS is software for managing the database.
  • In a DBMS, redundant data is minimized.
  • It provides backup and recovery of data if it is lost.
  • Efficient query processing is there in DBMS.
  • There is more data consistency because of the process of normalization.
  • It has more complexity in handling as compared to file system.
  • DBMS has more security mechanisms as compared to file system.
  • It has a comparatively higher cost than a file system.
