Tuesday, August 20, 2019
Infinity Computer Systems Distributed Database
CHAPTER 1 Company Profile

Infinity Computer Systems is a Sri Lanka based company that sells computers, computer components and software applications to local buyers. The company has pledged to bring the latest IT products to the local market at an affordable price as soon as they appear on the world market, which gives it an edge over its competitors. This open secret has been the company's formula for success and has allowed it to grow rapidly in a short time. The company offers customers a wide range of computer hardware and software products. One key market sector it aims to expand into in future is mobile handheld devices such as smartphones.

Having started in 1999 with just two employees, Infinity Computer Systems has grown into one of the biggest IT and computer component vendors in Sri Lanka and the South Asian subcontinent. The company currently has three branches: one in Mumbai, India; one in Kandy, a town in the central part of Sri Lanka; and the headquarters in Colombo. Together the three branches employ 102 full-time staff. Infinity Computer Systems holds a market share of about 30% in Sri Lanka. The company has also recognized the benefits of the IT boom in India and aims to expand the Mumbai branch into a major computer hardware and software vendor in India to increase revenue.

The Colombo head office and the Mumbai branch maintain two large warehouses for storing directly imported products. The Mumbai branch also deals directly with suppliers and manufacturers when buying stock, with minimal supervision from the Colombo head office. The Kandy branch depends on the Colombo head office for obtaining stock and for major decisions.

Infinity Computer Systems has a qualified sales and customer service team that provides customers with expert product selection assistance and support.
The team tries to keep an open dialogue with customers, so feedback and suggestions are always welcome and highly appreciated. Whether the customer is a hardcore gamer, a student, a small or medium-sized business or an IT professional, Infinity Computer Systems has a solution for every IT need.

Current System

Because of the company's popularity, a large number of business transactions are carried out every day at Infinity Computer Systems outlets, resulting in heavy use of databases and database applications. The stock maintenance, human resource management, and sales and marketing departments in all three branches rely on database systems in their day-to-day operations.

Infinity Computer Systems currently uses a centralized database system with a relational architecture to store its data. The central database is located at the Colombo head office. The Colombo, Mumbai and Kandy branches access it in real time over the company WAN.

The database consists of the following major tables, which are related to other sub-tables:

Employees
Stocks
Sales

The database maintains separate tables for each branch that are logically related to these three main tables. The basic structure of the table architecture is as follows. Both the Mumbai branch and the Kandy branch share the central Colombo database.

Disadvantages of Current System

Because the current system is centralized, the company faces a number of difficulties and has identified the following disadvantages.

The main disadvantage of the current system is its single point of failure. If the central database fails, all branches are affected and all business activity comes to a halt. A breakdown in the WAN line also cuts off access to the network.

Slow access time is a major concern as well.
Because all three branches access the database simultaneously, the current system struggles to process queries quickly, which frustrates many users. This hurts both the fast-paced working environment at Infinity Computer Systems and customer service times.

The sluggishness of the current system is in no way suited to the expansion of the Indian branch that management plans to carry out in the near future. The Mumbai branch in particular will need a database that offers quick access and can sustain rapid growth in both capacity and demand.

Areas Where Current System Lacks Security

The current WAN has a huge security hole: none of the sites is protected by a firewall. This lets hackers and malware such as worms penetrate the network easily, which poses a great threat both to data at rest and to data travelling across the network.

The current system does not use any type of encryption when transferring data between the remote sites and the main site. This is a serious threat to data such as usernames and passwords, because user authentication is performed at the main site (Colombo) rather than at the local sites. Without encryption, anyone who intercepts the traffic gains access to user authentication information.

The absence of encryption also threatens the other data that is transferred between the main and remote sites as query results. The company database stores data that is vital and confidential to Infinity Computer Systems. If data such as sales records and price lists fell into the hands of rival businesses, they could gain an advantage over the company.

The current database system also has an inadequate authentication mechanism, which grants access to all areas of the database through a single point of authentication.
This poses a threat to data, because staff at every level of the company have easy access to almost all company data.

Solution

To overcome the problems Infinity Computer Systems currently faces, a distributed database system can be implemented. In a distributed database environment, the database is spread over several locations so that end users have quick access to it. The configuration and advantages of the new database system are described in the next chapter.

CHAPTER 2 Distributed Database

A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. In a distributed database environment, users can access data from different sources located at multiple sites.

Distributing a database over many locations creates the challenge of retrieving data from those locations and presenting it to the user. Managing the database also becomes a critical function. This is where the distributed database management system (DDBMS) comes into play. A DDBMS is a software system that manages the distributed database and provides the access mechanism for its users. By integrating tightly with the various systems and databases in the distributed environment, the DDBMS makes the distribution transparent to the user.

Infinity Computer Systems Distributed Database Overview

Given the current geographical distribution of the branches and the WAN architecture, the Infinity Computer Systems database can be distributed across the three branches to improve productivity and access times and to gain many other advantages over the existing centralized database.

Of the many architectures available, we consider two major options for the company database: a Multiple Site Processing, Multiple Site Data architecture using either a homogeneous or a heterogeneous distributed database management system (DDBMS).
Of these two architectures, we will implement the homogeneous Multiple Site Processing, Multiple Site Data architecture for the Infinity Computer Systems distributed database.

Multiple Site Processing, Multiple Site Data

A Multiple Site Processing, Multiple Site Data (MPMD) scenario refers to a fully distributed database with support for multiple data processors, including transaction processors, at multiple sites. When all sites of the logically related distributed database use the same type of database management system (DBMS), the system is called a homogeneous DDBMS.

Infinity Computer Systems Distributed Database Architecture

Combining the concepts described above, the company's new distributed database architecture can be illustrated as follows. As shown in figures 4 and 5, in the new architecture both the Mumbai and Kandy branches maintain a copy of the database consisting of the records related to their respective branches. The Colombo branch maintains two databases: one holding data related to the Colombo branch, and a main database holding records related to all three branches. The new distributed database uses a relational architecture.

With this architecture, each branch can access its own locally hosted database, and data processing is decentralized across all three branches. Besides accessing their local databases, the Kandy and Mumbai branches can also access the main database at the Colombo head office.
The distributed database management system (DDBMS) will consist of a Distributed Query Processor (DQP) that handles distributed queries, a Distributed Transaction Manager (DTM) for processing distributed transactions, a Distributed Metadata Manager (DMM) for managing distributed metadata, a Distributed Integrity Manager (DIM) for enforcing integrity among the various components of the distributed database system, and a Distributed Security Manager (DSM) for enforcing security constraints across the database.

The LANs of all three branches were redesigned to accommodate the new distributed database architecture. The following sections describe each LAN site with its new features and configuration.

New LAN Architecture of Colombo Site

The Colombo branch functions as the heart of the new distributed database system. Because the Colombo branch handles all management and financial decisions, it is important that it has quick access to data. For this purpose the Colombo LAN has been revamped as shown in the following figure.

For a company that sells state-of-the-art computers, accessories and networking products, fast access to the database and to the interconnected nodes within the LAN itself is important, so the Colombo site LAN was completely redesigned for the new database system. The old Token Ring topology was replaced with a new Gigabit Ethernet LAN in a star topology. Gigabit Ethernet provides a data rate of up to 1000 Mbps for LAN data.

The new database server and backup server are implemented as a separate segment of the LAN, and the separation is done through the router. The switches serving the accounting and human resources department, the sales department and the warehouse department connect to a central switch, which in turn connects to the router.
The database server and backup server connect to a switch, which in turn connects to the router; the router also has built-in firewall capability. The router segments the database section of the LAN from the other sections, which reduces congestion on the local LAN. This gives faster access to the database within the Colombo LAN as well as faster processing of queries arriving from other branches over the WAN. The firewall protects the internal LAN from unauthorized access and so helps protect company data. The backup server provides a continuous backup facility for the database and helps recover it if the main database fails.

New LAN Architecture of Mumbai Site

The Mumbai branch is the second most important branch after the Colombo head office. Management's intention to expand it in future, with more storage and higher sales targets covering a larger customer base, makes a good LAN infrastructure at the Mumbai branch essential. For this purpose the Mumbai LAN has also been revamped, as shown in the following figure, to support the new distributed database and future additions.

The Mumbai LAN has an architecture similar to that of the Colombo LAN. The old Token Ring topology has been replaced with a new star topology Gigabit Ethernet. Gigabit Ethernet provides faster access to data within the LAN, which is much needed for day-to-day communication within the organization, and it lays the foundation for the expected expansion of the branch. The router separates the LAN segment containing the distributed database from the other areas of the LAN. This helps prevent congestion, improves data transfer efficiency, and gives faster access to data for both local and distributed queries.
The router is equipped with a built-in firewall that protects the internal LAN from unauthorized access and thus protects the valuable data of Infinity Computer Systems. The database server is connected to a backup server, which backs up the data of the main database server and helps recover it in the event of a failure.

New LAN Architecture of Kandy Site

The Kandy branch has also been revamped to accommodate the changes to the database system. The LAN architecture is very similar to that of the other two branches, and the following figure shows the new architecture.

The most notable addition at the Kandy branch is the T1 line, which replaces the ISDN line that previously connected the branch LAN to the company WAN. The T1 line provides faster access to distributed data as well as to the internet. It benefits all branches, because the data of all three branches can now be accessed at the same speed without bottlenecks.

The LAN is designed as a Gigabit LAN with a star topology, which provides fast data transmission within the LAN. The router has a built-in firewall that protects the internal LAN from intrusions. The database server section of the LAN is segmented using the router, which controls congestion and allows faster access to data for local and distributed queries. The backup database server backs up the main database server, which helps the main server recover quickly in the event of a failure.

New Features of the Proposed WAN Network Architecture and Distributed System

A few new features were introduced to the existing WAN to make it compatible with the distributed database system and to close certain security holes in the existing WAN.

Firewalls have been introduced at each local site to protect the LANs of all three branches. This addresses the network's exposure to worms and hackers.
Firewalls block malicious, unauthorized traffic from entering any segment of the Infinity Computer Systems network while allowing legitimate traffic to reach any part of it.

The ISDN line that connected the Kandy branch to the WAN has been replaced with a high-bandwidth T1 line. This allows distributed queries to reach Kandy branch data at the same speed as the data of the other two branches, Colombo and Mumbai, and all three branches can access one another's data much faster than the ISDN line allowed.

Each branch LAN now has a new segment containing the distributed database and processing systems. The segment is separated through the router to reduce congestion, so that both local and remote traffic can access the database faster.

Security Enhancements Provided by New System

The new system is designed to encrypt user authentication data. This prevents anyone who intercepts the data from understanding the information related to user authentication and authorization.

The proposed system is designed with a multilevel security control system. Multilevel security controls ensure that users cleared at different security levels can access and share the company's distributed database, in which data is assigned different security levels. This prevents lower-level staff from gaining access to data that is not relevant to them and keeps the data secure.

Advantages of the New Database System

The new distributed database system has a number of advantages.

The most significant advantage is speed. With a locally available database at each branch, there is no longer any need to connect to the Colombo head office database during day-to-day operations, so each branch gets fast access to the data on its own LAN. Fast database access means faster operations in every task across the company as well as quicker service for customers.
Having redundant data at two branches in addition to the Colombo head office means higher availability. Even if the main branch database fails, it can quickly be recovered from the regional branches. In the same way, if a regional branch database fails, it can be recovered from the main database in Colombo with minimal downtime.

The distributed architecture reduces the strain on the main database servers, because the workstations connected to each branch's database server share the processing workload among them. This results in faster query processing.

Network traffic will fall sharply as well. In the old configuration the company WAN was heavily used for database traffic, and the Colombo branch in particular received a huge amount of it every day. With the new configuration, branch offices no longer need to use the WAN for day-to-day database access. This frees up WAN capacity and lets all branches use it for other, more critical tasks.

Because query processing is distributed between branches, there is no longer a need to maintain costly high-end servers for processing. This reduces company expenditure in the long run.

The new database system can be expanded in both size and processing power. This provides a platform for the planned expansion of the Mumbai branch without concerns about resources and infrastructure.

Removal of Reliance on a Central Site

In the existing centralized system, both remote branches and the Colombo branch depend on the same database in Colombo. The proposed system eliminates this reliance. It provides an independent database for each branch, which can still connect to the databases of the other branches through the distributed database management system, and it eliminates the single point of failure.
With fragments as the unit of distribution in the new architecture, a transaction can be divided into several subqueries that operate on fragments. This increases the degree of concurrency, or parallelism, in the system.

Possible Problems in the New Architecture

Complexity

The new distributed database system hides its distributed nature from the user. A system that does this while providing an acceptable level of performance, reliability and availability is inherently more complex than the existing centralized database architecture. The fact that certain data, in particular the Colombo and Mumbai stock data, is replicated at the Colombo and Mumbai branches adds an extra level of complexity when synchronizing these copies. The software must therefore handle data replication adequately; if it does not, the availability, reliability and overall performance of the entire system will degrade.

Cost

The increased complexity of the new distributed database architecture leads to higher hardware and software costs as well as higher maintenance costs.

Difficulties in Integrity Control

Database integrity refers to the validity and consistency of stored data. Integrity is usually expressed in terms of constraints, which are consistency rules that the database is not allowed to violate. Enforcing integrity constraints requires access to data that defines the constraints but is not involved in the actual update operation itself. In a distributed DBMS environment such as the proposed Infinity Computer Systems architecture, the processing and communication costs required to enforce such integrity constraints may be prohibitive.

Security

In a centralized DBMS, access can easily be controlled. The new distributed database system consists of fragmented and replicated data located at multiple sites, which makes security control more challenging.
Furthermore, the network itself must be made secure in order to protect the data that travels between the three branches.

CHAPTER 3 Detailed Structure and Functionality of Distributed Database

This chapter discusses the structure of the Infinity Computer Systems distributed database and the functionality of its distributed components in greater detail.

Table Format

The database architecture uses three major tables: Employees, Sales and Stocks. The table format for each is as follows.

As shown in figure 7, the Employees, Sales and Stocks tables are fragmented and placed at the three branches according to where the data is accessed most often. This architecture makes data access faster and keeps communication costs down.

In addition, the Stocks_CMB data is vertically fragmented and located at the Mumbai branch database site under the table name Stocks_CMB_FRG. During the fragmentation of Stocks_CMB, all attributes of the table were allocated to Stocks_CMB_FRG except UNITPRICE, which is irrelevant to the Indian territory; leaving it out avoids wasting storage space on irrelevant data.

The purpose of allocating Colombo head office stock data to the Mumbai site is to allow faster access, because the Mumbai branch of Infinity Computer Systems runs its own warehouse and deals with manufacturers and suppliers directly. This makes it important for the Mumbai branch to be able to access the Colombo stock data frequently and quickly, so that both branches can maintain a healthy stock for everyday business.

The Colombo branch retains a copy of the stock table related to the Mumbai site as well as maintaining its own table of Colombo stock. In the new design, the Stock_MBI table located at the Colombo head office is configured to synchronize with the Mumbai Stock_MBI table twice a day, at midday and at the end of the working day.
In a nutshell, the database is distributed across the three branches as follows:

The Employees and Sales tables, previously located at the Colombo branch, are fragmented according to the branch the data belongs to, and each fragment is located at its branch.
Stock_MBI is replicated at the Mumbai site (as Stock_MBI_LCL), while an exact copy is retained at the Colombo branch.
Stock_KDY is moved from its previous position in Colombo to the Kandy site.
Stocks_CMB is vertically fragmented, and a copy of the fragment is located at the Mumbai branch.

Data Allocation Method

There are four methods to consider when choosing a data allocation method for the proposed distributed database architecture:

Centralized
Fragmented
Complete Replication
Selective Replication

Of these, we use Selective Replication as the data allocation method for the proposed architecture.

Selective Replication is a combination of the fragmented, replicated and centralized data allocation methods. Some data items are fragmented to achieve high locality of reference; others, which are used at many sites and are not frequently updated, are replicated; the remaining data items are centralized. This approach combines the advantages of the other three methods.

Using selective replication, we distribute only the data related to the Kandy and Mumbai branches to their respective branches, while keeping a main database at the Colombo branch that holds records related to all branches. The main database serves as a redundant copy as well as a central repository from which data related to all three branches can easily be retrieved.

The next section, on fragmentation, describes how this distribution is expressed using relational algebra.
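As a rough illustration of the selective replication rule described above, the sketch below encodes the three-way choice (fragment, replicate or centralize) as a small Python function. The `AccessProfile` fields, the `RARELY_UPDATED_LIMIT` threshold and the example data items are all hypothetical and only serve to make the rule concrete; they are not taken from the Infinity Computer Systems design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessProfile:
    """How a data item is used. All fields are illustrative assumptions."""
    name: str
    sites_using: int       # number of branches that use the item
    mostly_local: bool     # True if most accesses come from a single branch
    updates_per_day: int   # how often the item changes


# Hypothetical cut-off for "not frequently updated".
RARELY_UPDATED_LIMIT = 10


def choose_allocation(profile: AccessProfile) -> str:
    """Apply the selective replication rule to one data item."""
    if profile.mostly_local:
        # High locality of reference: place the fragment at the site that uses it.
        return "fragment"
    if profile.sites_using > 1 and profile.updates_per_day <= RARELY_UPDATED_LIMIT:
        # Used at many sites and rarely updated: keep a copy at each site.
        return "replicate"
    # Everything else stays in the central database.
    return "centralize"


# Hypothetical data items used only to exercise the rule.
examples = [
    AccessProfile("branch employee records", 1, True, 5),
    AccessProfile("rarely changing shared list", 3, False, 1),
    AccessProfile("frequently updated shared data", 3, False, 500),
]
decisions = {p.name: choose_allocation(p) for p in examples}
print(decisions)
```

In practice the threshold and the access statistics would have to come from measuring how each branch actually uses the data; the function only shows the order in which the three options are considered.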
Fragmentation of Database for Allocation of Data at Various Sites

When distributing a database across multiple sites, one of the main factors to consider is the fragmentation of database items such as tables. Fragmentation consists of breaking a relation into smaller relations, or fragments, and storing the fragments at different sites. Fragmenting allows data to be placed at the sites where it is used most often.

There are two approaches to distributing database elements across multiple sites:

Distribute one copy of each database table to all sites
Distribute the portions of selected tables that are important to the local sites

For our company we use the second approach. We distribute only the data in the three main tables that relates to each site (branch).

Three techniques are used for fragmenting the data in a table:

Horizontal fragmentation
Vertical fragmentation
Hybrid fragmentation

For the company's distributed database we use both horizontal and vertical fragmentation to distribute table data among the three branches. More precisely, we use horizontal fragmentation for the Employees and Sales tables and vertical fragmentation for the Stocks_CMB table. Vertical fragmentation is used for the stock table because the Mumbai branch deals with manufacturers and other vendors who supply hardware and software stock directly to it.

Horizontal Fragmentation

In horizontal fragmentation, certain rows of a table are put into a base relation at one site, and other rows are put into a base relation at another site. In other words, the rows (tuples) of a relation are distributed to the sites as disjoint fragments. In the Infinity Computer Systems database, we use horizontal fragmentation to fragment the Employees and Sales tables. The criterion for horizontally fragmenting these tables is the relevance of the data to the location.
As shown above, the current Employees table is fragmented on the BRCODE field. BRCODE indicates the branch where the employee works. Fragmenting on this field produces three new tables from the original table, one for each of the company's three branches. The Employees table can therefore be horizontally fragmented into three separate, logically related tables as follows.

Using relational algebra to do the horizontal fragmentation of Employees table

To horizontally fragment the Employees table into three tables we can use the relational algebra SELECT operation. The intention is to split the table into three smaller fragments, each containing the employees of one branch. The relational algebra operations are:

Employees_CMB = SELECT(Employees_Table) WHERE BRCODE = 'CMB'
Employees_MBI = SELECT(Employees_Table) WHERE BRCODE = 'MBI'
Employees_KDY = SELECT(Employees_Table) WHERE BRCODE = 'KDY'

Executing these three operations results in the following three table fragments:

Employees_CMB (contains 2 tuples)
Employees_MBI (contains 2 tuples)
Employees_KDY (contains 1 tuple)

Relational algebra operation for fragmenting Sales Table

The Sales table can be divided into three fragments with the SELECT operation in the same way, resulting in three tables containing the sales data of each of the three branches:

Sales_CMB = SELECT(Sales_Table) WHERE BRCODE = 'CMB'
Sales_MBI = SELECT(Sales_Table) WHERE BRCODE = 'MBI'
Sales_KDY = SELECT(Sales_Table) WHERE BRCODE = 'KDY'

Vertical Fragmentation

Vertical fragmentation works by splitting a table by attributes. It is used in situations where some sites need to access only certain attributes of the data items in a table. This kind of fragmentation is more difficult than horizontal fragmentation because more options exist.
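Before going further into vertical fragmentation, the horizontal SELECT operations on the Employees table can be sketched in Python as a minimal illustration. The sample rows are hypothetical: only the BRCODE column and its three branch codes come from the text, while EMPNO and NAME are placeholder attributes. The sample is sized so that the fragments contain 2, 2 and 1 tuples, matching the counts listed above.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical sample rows. Only BRCODE and its values (CMB, MBI, KDY)
# come from the text; EMPNO and NAME are placeholder attributes.
EMPLOYEES_TABLE: List[Row] = [
    {"EMPNO": "E1", "NAME": "Employee 1", "BRCODE": "CMB"},
    {"EMPNO": "E2", "NAME": "Employee 2", "BRCODE": "MBI"},
    {"EMPNO": "E3", "NAME": "Employee 3", "BRCODE": "CMB"},
    {"EMPNO": "E4", "NAME": "Employee 4", "BRCODE": "KDY"},
    {"EMPNO": "E5", "NAME": "Employee 5", "BRCODE": "MBI"},
]


def select(relation: List[Row], attribute: str, value: str) -> List[Row]:
    """Relational SELECT: keep the rows where attribute equals value."""
    return [dict(row) for row in relation if row[attribute] == value]


# One fragment per branch, mirroring the three SELECT operations.
employees_cmb = select(EMPLOYEES_TABLE, "BRCODE", "CMB")
employees_mbi = select(EMPLOYEES_TABLE, "BRCODE", "MBI")
employees_kdy = select(EMPLOYEES_TABLE, "BRCODE", "KDY")

# The fragments are disjoint, so their union rebuilds the original relation.
reconstructed = employees_cmb + employees_mbi + employees_kdy
same_rows = sorted(r["EMPNO"] for r in reconstructed) == sorted(
    r["EMPNO"] for r in EMPLOYEES_TABLE
)

print(len(employees_cmb), len(employees_mbi), len(employees_kdy), same_rows)
```

The final check shows the property that makes horizontal fragmentation safe: every row lands in exactly one fragment, so the union of the fragments gives back the original table. The Sales table can be fragmented with the same `select` helper.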
The fragmentation can be achieved by either:

Grouping attributes into fragments, or
Splitting relations into fragments

To fragment the Colombo branch's Stocks_CMB table, we use the first method.

Relational algebra operation for vertical fragmentation of Stocks_CMB Table

For vertical fragmentation, the relational algebra PROJECT operation is used. We fragment the Stocks_CMB table to form a new table called Stocks_CMB_FRG. This new table contains all the attributes of the original table except UNITPRICE. The PROJECT operation is:

Stocks_CMB_FRG = PROJECT(Stocks_CMB) STOCKCODE, ITEMNO, QTY, LASTIN, NEXTIN

Executing this operation creates the table Stocks_CMB_FRG.

Data Model

The data model consists of three layers called schemas. Each schema defines a view through which the database can be seen. The three schemas are:

External schema layer
Represents the view of the database that users and applications see.

Conceptual schema layer
At this level the database objects such as tables, columns, views and indexes are defined. These definitions provide mappings to the next level of the model, where the physical layout of the database is defined.

Internal schema layer
This layer defines the actual layout of the records and fields.

The distributed databases of all three branches are modeled according to this structure, and each branch maintains its own set of the three schemas. At the local sites, users who access locally stored data see it as defined in the external views. The conceptual schema maps the logical structure of the tables to the internal schema, which defines the physical storage of data on disk.
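Returning to the vertical fragmentation of Stocks_CMB described earlier in this section, the PROJECT operation can be sketched in Python as follows. The sample rows are hypothetical; only the attribute names come from the text. The sketch also assumes that STOCKCODE is the key of the table and that a complementary fragment holding STOCKCODE and UNITPRICE stays in Colombo. Neither assumption is stated in the original design; they are used here only to show that the full table can be rebuilt from the fragments.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]

# Hypothetical sample rows; only the attribute names come from the text.
STOCKS_CMB: List[Row] = [
    {"STOCKCODE": "S1", "ITEMNO": "I10", "QTY": 40, "UNITPRICE": 125.0,
     "LASTIN": "2019-07-01", "NEXTIN": "2019-09-01"},
    {"STOCKCODE": "S2", "ITEMNO": "I11", "QTY": 15, "UNITPRICE": 980.0,
     "LASTIN": "2019-07-15", "NEXTIN": "2019-08-30"},
]

# Attributes kept in the Mumbai fragment: everything except UNITPRICE.
FRG_ATTRIBUTES = ("STOCKCODE", "ITEMNO", "QTY", "LASTIN", "NEXTIN")


def project(relation: List[Row], attributes: Sequence[str]) -> List[Row]:
    """Relational PROJECT: keep the named attributes and drop duplicate rows."""
    seen = set()
    result: List[Row] = []
    for row in relation:
        projected = {name: row[name] for name in attributes}
        key = tuple(projected[name] for name in attributes)
        if key not in seen:
            seen.add(key)
            result.append(projected)
    return result


# The fragment located at the Mumbai site.
stocks_cmb_frg = project(STOCKS_CMB, FRG_ATTRIBUTES)

# Assumption: a complementary fragment with the key and UNITPRICE stays in Colombo.
stocks_cmb_price = project(STOCKS_CMB, ("STOCKCODE", "UNITPRICE"))

# Joining the two fragments on STOCKCODE rebuilds the original rows.
price_by_code = {row["STOCKCODE"]: row["UNITPRICE"] for row in stocks_cmb_price}
rebuilt = [
    {**row, "UNITPRICE": price_by_code[row["STOCKCODE"]]} for row in stocks_cmb_frg
]

print(rebuilt == STOCKS_CMB)
```

Keeping the key attribute in every vertical fragment is what makes the join possible; a fragment without STOCKCODE could not be matched back to its rows.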
The abov Infinity Computer Systems Distributed Database Infinity Computer Systems Distributed Database CHAPTER 1 Company Profile Company Profile: Infinity computer systems is a Sri Lanka based company engaged in selling computers, computer components and software applications to the local buyers. The company had pledged to provide the local market with the latest in products in IT for an affordable price as they appear in the world market, gaining an edge over its competitors. This well known secret has been the formula of success of the company to achieve rapid growth in a short time span. Company has wide range of computer hardware and software products on offer to the customers. One key market sector that the company is aiming to spread their reach in future is mobile handheld devices such as smart phones. Having started the business in 1999 with just two employees, today Infinity computer systems has grown into one of the biggest IT and computer components vendors in Sri lanka and in the South Asian subcontinent. Currently the company has 3 branches One in Mumbai, India and one in Kandy; a town in central part of Sri lanka and the head quarters situated in Colombo and employ 102 full time staff in all three branches. Infinity computer systems has a market share of about 30% in Sri lanka. Furthermore company has realize the benefits of the boom in IT sector in India and is aiming to expand the Mumbai branch to serve as a major computer hardware and software vendor in India to increase the revenue. Colombo head office and Mumbai branches maintains two large warehouses for storing directly imported products. Mumbai branch also directly engage with suppliers and manufacturers for buying stocks with minimal supervision form the Colombo head office. Kandy branch depends on Colombo head office for obtaining stocks and when dealing with major decisions. 
Infinity Computer Systems has a qualified sales and customer service team that provides customers with expert product selection assistance and support. The team tries to keep an open dialogue with customers, so feedback and suggestions are always welcome and highly appreciated. Whether the customer is a hardcore gamer, a student, a small or medium-sized business or an IT professional, Infinity Computer Systems has the right solution for every IT need.

Current System

Because of the company's popularity, a large number of business transactions are carried out every day at Infinity Computer Systems outlets, resulting in heavy use of databases and database applications. The stock maintenance, human resource management, and sales and marketing departments in all three branches rely on database systems in day-to-day operations.

Currently Infinity Computer Systems uses a centralized database system with a relational database architecture to store data. The central database is located at the Colombo head office. The Colombo, Mumbai and Kandy branches access it in real time through the company WAN. The database consists of the following major tables, which are related to other sub-tables:

Employees
Stocks
Sales

The database maintains separate tables for each branch that are logically related to the three main tables above. The basic structure of the database table architecture is as follows. Both the Mumbai branch and the Kandy branch share the Colombo central database.

Disadvantages of Current System

Because of the centralized nature of the current system, the company faces a number of difficulties and has identified the following disadvantages.

The main disadvantage of the current system is its single point of failure. If the central database fails, all branches are affected and all business activities come to a halt. A breakdown in the WAN line also cuts off access to the database. Slow access time is a major concern as well.
Because all three branches access the database simultaneously, the current database system has difficulty processing queries quickly, which frustrates many users. This hurts the fast-paced working environment at Infinity Computer Systems as well as customer service times.

The sluggishness of the current system is in no way suited to the upcoming expansion of the India branch that company management plans to carry out in the near future. The Mumbai branch in particular will need a database that offers quick access and can sustain rapid growth in both capacity and demand.

Areas Where Current System Lacks Security

The current WAN has a huge security hole, as none of the sites is protected by a firewall. This allows hackers and malware such as worms to penetrate the network easily, and it poses a great threat to data at rest as well as data travelling on the network.

The current system does not use any encryption when transferring data between the remote sites and the main site. This poses a great threat to data such as usernames and passwords, because user authentication is done at the main site (Colombo) rather than at the local sites. Without encryption, anyone who intercepts the data gains access to user authentication information. The absence of encryption also threatens the other data transferred between the main and remote sites as the results of user queries. The company database stores data that is vital and confidential to Infinity Computer Systems; if data such as sales records and price listings fell into the hands of rival businesses, they could gain an advantage over Infinity Computer Systems.

The current database system has a less than adequate authentication mechanism, which grants access to all areas of the database through a single point of authentication.
This poses a threat to data, as company staff at every level have easy access to almost all of the company's data.

Solution

To overcome the problems currently faced by Infinity Computer Systems, a distributed database system can be implemented. In a distributed database environment, the database is distributed over many locations where end users have quick access to it. The configuration and advantages of the new database system are described in the next chapter.

CHAPTER 2
Distributed Database

A distributed database is a collection of multiple, logically interrelated databases distributed over a computer network. In a distributed database environment, users can access data from different sources located at multiple sites. Distributing a database over many locations creates the challenge of retrieving data from those locations and presenting it to the system user, and managing the database becomes a critical function as well. This is where the distributed database management system (DDBMS) comes into play. A DDBMS is a software system that manages the distributed database and provides the access mechanism to its users. By integrating tightly with the various systems and databases in a distributed environment, the DDBMS makes the distribution transparent to the user.

Infinity Computer Systems Distributed Database Overview

Given the current geographical distribution of the branches and the WAN architecture, the Infinity Computer Systems database can be distributed across the three branches to maximize productivity and improve access times, as well as to obtain many other advantages over the existing centralized database.

Of the many database design architectures available, there are two major types we can consider when designing the company database system: the Multiple Site Processing, Multiple Site Data architecture using either a homogeneous or a heterogeneous distributed database management system (DDBMS).
Of these two architectures, we will implement the homogeneous Multiple Site Processing, Multiple Site Data architecture for the Infinity Computer Systems distributed database.

Multiple Site Processing, Multiple Site Data

A Multiple Site Processing, Multiple Site Data (MPMD) scenario refers to a type of database that is fully distributed, with support for multiple data processors and transaction processors at multiple sites. When all sites of the logically related distributed database use and integrate the same type of database management system (DBMS), the system is called a homogeneous DDBMS.

Infinity Computer Systems Distributed Database Architecture

Combining the concepts described above, the new distributed database architecture of the company can be illustrated as below.

As shown in figures 4 and 5, in the new architecture both the Mumbai and Kandy branches will maintain a copy of the database consisting of the records related to their respective branches. The Colombo branch will maintain two databases: one that holds data related to the Colombo branch, and a main database that holds records related to all three branches. The new distributed database uses a relational database architecture.

With this architecture each branch can access its own locally stored database, and data processing is also decentralized across all three branches. Apart from accessing their local databases, both the Kandy and Mumbai branches will be able to access the main database located at the Colombo head office.
The distributed database management system (DDBMS) will consist of a Distributed Query Processor (DQP) that handles distributed queries, a Distributed Transaction Manager (DTM) for processing distributed transactions, a Distributed Metadata Manager (DMM) for managing distributed metadata, a Distributed Integrity Manager (DIM) for enforcing integrity among the various components of the distributed database system, and a Distributed Security Manager (DSM) for enforcing security constraints across the database.

The LANs of all three branches were redesigned to accommodate the new distributed database architecture. The following sections describe each LAN site with its new features and configuration.

New LAN Architecture of Colombo Site

The Colombo branch functions as the heart of the new distributed database system. Because the Colombo branch handles all management and financial decisions, it is important for it to have quick access to data. For this purpose the Colombo LAN was revamped, as shown in the following figure, to support the functions of the new distributed database.

For a company that sells state-of-the-art computers, accessories and networking products, it is important to have fast access to the database as well as fast access between the interconnected nodes within the LAN itself. The Colombo site LAN was therefore completely redesigned. The old Token Ring topology was replaced with a new Gigabit Ethernet LAN with a star topology. Gigabit Ethernet provides a data rate of up to 1000 Mbps.

The new database server and backup server are implemented as a separate segment of the LAN, with the separation done through the router. The switches that serve the accounting and human resources department, the sales department and the warehouse department connect to a central switch, which in turn connects to the router.
The database server and backup server connect to a switch, which in turn connects to the router, which also has a built-in firewall. The router segments the database section of the LAN from the other sections. This reduces congestion in the LAN and allows faster access to the database from within the Colombo LAN, as well as faster processing of queries arriving from the other branches over the WAN.

The firewall protects the internal LAN from unauthorized access, which helps to protect the company's data. The backup server provides a continuous backup facility for the database and helps to recover the database if the main database fails.

New LAN Architecture of Mumbai Site

The Mumbai branch is the second most important branch after the Colombo head office. Management's intention to expand it in future, to provide more storage and reach sales targets covering a larger customer base, makes a good LAN infrastructure at the Mumbai branch essential. For this purpose the Mumbai branch LAN was also revamped, as shown in the following figure, to support the new distributed database and future additions.

The Mumbai branch LAN was revamped with an architecture similar to that of the Colombo LAN. The old Token Ring topology was replaced with a new star topology Gigabit Ethernet. Gigabit Ethernet provides faster access to data within the LAN, which is much needed in day-to-day communication within the organization, and it lays the foundation for the expected future expansion of the branch.

The router segments the part of the LAN that contains the distributed database from the other areas of the LAN. This helps to prevent congestion and improves the data transfer efficiency of the LAN, and it provides faster access to data for both local and distributed queries.
The router is equipped with a built-in firewall that protects the internal LAN from unauthorized access and thus protects the valuable data of Infinity Computer Systems. The database server is connected to a backup server that backs up the data of the main database server, which helps to recover the main server in the event of a failure.

New LAN Architecture of Kandy Site

The Kandy branch LAN was also revamped to accommodate the changes to the database system. Its architecture is nearly identical to that of the other two branches. The following figure shows the new architecture.

The most notable addition at the Kandy branch is the T1 line, which replaces the previous ISDN line connecting the branch LAN to the company WAN. The T1 line provides faster access to distributed data as well as to the internet. This benefits all branches, as the data of all three branches can now be accessed at the same speed without creating bottlenecks.

The LAN is designed as a Gigabit LAN with a star topology, which provides fast data transmission within the LAN. The router has a built-in firewall that protects the internal LAN from intrusions. The database server section of the LAN is segmented using the router, which helps to control congestion and allows faster access to data for local and distributed queries. The backup database server backs up the main database server, which helps to recover the main database server quickly in the event of a failure.

New Features of the Proposed WAN Network Architecture and Distributed System

A few new features were introduced to the existing WAN to make it compatible with the distributed database system and to close certain security holes present in the existing WAN.

Firewalls have been introduced at each local site to protect the LANs of all three branches. This addresses the exposure of the network to worms and hackers.
Firewalls block malicious traffic that is not authorized to enter any segment of the Infinity Computer Systems network, while allowing legitimate traffic to reach any part of the network.

The ISDN line that connected the Kandy branch to the WAN has been replaced with a high-bandwidth T1 line. This allows distributed queries to access Kandy branch data at the same speed as the data of the other two branches, Colombo and Mumbai, and the Kandy branch in turn can access the other two sites' data much faster than over the old ISDN line.

The LAN of each branch now has a new segment that contains the distributed database and processing systems. It is segmented through the router to reduce congestion, so that both local and remote data traffic can access the database faster.

Security Enhancements Provided by New System

The new system is designed to encrypt user authentication data. This prevents anyone who intercepts the data from reading user authentication and authorization information.

The proposed system is designed with multilevel security controls. These ensure that users cleared at different security levels access and share the company's distributed database, in which data is assigned different security levels. This prevents lower-level staff from gaining access to data that is not relevant to them and keeps the data secure.

Advantages of the New Database System

The new distributed database system has a number of advantages. The most significant is speed. With a database available locally at each branch, there is no longer a need to connect to the Colombo head office database during day-to-day operations, so data located on each branch LAN can be accessed quickly. Fast access to the database means faster operations across all company tasks and quicker service for customers.
Having redundant data at two company branches in addition to the Colombo head office means higher availability. Even if the main branch database fails, it can be recovered quickly from the regional branches. In the same way, if a regional branch database fails, it can be recovered from the main database in Colombo with minimal downtime.

The distributed architecture reduces the strain on the main database servers, because the workstations connected to each branch's database server share the processing workload between them. This results in faster query processing.

Network traffic will also be reduced considerably. In the old configuration the company WAN was heavily used for database traffic, and the Colombo branch in particular received a huge amount of database traffic every day. With the new configuration, branch offices no longer need to use the WAN for routine database access. This frees WAN capacity and allows all branches to use it for other, more critical tasks.

Because database query processing is distributed among the branches, there is no longer a need to maintain costly high-end servers for processing, which reduces company expenditure in the long run.

The new database system can be expanded in both capacity and processing power. This provides the platform to carry out the planned expansion of the Mumbai branch without having to worry about resources and infrastructure.

Removal of Reliance on a Central Site

In the existing centralized system, both remote branches as well as the Colombo branch depend on the same database located in Colombo. The proposed system eliminates this reliance and provides an independent database for each branch, which can also connect to the databases of the other branches through the distributed database management system. This eliminates the single point of failure.
With fragments as the unit of distribution in the new architecture, a transaction can be divided into several subqueries that operate on fragments. This increases the degree of concurrency, or parallelism, in the system.

Possible Problems in the New Architecture

Complexity

The new distributed database system hides the distributed nature of the system from the user. A system that does this while providing an acceptable level of performance, reliability and availability is more complex than the existing centralized database architecture. The fact that certain data, especially the Colombo and Mumbai stock data, is replicated at the Colombo and Mumbai branches adds an extra level of complexity in keeping these copies synchronized. The software must therefore be designed to handle data replication adequately; otherwise the availability, reliability and overall performance of the entire system will degrade.

Cost

The increased complexity of the new distributed database architecture leads to higher hardware and software costs as well as higher maintenance costs.

Difficulties in Integrity Control

Database integrity refers to the validity and consistency of stored data. Integrity is usually expressed in terms of constraints, which are consistency rules that the database is not allowed to violate. Enforcing integrity constraints generally requires access to a large amount of data that defines the constraints but is not involved in the actual update operation itself. In a distributed DBMS environment such as the proposed Infinity Computer Systems architecture, the processing and communication costs required to enforce such integrity constraints may be prohibitive.

Security

Unlike a centralized DBMS, where access can easily be controlled, the new distributed database system consists of fragmented and replicated data located at multiple sites, which makes security control more challenging.
Furthermore, the network itself needs to be made secure in order to protect the data that travels between the three branches.

CHAPTER 3
Detailed Structure and Functionality of Distributed Database

This chapter discusses the structure of the Infinity Computer Systems distributed database and the functionality of its distributed components in greater detail.

Table Format

Three major tables are used in the database architecture: Employees, Sales and Stocks. Following is the table format for each table.

As shown in figure 7, the Employees, Sales and Stocks tables are fragmented and located at the three branches according to the site where the data is accessed most often. This architecture makes data access faster and keeps communication costs down. In addition, the data of Stocks_CMB is vertically fragmented and stored at the Mumbai branch database site under the table name Stocks_CMB_FRG. During the fragmentation of Stocks_CMB, all of its attributes were allocated to Stocks_CMB_FRG except UNITPRICE, because that attribute is irrelevant to the Indian territory; leaving it out avoids wasting storage space on irrelevant data.

Colombo head office stock data is allocated at the Mumbai site to allow faster access, because the Mumbai branch runs its own warehouse and deals with manufacturers and suppliers directly. This makes it important for the Mumbai branch to be able to access the Colombo stock data frequently and quickly, so that both branches can maintain a healthy stock for everyday business. The Colombo branch retains a copy of the Stocks table related to the Mumbai site and also maintains its own stock table for Colombo stocks. In this new design, the Stock_MBI table located at the Colombo head office site is configured to synchronize with its counterpart at the Mumbai site twice a day, at midday and at the end of the working day.
In a nutshell, the database is distributed across the three branches as follows.

The Employees and Sales tables that were previously located at the Colombo branch are fragmented according to where the data items physically belong and are located at their relevant branches.
Stock_MBI is replicated at the Mumbai site (Stock_MBI_LCL), while an exact copy is retained at the Colombo branch.
Stock_KDY is transferred to the Kandy site from its previous location in Colombo.
Stocks_CMB is vertically fragmented and a copy of the fragment is located at the Mumbai branch.

Data Allocation Method

There are four data allocation methods to consider for the proposed distributed database architecture of Infinity Computer Systems:

Centralized
Fragmented
Complete Replication
Selective Replication

Of these, we use Selective Replication as the data allocation method. Selective Replication is a combination of the fragmented, replicated and centralized data allocation methods. Some data items are fragmented to achieve high locality of reference; others, which are used at many sites and are not frequently updated, are replicated; the remaining data items are centralized. This approach combines the advantages of the other three methods.

Using selective replication, we distribute only the data related to the Kandy and Mumbai branches to their respective branches, while keeping a main database at the Colombo branch that consists of records related to all branches. This main database serves as a redundant copy as well as a central repository from which data related to all three branches can easily be retrieved.

The following section describes how this distribution was done using relational algebra.
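The allocation summarized above can be pictured as a simple lookup structure. The following Python sketch is only an illustration: the table names are taken from the text, but the site keys, the ALLOCATION dictionary and the two helper functions are assumptions introduced here, and the Colombo main database is left out because its table names are not given.

```python
# Illustrative allocation catalogue for the selective replication scheme.
# Table names follow the text; the dictionary layout and helpers are assumptions.
ALLOCATION = {
    "Colombo": ["Employees_CMB", "Sales_CMB", "Stocks_CMB", "Stock_MBI"],
    "Mumbai": ["Employees_MBI", "Sales_MBI", "Stock_MBI_LCL", "Stocks_CMB_FRG"],
    "Kandy": ["Employees_KDY", "Sales_KDY", "Stock_KDY"],
}


def sites_holding(table):
    """Return the sites whose local database stores the given table."""
    return sorted(site for site, tables in ALLOCATION.items() if table in tables)


def is_local(site, table):
    """True if a query on `table` issued at `site` can be answered locally."""
    return table in ALLOCATION.get(site, [])


# Print where each table lives, site by site.
for site, tables in ALLOCATION.items():
    print(f"{site}: {', '.join(tables)}")
```

A query for a table that is not local, for example Stocks_CMB requested at Kandy, would have to travel over the WAN to the site that `sites_holding` reports.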
Fragmentation of Database for Allocation of Data at Various Sites

When distributing a database across multiple sites, one of the main factors to consider is the fragmentation of database items such as tables. Fragmentation consists of breaking a relation into smaller relations, or fragments, and storing the fragments at different sites. By fragmenting, data can be placed at the sites where it is used most often.

There are two approaches to distributing database elements across multiple sites:

Distribute one copy of each database table to all sites
Distribute portions of selected tables that are important to the local sites

For our company we use the second approach, distributing only the data in the three main tables that is related to each site (branch).

There are three techniques for fragmenting the data in a table:

Horizontal fragmentation
Vertical fragmentation
Hybrid fragmentation

For our distributed database we use both horizontal and vertical fragmentation to distribute table data among the three branches. More precisely, we use horizontal fragmentation for the Employees and Sales tables and vertical fragmentation for the Stocks_CMB table. Vertical fragmentation is used for the stock table because the Mumbai branch deals directly with the manufacturers and other vendors who supply its hardware and software stock.

Horizontal Fragmentation

In horizontal fragmentation, certain rows of a table are put into a base relation at one site and other rows are put into a base relation at another site. In other words, the rows (tuples) of a relation are distributed to many sites as disjoint fragments. In the Infinity Computer Systems database we use horizontal fragmentation as follows to fragment the Employees and Sales tables. The criterion for horizontally fragmenting the Employees and Sales tables is the relevance of the data to the location.
As shown above, we fragment the current Employees table by the BRCODE field, which indicates the branch where the employee works. This produces three new tables from the original table, one for each of the company's three branches. The Employees table can thus be horizontally fragmented into three separate, logically related tables as follows.

Using relational algebra to do the horizontal fragmentation of Employees table

To fragment the Employees table horizontally into three tables we can use the relational algebra SELECT operation. The intention is that each fragment contains the employees of its respective branch. The relational algebra operations are:

Employees_CMB = SELECT(Employees_Table) WHERE BRCODE = CMB
Employees_MBI = SELECT(Employees_Table) WHERE BRCODE = MBI
Employees_KDY = SELECT(Employees_Table) WHERE BRCODE = KDY

Executing these three operations results in the following three table fragments:

Employees_CMB (contains 2 tuples)
Employees_MBI (contains 2 tuples)
Employees_KDY (contains 1 tuple)

Relational algebra operation for fragmenting Sales Table

The Sales table can be divided into three fragments with the SELECT operation in the same way, resulting in three tables that contain the sales data of each of the three branches.

Sales_CMB = SELECT(Sales_Table) WHERE BRCODE = CMB
Sales_MBI = SELECT(Sales_Table) WHERE BRCODE = MBI
Sales_KDY = SELECT(Sales_Table) WHERE BRCODE = KDY

Vertical Fragmentation

Vertical fragmentation works by splitting a table by attributes. It is used where some sites need to access only certain attributes of the data items in a table. This kind of fragmentation is more difficult than horizontal fragmentation because more options exist.
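Before continuing with vertical fragmentation, the SELECT-based horizontal fragmentation of the Employees table given earlier can be sketched in Python. This is a minimal illustration under stated assumptions: only the BRCODE attribute and the fragment sizes (2, 2 and 1 tuples) come from the text, while the EMPNO and NAME attributes, the sample rows and the helper names are hypothetical.

```python
# Hypothetical sample rows: only BRCODE is named in the text;
# EMPNO, NAME and the row values are invented for illustration.
EMPLOYEES_TABLE = [
    {"EMPNO": 1, "NAME": "Employee A", "BRCODE": "CMB"},
    {"EMPNO": 2, "NAME": "Employee B", "BRCODE": "CMB"},
    {"EMPNO": 3, "NAME": "Employee C", "BRCODE": "MBI"},
    {"EMPNO": 4, "NAME": "Employee D", "BRCODE": "MBI"},
    {"EMPNO": 5, "NAME": "Employee E", "BRCODE": "KDY"},
]

BRANCH_CODES = ["CMB", "MBI", "KDY"]


def select(relation, predicate):
    """Relational SELECT: keep the rows (tuples) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def fragment_by_branch(relation, branch_codes):
    """Build one horizontal fragment per branch code."""
    return {
        code: select(relation, lambda row, c=code: row["BRCODE"] == c)
        for code in branch_codes
    }


fragments = fragment_by_branch(EMPLOYEES_TABLE, BRANCH_CODES)

# Reconstruction: the union of the disjoint fragments gives back the original relation.
reconstructed = [row for code in BRANCH_CODES for row in fragments[code]]

for code in BRANCH_CODES:
    print(f"Employees_{code}: {len(fragments[code])} tuple(s)")
```

Because every row carries exactly one BRCODE, the fragments are disjoint and together cover the whole table, which is what allows the original relation to be rebuilt by a simple union.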
Vertical fragmentation can be achieved either by grouping attributes into fragments or by splitting relations into fragments. To fragment the Colombo branch's Stocks_CMB table, we use the first method.

Relational algebra operation for Vertical fragmentation of Stocks_CMB Table

For vertical fragmentation the relational algebra PROJECT operation is used. We fragment the above table to form a new table called Stocks_CMB_FRG, which contains all the attributes of the original table except UNITPRICE. Following is the relational algebra PROJECT operation.

Stocks_CMB_FRG = PROJECT(Stocks_CMB) STOCKCODE, ITEMNO, QTY, LASTIN, NEXTIN

Executing the above operation results in the following table.

Stocks_CMB_FRG

Data Model

The data model consists of three layers called schemas. Each schema defines a set of views through which the database can be seen. The three schemas are:

External schema layer: represents the view of the database that users and/or applications see.
Conceptual schema layer: at this level the database objects such as tables, columns, views and indexes are defined. These definitions provide mappings to the next level of the model, where the physical layout of the database is defined.
Internal schema layer: defines the actual layout of the records and fields.

The distributed databases of all three branches are modeled according to the above structure, and each branch maintains its own set of these schemas. At the local sites, users access locally stored data as defined in the external views. The conceptual schema maps the logical structure of the tables to the internal schema, which defines the physical storage of the data on disk.
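Returning to the vertical fragmentation of Stocks_CMB, the PROJECT operation can be sketched in Python as follows. The attribute names are those listed in the text; the sample rows, their values and the helper name are hypothetical, and the sketch assumes that STOCKCODE identifies each row, so the projection produces no duplicate tuples.

```python
# Hypothetical sample rows; only the attribute names come from the text.
STOCKS_CMB = [
    {"STOCKCODE": "S001", "ITEMNO": "I100", "QTY": 25, "UNITPRICE": 1500.0,
     "LASTIN": "2019-07-01", "NEXTIN": "2019-09-01"},
    {"STOCKCODE": "S002", "ITEMNO": "I200", "QTY": 10, "UNITPRICE": 320.0,
     "LASTIN": "2019-06-15", "NEXTIN": "2019-08-30"},
]

# Every attribute of Stocks_CMB except UNITPRICE.
FRG_ATTRIBUTES = ["STOCKCODE", "ITEMNO", "QTY", "LASTIN", "NEXTIN"]


def project(relation, attributes):
    """Relational PROJECT: keep only the listed attributes of every row.

    Assumes the attribute list includes a key (STOCKCODE here), so no
    duplicate rows can appear and no duplicate elimination is needed.
    """
    return [{name: row[name] for name in attributes} for row in relation]


STOCKS_CMB_FRG = project(STOCKS_CMB, FRG_ATTRIBUTES)

for row in STOCKS_CMB_FRG:
    print(row)
```

The fragment keeps one row per stock item, so the Mumbai site sees the same stock levels and delivery dates as Colombo while the price column stays only in the Colombo table.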