Information integration, the combining of information from different sources into a unified format, poses one of the most complex and critical technology challenges for the future enterprise.
Huge volumes of transactional, customer, logistical and operational data are now being generated as organisations expand their web service networks, linking directly with suppliers, customers and partners. This data must also be archived for record keeping and data mining.
The process involves a range of activities and tools, including:
Schema standards, mapping and matching: oriented to generic kinds of data as well as particular application domains such as CAD, news, procurement, medicine and government
Data cleansing: duplicate elimination and identity resolution
Information extraction: production of structured information from free-form text, using a set of annotators or extraction rules
Message mapping: integrating independently developed applications by moving messages between them
Object-to-relational mappers: allowing programs written in an object-oriented language to access data in relational form
Document management: managing information held in text files, spreadsheets and slide shows, as well as in unstructured formats
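As an illustration of the data cleansing activity listed above, the following is a minimal sketch of duplicate elimination with a simple form of identity resolution. The record fields, the matching rule (same email, or near-identical normalised names) and the 0.85 threshold are assumptions chosen for the example, not a recommended production rule.

```python
import re
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, strip punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def same_entity(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Assumed rule: same email, or names that are nearly identical."""
    if a.get("email") and a.get("email", "").lower() == b.get("email", "").lower():
        return True
    ratio = SequenceMatcher(None, normalise(a["name"]), normalise(b["name"])).ratio()
    return ratio >= threshold


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each resolved entity."""
    unique: list[dict] = []
    for record in records:
        if not any(same_entity(record, kept) for kept in unique):
            unique.append(record)
    return unique


# Hypothetical customer records drawn from two systems.
customers = [
    {"name": "Acme Corp.", "email": "sales@acme.example"},
    {"name": "ACME Corp", "email": "SALES@ACME.EXAMPLE"},
    {"name": "Globex Ltd", "email": "info@globex.example"},
]
deduped = deduplicate(customers)
```

Pairwise comparison like this is quadratic in the number of records, so larger data sets would normally group candidate records first and compare only within each group.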
The overwhelming proportion of enterprise information is unstructured and cannot easily be stored in the relational databases that serve as repositories for enterprise transactions and entity profile data.
This unstructured data exists in the form of web pages, slide shows, documents in proprietary formats, emails, paper-based and video objects, as well as rules, regulations and informal but valuable employee intellectual capital, or tacit knowledge.
Some documents, such as product catalogues, contain both unstructured text and structured attributes.
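To make the idea of extraction rules concrete, here is a small sketch that pulls structured attributes out of a free-text catalogue entry. The rule set, the attribute names and the sample entry are all hypothetical; real annotators are usually far richer than a handful of regular expressions.

```python
import re

# Hypothetical extraction rules: attribute name -> pattern with one capture group.
RULES = {
    "sku": re.compile(r"\bSKU[:\s]+([A-Z0-9-]+)"),
    "price": re.compile(r"\$(\d+(?:\.\d{2})?)"),
    "weight_kg": re.compile(r"(\d+(?:\.\d+)?)\s*kg\b"),
}


def extract_attributes(text: str) -> dict:
    """Apply each rule once and keep the first match it finds."""
    found = {}
    for attribute, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            found[attribute] = match.group(1)
    return found


# Invented catalogue entry mixing free text and attributes.
entry = (
    "Cordless drill, SKU: CD-1200, compact and ideal for home use. "
    "Weighs 1.5 kg. Price $89.99."
)
attributes = extract_attributes(entry)
```

The extracted values remain strings here; converting them to numeric types and units would be a further cleansing step.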
Enterprises seek to enhance the accessibility and value of unstructured information by adding structure to it and storing it within an intranet RDBMS. XML markup is also increasingly used for document retrieval, together with metadata tags such as document, author and title, to achieve a semi-structured and easily retrievable format.
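A rough sketch of how metadata tags turn free-form documents into something semi-structured and retrievable is shown below, using Python's standard XML library. The element names (library, document, title, author, body) and the sample documents are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def wrap_document(title: str, author: str, body: str) -> ET.Element:
    """Wrap free-form text in a document element carrying metadata tags."""
    doc = ET.Element("document")
    ET.SubElement(doc, "title").text = title
    ET.SubElement(doc, "author").text = author
    ET.SubElement(doc, "body").text = body
    return doc


def find_by_author(library: ET.Element, author: str) -> list:
    """Retrieve titles by metadata without parsing the body text."""
    return [
        doc.findtext("title")
        for doc in library.findall("document")
        if doc.findtext("author") == author
    ]


# Hypothetical intranet documents.
library = ET.Element("library")
library.append(wrap_document("Q3 Supplier Review", "J. Smith", "Free-form notes ..."))
library.append(wrap_document("Travel Policy", "A. Jones", "Employees should ..."))
library.append(wrap_document("Q4 Supplier Review", "J. Smith", "Further notes ..."))

titles = find_by_author(library, "J. Smith")
xml_text = ET.tostring(library, encoding="unicode")
```

The body stays unstructured, yet the document can now be located by author or title like a row in a table.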
Although there is still a clear difference between the operational performance standards of the enterprise intranet and those of the global Internet, many of the techniques applied to the web are already being applied to intranet desktop information. These include spidering, clustering, taxonomy and ontology creation, and adaptive refresh indexing, which involves sophisticated change-detection mechanisms.
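One simple way to picture adaptive refresh indexing is a crawler that fingerprints each page and revisits changing pages more often than stable ones. The sketch below assumes a content hash for change detection and a halve-or-double rule for the revisit interval; both are illustrative choices, and real change-detection mechanisms are considerably more sophisticated.

```python
import hashlib
from dataclasses import dataclass

# Assumed bounds on the revisit interval, in hours.
MIN_INTERVAL, MAX_INTERVAL = 1.0, 168.0


@dataclass
class PageState:
    fingerprint: str
    interval_hours: float


def fingerprint(content: str) -> str:
    """Content hash used to detect whether a page has changed."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def visit(index: dict, url: str, content: str) -> bool:
    """Record a crawl of url; return True if the page must be re-indexed."""
    digest = fingerprint(content)
    state = index.get(url)
    if state is None:
        index[url] = PageState(digest, 24.0)  # assumed starting interval
        return True
    if state.fingerprint != digest:
        state.fingerprint = digest
        # Page changed: revisit sooner.
        state.interval_hours = max(MIN_INTERVAL, state.interval_hours / 2)
        return True
    # Page unchanged: back off.
    state.interval_hours = min(MAX_INTERVAL, state.interval_hours * 2)
    return False


index = {}
first = visit(index, "intranet/policies.html", "Version 1")
unchanged = visit(index, "intranet/policies.html", "Version 1")
changed = visit(index, "intranet/policies.html", "Version 2")
```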
At the web level, knowledge must also be integrated across incompatible platforms, mainframe legacy systems and proprietary applications.
To support a more adaptive, decision-based architecture in the future, an additional layer of intelligence will also be required to mesh structured and unstructured information more seamlessly and make it available on the fly.
Such a level of dynamic autonomy can only be achieved through further development of the Semantic Web (Web 3.0), linked to search engines and based on the Resource Description Framework (RDF), which provides a means of linking data from multiple websites or databases.
Using the SPARQL query language, applications can query RDF data, including data held in traditional databases once it has been exposed as RDF through a mapping layer.
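The idea of viewing relational data as RDF-style triples and querying it by pattern can be sketched in plain Python. This is not SPARQL and uses no RDF library: the supplier table, the subject naming scheme and the match helper are hypothetical stand-ins for a real relational-to-RDF mapping and a SPARQL engine.

```python
import sqlite3


def rows_to_triples(conn: sqlite3.Connection) -> list:
    """Map each row of the (hypothetical) supplier table to subject-predicate-object triples."""
    triples = []
    query = "SELECT id, name, country FROM supplier ORDER BY id"
    for supplier_id, name, country in conn.execute(query):
        subject = f"supplier/{supplier_id}"
        triples.append((subject, "name", name))
        triples.append((subject, "country", country))
    return triples


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None is a wildcard, much like a query variable."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?)",
    [(1, "Acme", "AU"), (2, "Globex", "NZ")],
)

triples = rows_to_triples(conn)
# Roughly the intent of: SELECT ?s WHERE { ?s :country "AU" }
australian = [s for s, _, _ in match(triples, predicate="country", obj="AU")]
```

Because every source is reduced to the same triple shape, data from several databases or websites can be appended to one list and queried together.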
Many semantic web applications are now being deployed within industries to perform enterprise data integration and related functions such as the automatic interpretation and retrieval of information, often using intelligent agents that act as mediators and facilitators in the process.
Intelligent information integration, which goes beyond simply integrating databases, increases the value of the information accessed. The intelligent deployment of knowledge will also depend on methods of valuing information as a function of the utility of the decisions it informs. Such methods are currently available through Decision Engineering technologies.
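One standard way to value information by the utility of the decision it supports is the expected value of perfect information from decision analysis: the gain in expected payoff if the uncertain state were known before acting. The sketch below is an illustration of that general idea, not a description of any particular Decision Engineering product; the actions, probabilities and payoffs are invented.

```python
def expected_utility(action, probabilities, payoffs):
    """Expected payoff of one action over the uncertain states."""
    return sum(p * payoffs[action][state] for state, p in probabilities.items())


def value_of_perfect_information(probabilities, payoffs):
    """Expected gain from knowing the state before choosing an action."""
    # Best single action chosen without knowing the state.
    best_without = max(expected_utility(a, probabilities, payoffs) for a in payoffs)
    # Best action chosen separately for each state, weighted by its probability.
    best_with = sum(
        p * max(payoffs[a][state] for a in payoffs)
        for state, p in probabilities.items()
    )
    return best_with - best_without


# Invented stocking decision under uncertain demand.
probabilities = {"high_demand": 0.5, "low_demand": 0.5}
payoffs = {
    "stock_more": {"high_demand": 100, "low_demand": -40},
    "stock_less": {"high_demand": 50, "low_demand": 20},
}
evpi = value_of_perfect_information(probabilities, payoffs)
```

With these figures the best action without extra information is stock_less (expected payoff 35), while knowing demand in advance would yield 60 on average, so integrated demand data is worth at most 25 to this decision.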
Future Trends
High-performance automated systems will be the norm in the future. Any system, particularly an adaptive one, that continues to depend on explicit human intervention in areas such as data integration is destined to rapidly become uncompetitive.
It will also be necessary to keep working on the core problems of handling inconsistent and incomplete data from different sources, as well as tracking and verifying its derivation, that is, its provenance. Semantic technologies such as logic-based reasoning engines will help automate these tasks.
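The following sketch shows one simple way to merge observations from several sources while recording provenance and flagging inconsistencies. The source names, record layout and conflict rule (any disagreement between sources is reported rather than resolved) are assumptions for the example; it does not attempt the logic-based reasoning mentioned above.

```python
from collections import defaultdict


def integrate(records):
    """Merge (source, entity, field, value) observations.

    Returns the agreed values with their sources, plus a list of conflicts.
    """
    observed = defaultdict(lambda: defaultdict(set))
    for source, entity, field, value in records:
        if value is None:  # incomplete: this source has nothing for the field
            continue
        observed[(entity, field)][value].add(source)

    merged, conflicts = {}, []
    for key, values in observed.items():
        if len(values) == 1:
            (value, sources), = values.items()
            merged[key] = {"value": value, "sources": sorted(sources)}
        else:
            # Inconsistent: keep every claim together with where it came from.
            conflicts.append((key, {v: sorted(s) for v, s in values.items()}))
    return merged, conflicts


# Hypothetical observations from two internal systems.
records = [
    ("crm", "cust-1", "city", "Sydney"),
    ("billing", "cust-1", "city", "Sydney"),
    ("crm", "cust-1", "phone", None),
    ("billing", "cust-1", "phone", "555-0100"),
    ("crm", "cust-2", "city", "Perth"),
    ("billing", "cust-2", "city", "Darwin"),
]
merged, conflicts = integrate(records)
```

Here the phone number missing from one source is filled from the other, and the disagreement over the second customer's city is surfaced with its sources instead of being silently overwritten.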
Improved methods of information storage, using grid and cloud infrastructure, will also be increasingly required, driven by the need to access vast volumes of integrated data on the web, from searches, hosted applications and web services to open interfaces to web applications such as social networks and media resources.
The explosion of data on the web and its optimum local integration have emerged as both a new problem space and a game-changer for the future enterprise, one that will require next-generation technology to solve.