This paper presents a compression algorithm that compresses sequential strings of plant DNA and RNA for the storage and transmission of plant genetic information. The need to compress plant genetic data is examined both theoretically and practically with regard to large data pools of plant genetic information, Big Data, and space-saving techniques for genetic code in applied plant genetics.
The paper presents an algorithm that can compress random and non-random sequential strings and that can be applied to plant genetics. Both plant DNA and RNA can be compressed from the original structure's genomic length. This has a direct application to the storage and transmission of large data pools of genetic information on agriculturally important plant stock. The algorithm used for the compression and decompression of plant genetic information was discovered by the author in 1998 and is described by the author as the most accurate and precise measure of randomness known.
The compression algorithm takes the traditional left-to-right input of a segment of a sequential string of characters, in this case individual DNA or RNA nucleotides, and sub-groups it into like-natured characters. These sub-groups can then be compressed into a complete compression, a universal compression, or a ‘specific’ or ‘partial’ compression of the sequential string.
When like-natured genetic material on the original sequential string of a plant’s genetic code is compressed, the resulting genetic code is reduced without either the type or the placement of that genetic information being lost. This has applications in plant research and development, as it provides space-saving techniques for theoretical genetics and practical applications for applied genetics research.
A Compression Algorithm: Some Examples
If a sequential string of binary characters is a digital representation of a plant’s genetic code, that is, a theoretical model of the code translated from the plant’s original analog genetic code sequence (an alpha symbol system), then that string can be ‘compressed’ for storage and transmission purposes within a digital computer communications network.
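The translation from an alpha symbol system to a binary string can be sketched as follows. This is a minimal illustration only: the two-bit code per nucleotide, the table `BASE_TO_BITS`, and the function names are assumptions made for the example and are not taken from the paper.

```python
# Hypothetical two-bit code per DNA nucleotide. This mapping is an
# assumption for illustration; the paper does not specify one.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def dna_to_binary(sequence: str) -> str:
    """Translate a DNA string into a sequential binary string."""
    return "".join(BASE_TO_BITS[base] for base in sequence.upper())


def binary_to_dna(bits: str) -> str:
    """Translate a binary string produced by dna_to_binary back to DNA."""
    if len(bits) % 2 != 0:
        raise ValueError("binary string length must be a multiple of 2")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    encoded = dna_to_binary("GATTACA")
    print(encoded)
    print(binary_to_dna(encoded))
```

An RNA string could be handled the same way by replacing the `T` entry with `U`, again as an assumption rather than the paper's stated method.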
Example A: The following binary sequential string is a composite of a random sequential binary string.
Example A: 1111100010001110001110000010
If the linear sequential string of 1s and 0s of Example A is separated into sequentially common sub-groups, the following will result:
Sub-group of Example A: 11111 + 000 + 1 + 000 + 111 + 000 + 111 + 00000 + 1 + 0
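The separation into sequentially common sub-groups can be sketched in a few lines, using the Example A string as it is given in the text. The function name `split_into_subgroups` is hypothetical, and the sketch assumes that a sub-group is simply a maximal run of identical characters.

```python
from itertools import groupby


def split_into_subgroups(bits: str) -> list[str]:
    """Split a string into maximal runs of identical characters,
    reading left to right."""
    return ["".join(run) for _, run in groupby(bits)]


example_a = "1111100010001110001110000010"
subgroups = split_into_subgroups(example_a)

if __name__ == "__main__":
    print(" + ".join(subgroups))
```

Because `groupby` only compares neighbouring characters, the same function would also split a string of nucleotide letters into like-natured sub-groups.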
The non-random features of the sub-groups are those that have a ‘pattern’ to the sequence of 1 or 0 characters, such as these sub-groups:
Non-random sub-groups: 000 + 111 + 000 + 111
Each is a three-character sub-grouping of either 0s or 1s. Together they can be compressed as 0101, with each sub-group notated by its initial character, either a 0 or a 1, which stands for a sub-group composed of the same three characters in total.
The remaining sub-groups are not as ‘patterned’ as the non-random sub-groups and are referred to as random sub-group sequences of a sequential binary string.
The remaining random sub-group sequences are as follows:
Random sub-group: 11111 + 000 + 1 … 00000 + 1 + 0
Each random sub-group can be compressed by notating its initial digit, either 1 or 0, followed by a suffix number that denotes the total number of like-natured characters in the sub-group.
11111 = [1x5]
000 = [0x3]
1 = 1
00000 = [0x5]
0 = 0
The original sequential binary string of Example A was as follows:
Example A: 1111100010001110001110000010
The notated compressed form of Example A is as follows:
Notated Example A: 1x5 0x3 1 0x3 1x3 0x3 1x3 0x5 1 0
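The notated form can be reproduced with a short run-length sketch. The function names `notate` and `expand` are hypothetical, and the token format (a digit, then `x`, then the run length, with single characters left bare) is inferred from the notated Example A rather than from a formal specification.

```python
from itertools import groupby


def notate(bits: str) -> str:
    """Write each run as '<digit>x<length>'; runs of length 1 stay bare."""
    tokens = []
    for char, run in groupby(bits):
        length = len(list(run))
        tokens.append(char if length == 1 else f"{char}x{length}")
    return " ".join(tokens)


def expand(notated: str) -> str:
    """Rebuild the original string from its notated form."""
    parts = []
    for token in notated.split():
        char, _, count = token.partition("x")
        parts.append(char * (int(count) if count else 1))
    return "".join(parts)


if __name__ == "__main__":
    example_a = "1111100010001110001110000010"
    print(notate(example_a))
    print(expand(notate(example_a)) == example_a)
```

Since the notated form keeps both the character and the length of every run, `expand` restores the type and placement of each character in the original string.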
The non-notated compressed state is as follows:
Non-notated Example A: 1010101010
This is a compressed state of 10 characters, reduced from the original length of 28 characters.
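The character counts can be checked with a small sketch. The function names `non_notated` and `run_lengths` are hypothetical. The sketch assumes that the non-notated form keeps only the initial character of each sub-group, and that the run lengths are the suffix numbers recorded by the notated form.

```python
from itertools import groupby


def non_notated(bits: str) -> str:
    """Keep only the initial character of each run."""
    return "".join(char for char, _ in groupby(bits))


def run_lengths(bits: str) -> list[int]:
    """Length of each run, in left-to-right order."""
    return [len(list(run)) for _, run in groupby(bits)]


example_a = "1111100010001110001110000010"
compressed = non_notated(example_a)
lengths = run_lengths(example_a)

if __name__ == "__main__":
    print(compressed, len(compressed))
    print(lengths, sum(lengths))
```

In this sketch, the run lengths sum to the original string length, so pairing them with the non-notated characters is enough to rebuild the original string.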
Because digital storage and transmission are used for large data set collections, the traditional binary format of 1s and 0s is used to transcribe the analog world into the digital world of computing. The vast amount of plant data makes ‘interpreting’ that data into a comprehensible whole a growing need in the biological sciences. Given the rich diversity of both natural and engineered plants, the practical problems of gaining insight into all this plant genetic data to form some type of plant genetics ‘information’ (information being the product of ‘work’ obtained from the scholarly use of intellectually ‘neutral’ plant genetic data), let alone of storing and transmitting such amounts of genetic data, are overwhelming at best.
- Professor Bradley Tice (Author), 2014, Algorithmic Complexity and Plant Genetics, Munich, GRIN Verlag, https://www.grin.com/document/268096