WEIGHT REDISTRIBUTION CODING
(Lossless Weight loss coding. Lose weight virtually without losing weight really.)
by
Sundaresh
The key to discovering this form of coding is to adopt a practical hands on touch and feel approach first which paves the way for a more rigorous theoretical approach later although the theoretical solution should be the best possible one.
Interestingly, the practical solution to this problem is a common geometric pattern and the theoretical solution can be a geometric progression so one can choose to call this form of coding as geometric coding.
This technique is merely an outgrowth of the following idea; if we are trying to code a longer bit string as a shorter bit string, then we need to find a way of redistributing the weights of the various positions of the longer bit string among the various positions of the shorter bit string. Ideally, some of the desirable properties we would like such a redistribution of weights to possess are,
Conservation of sum : The sum of the weights assigned to the bit positions of the shorter string must add up to the sum of the weights of the longer string.
Preservation of Order : For any given position in the shorter bit string, it's weight must be larger(or at least not less than) the sum of the weights of all the positions preceding it. If is the weight of the position of the shorter string and , then must be satisfied.
Optimality : The growth of these weights is fairly optimal. This means, as decreases, even though decreases yet, increases and does so fairly optimally.
Minimality : The least weight is not too large.
We find the following different triangular redistributionâ€™s of weights, which map a bit string of length to a bit string of length possess most of the above characteristics.
The weights of the successive positions of the longer bit string are clearly . For the sake of convenience we can use the subscript to denote the weight .
Let us begin by examining the various possible redistributionâ€™s for the simple case where . We begin with the vertical redistribution which is as follows,
14 
13 
11 
8 
4 

12 
10 
7 
3 


9 
6 
2 



5 
1 




0 
Bear in mind that the weights for the five individual positions of the shorter string are the sum of the weights of the five respective columns of the above triangle.
But this distribution is not too promising. So, next we try a horizontal redistribution which is from right to left as follows,
4 
3 
2 
1 
0 
8 
7 
6 
5 

11 
10 
9 


13 
12 



14 




This is certainly better, but we find that moving from left to right is the most ideal and the best as follows,
0 
1 
2 
3 
4 
5 
6 
7 
8 

9 
10 
11 


12 
13 



14 




which is also the same as the diagonal redistribution as follows,
14 
13 
11 
8 
4 
12 
10 
7 
3 

9 
6 
2 


5 
1 



0 




Incidentally, just like the above diagonal redistribution which is basically topright to bottomleft another similar but not exactly identical one which is basically in the reverse direction, bottomleft to topright which is equivalent to the horizontal right to left redistribution shown earlier can also be formed as shown below.
14 
12 
9 
5 
0 
13 
10 
6 
1 

11 
7 
2 


8 
3 



4 




Other possibilities conceivable are, increasing the vertical step which may not be all that helpful on it's own, or increasing the horizontal step which is reasonable, or optimally doing both.
For example the following is a horizontal left to right redistribution of the weights with a vertical step of two,
0 
1 
2 
3 
4 
5 
6 
7 
8 
9 
10 
11 
12 
13 

14 
15 
16 
17 

18 
19 
20 


21 
22 
23 


24 
25 



26 
27 



28 




29 




Although this increases the compression ratio, it also increases the ratio considerably which is not so promising on it's own.
Note: For larger values of the vertical step, for which the diagonal redistribution will have to be performed in units of the vertical step, even though the horizontal and the diagonal redistributions will not be exactly the same they will still be fairly close. For instance the following is a diagonal topright to bottomleft redistribution with vertical step two which is fairly close to the horizontal left to right redistribution with vertical step of two shown above.
29 
27 
23 
17 
9 
28 
26 
22 
16 
8 
25 
21 
15 
7 

24 
20 
14 
6 

19 
13 
5 


18 
12 
4 


11 
3 



10 
2 



1 




0 




I make this point now because for the moment we choose to be interested only in the redistribution for which the vertical step can be more than one but the horizontal step is always one, since our initial purported solution only warrants this assumption but as we shall see later there is a drawback to taking such a narrow and a limiting view but finding the remedy for it proves to be a pleasantly interesting and instructional exercise.
Increasing the horizontal step performs better. Consider a left to right horizontal redistribution with horizontal step two.
We first choose alternative weights and form two triangles as below,
0 
2 
4 
6 
8 
10 
12 
14 
16 

18 
20 
22 


24 
26 



28 




and,
1 
3 
5 
7 
9 
11 
13 
15 
17 

19 
21 
23 


25 
27 



29 




Next, we merge the two triangles by laying the columns alternatively side by side as follows,
1 
0 
3 
2 
5 
4 
7 
6 
9 
8 
11 
10 
13 
12 
15 
14 
17 
16 


19 
18 
21 
20 
23 
22 




25 
24 
27 
26 






29 
28 








As is clearly evidenced, increasing the horizontal step decreases the compression ratio but also decreases the ratio. So, one could try striking a balance.
We have seen that there are two major ways of redistributing the weights in the form of a right triangle, vertical and horizontal(right to left as well as left to right), and each of these redistributionâ€™s can be conducted either by selecting the weights of the longer bit string consecutively or by selecting the weights of the longer bit string alternately, the latter which involves skipping over the weights by a number which is one less than the horizontal step.
The actual implementation is a simple two level one where the top level redistribution initially consists of selecting the weights alternately according to the horizontal step so as to form subtriangles at the lower level . Each subtriangle at the lower level is formed by redistributing the weights already selected for it at the top level with a vertical step of and a horizontal step of by selecting them consecutively, and finally back at the top level merging together all the subtriangles thus formed at the lower level, to obtain a single main triangular redistribution with a vertical step of and a horizontal step of .A symmetric design and implementation which involves merging the columns vertically so as to increase the vertical step instead of merging them horizontally so as to increase the horizontal step might in fact also be possible if not in fact also more favorable.
Herein I undertake a very rudimentary analysis of the practical method in order to derive some parameters relevant to its practical implementation. In general, if a bit string of length is coded as a bit string of length with horizontal step and vertical step respectively, then the relationship between and can be expressed as,
Clearly we would like to preserve the compression ratio. In other words we would like . Substituting for , we get
which upon simplification yields,
For the coding to be favorable and also as required by the above expression must hold good. But cannot be too large and and must be fairly close to each other because only then is there simultaneous maximization of the numerator and minimization of the denominator, since will be considerably(at least a factor) larger than . So the formula suggests will aways be true and this is the closest we can get even though the redistribution will not be the same but gets better with increasing values of . In the present implementation I decide to accept and to abide by this limitation, however towards the end of this analysis I sketchily aver to one possible way to surmount(pun intended) this problem which constitutes a fairly ideal solution for it.
Given the original bit string of arbitrary length , first and are determined as , from which is determined as the largest value which is also a multiple of such that . Now, only the leading bits of the original bit string are considered and treated as the actual new bit string for coding. As regards the weights of the trailing bits of the original bit string, they are correspondingly added right to left to the trailing bits of the actual new bit string.
Let denote the weight of the bit of the longer bit string. So clearly . There are subtriangles. In general if the weights of the subtriangle are,
where ,
then will be of the form
where and
For the vertical redistribution, if denotes the weight of the bit position of the subtriangle then we have,
where
,
corresponds to the exponent of the least weight in the column.
In like manner, we can determine for the horizontal redistribution as,
where
for the right to left redistribution and,
for the left to right redistribution
and
,
corresponds to the exponent of the weight in the column.
As we can see, only the appropriate bits have to be set to determine for all the different forms of right triangular redistribution.
As of now, in the implementation the coding is always done in multiples of fixed sized units of information of length not more than , and . For most practical purposes this should suffice. A larger information length will require an inordinate amount of computation resources in terms of processing time and memory space, which may not be all that practical and viable.
One way to overcome this stringent limitation is to have a variable ratio. If we divide the code into two equal parts, then need be true only for one of either the leading or the trailing half but can be true for the other half. I was inclined to think need be true only for the latter half, but if the ratio decreases with increasing the other way around might in fact be more favorable. Either way the objective should be to get as many as one can in the code without having to make too many repetitions. A group of successive columns which belong to a single horizontal step can be clubbed together and treated as one entity called a pillar or a section, and sections are numbered right to left the same way individual columns are. If the code comprises sections then the coding now maps
with the constant of proportionality as per my rough calculation, must be true in order to preserve the compression ratio. As we can guess the number of elements in each section will be similar to a normal distribution which resembles an actual mountain. So every mountain of a problem evidently surmounts itself and the bigger the are , the better they certainly are since the harder they also fall.
This concludes the basic analysis of the practical method.
Now, we attempt to adopt a more theoretical approach to coding a number in general .If is distributed among bits then we require the following conditions to be satisfied,
, assuming a geometric progression for the ratio's involving the successive weights.
We need to determine the optimal value of for given value's of and . Working upwards we have,
Similarly we need,
Likewise,
We gather that the factor is common for all and the other factor is a series involving and since , for the largest weight , this series can be approximated by it's most dominant or largest term. So, we can say .This means when we sum up all the there will be a minimal constant such that .Now by mere observation of the terms , we can see that taking the upper bound of to be will be safe and reasonably close to optimal. Besides, the inequality can be proven indirectly by Mathematical Induction. Hence, we need to determine such that,
need not be too small although it can be as it is in the case of the practical approach. A smaller value of will favor coding of smaller numbers.
However, this is a one time redistribution of the weights of the longer bit string alone. To achieve compression we need to repeat the process as follows. The algorithm is for coding a number in general.
Suppose has been redistributed among bit positions either theoretically or practically in a triangular fashion and a number has been coded as and the rightmost in that code occurs at bit position , we retain the bit code for the leading bits of , note , discard the trailing bits of the code and repeat the theoretical or practical triangular coding for the new value of which is now exactly for the corresponding new value of which is now where here of course is the complete code with the discarded trailing bits included.
Most likely will not be a convenient power of two, but will lie between two consecutive powers of two. The stratagem adopted is a basic and a simple one.
Let . Now if then is coded as it is, but if then is coded and either way the fact is noted along with the noting of .
As I stated earlier this is a very basic analysis. In general, if a bit string of length is coded as a bit string of length a better analytical approach would be to determine the successive weights of so as to minimize,
If a practical approach is adopted then the objective should be to determine and so as to minimize the above and if a theoretical approach is adopted the objective should be to determine and so as to minimize the same.
Further Improvements :
1)As it happens, aside from this readily discern able right triangular pattern, a redistribution in the form of an isosceles triangle is also possible as shown below and on first look does seem to be better than the right triangular pattern we have just seen. Remember that we are only interested in those redistributionâ€™s for which the sequence of values is monotonically progressive(increasing with decreasing value of ); for the isosceles triangular pattern the diagonal redistribution seems to be the only one which satisfies this property. As we already know the diagonal redistribution can be done in one of two ways topright to bottomleft or its reverse, bottomleft to topright. I only illustrate the one way of redistribution, topright to bottomleft. The other one is similar.
15 
14 
13 
11 
9 
6 
3 

12 
10 
8 
5 
2 



7 
4 
1 





0 



Evidently the vertical step can be increased,
17 
16 
14 
12 
9 
15 
13 
11 
8 
5 

10 
7 
4 


6 
3 
2 



1 




0 


as can the horizontal step,
31 
30 
29 
28 
26 
24 
22 
19 
16 
13 
9 
5 
2 
0 


27 
25 
23 
21 
18 
15 
12 
8 
4 
1 






20 
17 
14 
11 
7 
3 










10 
6 






Since the redistribution by consecutive selection is clearly possible, we can surmise that redistribution by alternate selection should also be possible.
Under the supposition that such a redistribution of weights in the form of an isosceles triangle with a horizontal step and a vertical step is always possible, the relationship of to for such a redistribution can be expressed as,
so there will be two possible redistributionâ€™s depending on whether is even or whether is odd. For instance, for the simplest case where and , one redistribution for one possible even value of would be,
19 
18 
17 
15 
13 
10 
7 
3 

16 
14 
12 
9 
6 
2 



11 
8 
5 
1 





4 
0 



2) This line of investigation can be extended along two principal directions,
Increasing the base : Reason tells us, that this form of coding can only be a special case of a more general form of coding for any arbitrary base and that increasing the value of the base will only increase the compression ratio. The binary identity offers us a clue and the following generalization of it would be a good initial place to start,
Increasing the order : This could be considered as a second order transformation or simply an transformation since it roughly maps .In general, an where the transformation is largely of a higher order is conceivable. Interestingly, the practical method for the second order transformation involves redistributing the weights of the longer string onto a two dimensional plane and then projecting that figure onto a one dimensional axis. One could extend this simile or parallel to the next order by redistributing the weights of the larger bit string onto a three dimensional space and then gradually projecting the sides of that figure onto one or all of the two dimensional planes first and then onto their respective one dimensional axis and finally combining the results, as necessary. In general we could always think of multidimensional space as space of space much the same way we think of multidimensional arrays as arrays of arrays, as typified for instance by the DIM statement used for declaring arrays in the popular BASIC programming language and its descendant versions, although the geometric implications of such a view cannot be ignored since beyond three dimensions, the orthogonality property of the axis will no longer be true.
Combining both of the above ideas.
3) Perhaps the simplest way to perform this kind of coding would be to integrate the theoretical into the practical. For instance, consider the simplest binary case of second order; for a given information length , we first optimally fix the corresponding code length , as also the ratio , and then select rows of weights with each row having weights of the information string moving from its right(trailing) to its left(leading) bits, all the while ensuring that both the optimality as well as the minimality criteria are met.
It is with this view to reasonably generalize the possible solutions to this problem and to incorporate all of those solutions into the program, hopefully in the near rather than in the distant future that I have tried to provide a fairly standardized set of possible command line options for the program even though this means quite a few of those options will necessarily be unavailable as yet.
Am still coding this program. The download link will appear here shortly.
An earlier version of this method hosted on another domain which also briefly describes another coding method can be seen here Uncommon sense
These are but two of an umpteen (at least over a dozen that I have been able to come up with so far)number of methods with any number of variants for all of them just like the variants for the above one, which yield seemingly unbelievable lossless compression, in fact way better by far than these two techniques. My general classification of coding is as meaningless or dumb coding, where coding for whatever purpose including archival, is undertaken without regard to the actual meaning of the data or the information being coded; as opposed to meaningful or intelligent coding where coding is carried out with regard to its actual meaning; what it represents as well as the nature and the characteristic of the information being coded including the purpose it is meant to serve. In addition, my investigations into coding will also cover the other two principal areas of coding, namely encryption (public key and nonpublic) as well as error detection and correction, although I have (had) done some fairly insightful preliminary work in both these areas. In the area of data structures and databases, as a result of my investigations I have obtained excellent results in both static searching including static trees , tries, etc and particularly in Hashing including Perfect Hashing , together with what may be called ordered hashing wherein the Hash function stores the keys in the table preserving their sorted order; storage and retrieval times for both forms of hashing, worst case is , as well as dynamic searching which includes completely balanced general trees, whose nodes are always completely full, and whose balance can be maintained either by shuffling the nodes or by rotating them, as opposed to Btrees whose nodes are always only half full. In the area of Operating Systems, I have designed process and thread scheduling algorithms; I have designed my own file system for disk and file management, my own Virtual memory management system with its own address mapping technique, which may be implemented either as hardware or as software, which uses a paging technique which is actually anticipatory paging as opposed to demand paging found in microprocessors, particularly the popular x86 family and similar. In the area of Networking I have designed a variety of different kinds of routing algorithms apart from the two obvious shortest path routing algorithms both of which perform an order better than Dijkstra's algorithm.
You can email me at sundaresh@mail.com or sundaresh@wereco.org