 A+
I am creating a Boolean algebra simplifier in C# for a project. To simplify Boolean algebraic expressions, I am taking the following approach:
1)Simplify the NOTs over each variable and apply De Morgan's Law where applicable
2)Simplify the brackets, if there are any, within the expression
3)Expand any brackets within the expression that can be expanded
4)Simplify each term in the expression e.g. for an expression A+B•A, the B•A will be one term. The terms are split so that each term contains only one gate  AND, OR, XOR. Nots are applied to these variables and are represented in a list that corresponds to each variable's index in the array. E.g. Nots[0] contains the number of Nots over the first variable within the expression. At this point in my program, no variables are connected through a NOT gate.
5)Factorise where possible
6) If the expression cannot be factorised, it has been simplified. If it has been factorised then steps 2 onwards are repeated until the expression does not change when being put through the steps.
I have been unable to create a factorising subroutine which works for all/ most cases. I have created a subroutine to factorise and have placed it below. I have tried to make it so that it only expands a maximum of two brackets and that the brackets do not have brackets in to make my subroutine easier to create. However, creating such an algorithm has proved to be quite tough for me.
If any one could provide some pseudocode, or an explanation on how to create such an algorithm, point out mistakes in my code or even provide some code which I could analyse and understand I would greatly appreciate it. The code is shown below: (Warning: It is horrible programming due to my lack of experience.)
private bool Factorise(ref List<string> Expression, ref List<int> NOTsNew) { string PreviousExpression = convertexpressionlisttostring(Expression); // loop and get each indiviual variable  no duplicates // loop through expression for every variable and see if it occurs more than once List<List<string>> itemsthatappearwithinexpression = new List<List<string>>(); List<string> charactersthatappearwithinexpression = new List<string>(); List<string> Notsofcharactersthathappearwithinexpression = new List<string>(); List<string> numberoftimescharacterappears = new List<string>(); List<string> positionofitemswithinexpression = new List<string>(); itemsthatappearwithinexpression.Add(charactersthatappearwithinexpression); itemsthatappearwithinexpression.Add(Notsofcharactersthathappearwithinexpression); itemsthatappearwithinexpression.Add(positionofitemswithinexpression); itemsthatappearwithinexpression.Add(numberoftimescharacterappears); for (int i = 0; i < Expression.Count; i++) { if (Expression[i] != "•" && Expression[i] != "+" && Expression[i] != "⊕") { if (itemsthatappearwithinexpression[0].Count == 0) { itemsthatappearwithinexpression[0].Add(Expression[i]); itemsthatappearwithinexpression[1].Add(NOTsNew[i].ToString()); itemsthatappearwithinexpression[2].Add(i.ToString()); } bool matched = false; for (int y = 0; y < itemsthatappearwithinexpression[0].Count; y++) { if (itemsthatappearwithinexpression[0][y] == Expression[i] && itemsthatappearwithinexpression[1][y] == NOTsNew[i].ToString()) { matched = true; break; } } if (!matched) { itemsthatappearwithinexpression[0].Add(Expression[i]); itemsthatappearwithinexpression[1].Add(NOTsNew[i].ToString()); itemsthatappearwithinexpression[2].Add(i.ToString()); } } } for (int x = 0; x < itemsthatappearwithinexpression[0].Count; x++) { int occurances = 1; for (int c = 0; c < Expression.Count; c++) { int position = int.Parse(itemsthatappearwithinexpression[2][x]); if (NOTsNew[c] == NOTsNew[position] && c != position && itemsthatappearwithinexpression[0][x] == Expression[c]) { occurances++; } } itemsthatappearwithinexpression[3].Add(occurances.ToString()); } for (int i = 0; i < itemsthatappearwithinexpression[0].Count; i++) { if (i < itemsthatappearwithinexpression[0].Count  1) { if (itemsthatappearwithinexpression[3][i] == itemsthatappearwithinexpression[3][i + 1] && int.Parse(itemsthatappearwithinexpression[2][i]) == (int.Parse(itemsthatappearwithinexpression[2][i + 1])  2)) { itemsthatappearwithinexpression[0][i] = itemsthatappearwithinexpression[0][i].ToString() + itemsthatappearwithinexpression[0][i + 1].ToString(); // chars, nots, position, occurances itemsthatappearwithinexpression[1][i] = itemsthatappearwithinexpression[1][i].ToString() + itemsthatappearwithinexpression[1][i + 1].ToString(); // Nots[0] itemsthatappearwithinexpression[0].RemoveAt(i + 1); itemsthatappearwithinexpression[1].RemoveAt(i + 1); itemsthatappearwithinexpression[2].RemoveAt(i + 1); itemsthatappearwithinexpression[3].RemoveAt(i + 1); } } } List<int> positionsoffirstcharinmatches = new List<int>(); string factorisedexpression = ""; bool donextthing = false; List<int> NOTsinfactorisation = new List<int>(); for (int d = 0; d < itemsthatappearwithinexpression[0].Count; d++) { int counter = 0; bool singularexpansion = false; if (itemsthatappearwithinexpression[0][d].Length == 1) { singularexpansion = true; } if (int.Parse(itemsthatappearwithinexpression[3][d]) > 1) { for (int i = 0; i < Expression.Count; i++) { bool Continue = false; if (singularexpansion && Expression[i] == itemsthatappearwithinexpression[0][d] && NOTsNew[i] == NOTsNew[int.Parse(itemsthatappearwithinexpression[2][d])]) { Continue = true; } if (i+2 <= Expression.Count1 && !singularexpansion && Expression[i] == itemsthatappearwithinexpression[0][d][0].ToString() && Expression[i+2] == itemsthatappearwithinexpression[0][d][1].ToString() && NOTsNew[i] == int.Parse(itemsthatappearwithinexpression[1][d][0].ToString()) && NOTsNew[i+2] == int.Parse(itemsthatappearwithinexpression[1][d][1].ToString())) { Continue = true; } donextthing = false; if (Continue) { if (i != 0) { if (Expression[i  1] == "•") { positionsoffirstcharinmatches.Add(i  2); if (counter == 0) { if (singularexpansion) { factorisedexpression += itemsthatappearwithinexpression[0][d] + "•(" + Expression[i  2] + Expression[i  3]; NOTsinfactorisation.Add(int.Parse(itemsthatappearwithinexpression[1][d])); NOTsinfactorisation.Add(0); NOTsinfactorisation.Add(0); NOTsinfactorisation.Add(NOTsNew[i  2]); NOTsinfactorisation.Add(0); counter++; } else { positionsoffirstcharinmatches.Add(i); factorisedexpression += itemsthatappearwithinexpression[0][d][0] + "•" + itemsthatappearwithinexpression[0][d][1] + "•(" + Expression[i  2] + Expression[i  3]; //string NOTsOfAdjacentVariables = itemsthatappearwithinexpression[1][d]; NOTsinfactorisation.Add(int.Parse(itemsthatappearwithinexpression[1][d][0].ToString())); NOTsinfactorisation.Add(0); NOTsinfactorisation.Add(int.Parse(itemsthatappearwithinexpression[1][d][1].ToString())); NOTsinfactorisation.Add(0); NOTsinfactorisation.Add(0); NOTsinfactorisation.Add(NOTsNew[i  2]); NOTsinfactorisation.Add(0); counter++; } } else { if (i >= Expression.Count  3) { factorisedexpression += Expression[i  2] + ")"; NOTsinfactorisation.Add(NOTsNew[i  2]); NOTsinfactorisation.Add(0); } else { factorisedexpression += Expression[i + 3] + Expression[i + 2]; NOTsinfactorisation.Add(0); NOTsinfactorisation.Add(NOTsNew[i + 2]); } } } else { donextthing = true; } } else { donextthing = true; } if (donextthing) { positionsoffirstcharinmatches.Add(i); if (counter == 0) { if (singularexpansion) { positionsoffirstcharinmatches.Add(i + 2); factorisedexpression += itemsthatappearwithinexpression[0][d] + "•(" + Expression[i + 2] + Expression[i + 3]; NOTsinfactorisation.Add(int.Parse(itemsthatappearwithinexpression[1][d])); NOTsinfactorisation.Add(0); NOTsinfactorisation.Add(0); NOTsinfactorisation.Add(NOTsNew[i + 2]); NOTsinfactorisation.Add(0); counter++; } else { bool useone = false; if (Expression[i]+Expression[i+2] == itemsthatappearwithinexpression[0][d]  Expression[i + 2] + Expression[i] == itemsthatappearwithinexpression[0][d]) { useone = true; } positionsoffirstcharinmatches.Add(i+2); if (useone) { factorisedexpression += itemsthatappearwithinexpression[0][d][0] + "•" + itemsthatappearwithinexpression[0][d][1] + "•(" + "1" + Expression[i + 3]; } else { factorisedexpression += itemsthatappearwithinexpression[0][d][0] + "•" + itemsthatappearwithinexpression[0][d][1] + "•(" + Expression[i + 2] + Expression[i + 3]; } //string NOTsOfAdjacentVariables = itemsthatappearwithinexpression[1][d]; NOTsinfactorisation.Add(int.Parse(itemsthatappearwithinexpression[1][d][0].ToString())); NOTsinfactorisation.Add(0); NOTsinfactorisation.Add(int.Parse(itemsthatappearwithinexpression[1][d][1].ToString())); NOTsinfactorisation.Add(0); NOTsinfactorisation.Add(0); if (useone) { NOTsinfactorisation.Add(0); } else { NOTsinfactorisation.Add(NOTsNew[i + 2]); } NOTsinfactorisation.Add(0); counter++; } } else { if (i == Expression.Count  3) { if (Expression[i]+Expression[i+2] == itemsthatappearwithinexpression[0][d]  Expression[i + 2] + Expression[i] == itemsthatappearwithinexpression[0][d]) { factorisedexpression += "1" + ")"; NOTsinfactorisation.Add(0); } else { factorisedexpression += Expression[i + 2] + ")"; NOTsinfactorisation.Add(NOTsNew[i + 2]); } NOTsinfactorisation.Add(0); } else { factorisedexpression += Expression[i + 3] + Expression[i + 2]; NOTsinfactorisation.Add(0); NOTsinfactorisation.Add(NOTsNew[i + 2]); } } } } } } else { } } // character • () > A•B + A•C Xor A•D = A•(B+C XOR D)  find every instance of the object  get the operator before the object and place the o //int n = 5; //Expression positionsoffirstcharinmatches = intbubblesorthightolow(positionsoffirstcharinmatches); List<int> PositionstoremovefromExpression = new List<int>(); for (int i = 0; i < positionsoffirstcharinmatches.Count; i++) { if (positionsoffirstcharinmatches[i] < Expression.Count  3) { PositionstoremovefromExpression.Add(positionsoffirstcharinmatches[i] + 3); PositionstoremovefromExpression.Add(positionsoffirstcharinmatches[i] + 2); PositionstoremovefromExpression.Add(positionsoffirstcharinmatches[i] + 1); PositionstoremovefromExpression.Add(positionsoffirstcharinmatches[i]); } else { PositionstoremovefromExpression.Add(positionsoffirstcharinmatches[i] + 2); PositionstoremovefromExpression.Add(positionsoffirstcharinmatches[i] + 1); PositionstoremovefromExpression.Add(positionsoffirstcharinmatches[i]); } } PositionstoremovefromExpression = intbubblesorthightolow(PositionstoremovefromExpression); PositionstoremovefromExpression = PositionstoremovefromExpression.Distinct().ToList(); for (int i = 0; i < PositionstoremovefromExpression.Count; i++) { NOTsNew.RemoveAt(PositionstoremovefromExpression[i]); Expression.RemoveAt(PositionstoremovefromExpression[i]); // A • B + C • A } for (int i = 0; i < factorisedexpression.Length; i++) { try { Expression[PositionstoremovefromExpression[PositionstoremovefromExpression.Count  1] + i] = factorisedexpression[i].ToString(); NOTsNew[PositionstoremovefromExpression[PositionstoremovefromExpression.Count  1] + i] = NOTsinfactorisation[i]; } catch (Exception) { Expression.Add(factorisedexpression[i].ToString()); NOTsNew.Add(NOTsinfactorisation[i]); } } if (PreviousExpression == convertexpressionlisttostring(Expression)) { return false; } else { return true; } }
The List expression is a string list which contains each character in my expression. For example, for A + B the list would be ["A","+","B"]. The NOTsNew list is the aforementioned list which contains the NOTs over each variable. I am allowed to use code in my project from others as long as I take the time to understand it, adapt it where necessary and mention this, so I am not cheating. P.S. Some of the code above could be placed into subroutines but I am currently trying to get some working code before I shorten it into individual subroutines.
You say:
Warning: It is horrible programming due to my lack of experience.
This is correct. You then say:
Some of the code above could be placed into subroutines but I am currently trying to get some working code before I shorten it into individual subroutines.
That is why your code is horrible. Break the code into subroutines first. Then write test cases for those subroutines, until you have 100% confidence that the subroutine is correct. You will then have a tool you can use to make a more complex routine.
But that is just code layout. Your fundamental problem is that you're writing an analyzer that works on the output of the lexer but you forgot to write the parser.
Here's the order in which things should happen:
 You have a string containing the expression:
"A+B•A"
say.  You write a lexer. A lexer takes in a string and produces a list of tokens.
What are the tokens? They are:
abstract class Token { ... } sealed class IdentifierToken : Token { ... } sealed class NotToken : Token { ... } sealed class OrToken : Token { ... } sealed class AndToken : Token { ... } sealed class LeftParenToken : Token { ... } sealed class RightParenToken : Token { ... } sealed class TrueToken : Token { ... } sealed class FalseToken : Token { ... }
So, task one is write this method:
public static List<Token> Lexer(string s) { ... }
Once the method is written, write extensive test cases for it. You need to make sure your lexer is completely solid.
 OK, we now have a list of tokens. Next problem is to parse them. Parsing takes a list of tokens and produces a single tree.
What are the nodes in the tree?
abstract class ParseNode { ... } sealed class OrNode : ParseNode { public ParseNode Left { get; } public ParseNode Right { get; } ... // Or maybe IEnumerable<ParseNode> Children { get; } // is easier; both techniques have their strengths. } sealed class AndNode : ParseNode { ... } sealed class NotNode : ParseNode { ... } sealed class IdentifierNode : ParseNode { ... } sealed class TrueNode : ParseNode { ... } sealed class FalseNode : ParseNode { ... }
Notice that there are no parentheses. The parentheses are expressed in the tree structure.
For example, if we have "(A+~B)*C"
then the lexer says LPAREN, IDENTIFIER(A), OR, NOT, IDENTIFIER(B), RPAREN, AND, IDENTIFIER(C)
. The parser then takes the list from the lexer and produces
And / / Or Id(C) / / Id(A) Not  Id(B)
So your next task is write this method:
public static ParseNode Parser(List<Token> tokens) { ... }
Again, write a ton of test cases. The parser needs to be perfect.
The hardest part of a parser for beginners is getting operator precedence correct. You need to make sure that A+B*C
parses as A+(B*C)
and not (A+B)*C
. It will help to write a formal, unambiguous contextfree grammar for the language you are parsing. For example just off the top of my head I think this unambiguously parses:
EXPR : OREX OREX : ANDEX ORTAIL ORTAIL : NIL ORTAIL : + ANDEX ORTAIL ANDEX : NOTEX ANDTAIL ANDTAIL : NIL ANDTAIL : * NOTEX ANDTAIL NOTEX : CONST NOTEX : ( EXPR ) NOTEX : ~ NOTEX NOTEX : IDENT IDENT : <any single letter> CONST : 1 CONST : 0
But don't take my word for it; write your own grammar and then write a recursive descent parser that parses it.
 Now we have a parse tree. This is the thing you manipulate to do your optimizations.
Some examples:
Not > True
can be replaced byFalse
, and vice versaNot > Not > anything
, can be replaced byanything
Or
withTrue
on either left or right can be replaced byTrue
Or
withFalse
on left can be replaced by whatever is on right.And
withFalse
on either side isFalse
and so on. You must represent each optimization as an operation on parse trees. Write each optimization in its own method.
Exercise: What is the factorize optimization on parse trees? This can be a surprisingly hard problem, so try to come up with a simplified version before you implement the full algorithm.
Advanced Exercise: Parse tree optimizers are an example of double dispatch, because there are two things that drive the action of the method: (1) what is the type of the node, and (2) what is the action of the optimizer. C# does not support double dispatch natively. Can you use the Visitor Pattern to elegantly implement this pattern without duplicating a lot of code?
So write a whole bunch of methods of this form:
public static ParseNode FooOptimization(ParseNode p) { ... } public static ParseNode BarOptimization(ParseNode p) { ... }
and run them all until the tree is fully optimized.
Again, write each optimization in its own method, and write test cases for each method until you are sure that each optimization is correct.
Get in good habits now. Extensive testing saves time. When I was a student, I saw fellow students up all night looking for bugs in their parsers. If they had written test cases early, they would not be up all night trying to get bugs out.
Let's look at an example:
public static ParseNode NotFalseOptimization(ParseNode p) { if (p is NotNode n) { // The child might itself have a Not(False) somewhere in it. ParseNode child = NotFalseOptimization(n.Child); if (child is FalseNode) return new TrueNode(); else return new NotNode(child); } else if (p is OrNode o) return new OrNode(NotFalseOptimization(o.Left), NotFalseOptimization(o.Right); else if (p is AndNode a) return new AndNode(NotFalseOptimization(a.Left), NotFalseOptimization(a.Right); else return p; }
Study that implementation. Make sure you understand how it works.
Exercise: Can you optimize my crappy implementation so that it does not allocate memory if the optimization finds nothing to change?
 Finally, you need to write a method that turns the optimized parse tree back into a string. Implement
ToString
onParseNode
.
Now your program is:
static string DoItAll(string s) { var tokens = Lex(s); var tree = Parse(tokens); var optimized = Optimize(tree); return optimized.ToString(); }
And you're done. Get busy, you've got a lot of work to do, but each step is doable.
Some additional advice:

Make the parse tree immutable. The optimizer does not rewrite the tree. The optimizer leaves the old tree alone, and produces a new tree as its output.

You have to be careful when optimizing Boolean algebra that every optimization does not undo the progress of any previous optimization. You can get into loops where one optimization expands a term, and then the next optimization factors it back out again, and then it expands again, and so on, forever.