Probably best to get the kids out of the room before you play this one.
Lots of heavy breathing by Brigitte.
And you can read the backstory here. (And a bit more here, if you’re so inclined.)
But to business …
The Starting Point … and the FIRST Illustrative Text String
I had previously worked a text string example for the 1D cluster variation method (CVM); the interactive code came later, with a different example. (Now that’s a cold-splash come-down after that steamy moment with Brigitte, isn’t it?)
This is the PREDECESSOR YouTube, describing the text string and how it is a useful example. (No code links, though.)
And HERE’S THE BLOGPOST that went with that earlier YouTube.
Here’s the “Please like me on Facebook, thank you” example used for that illustration of the CVM method.
The Next Step: Simple Interactive Code (Different Example)
Then, over Christmas of this last year, I wanted to make a little “Christmas present” for all of you … a simple interactive version of the 1D CVM code.
That led to THIS YouTube:
This YouTube gave you code – and a worked example. The example was for my “1D CVM base pattern,” though – it was a pattern that was DESIGNED to hit the “equilibrium point” when the interaction enthalpy parameter h was set to 0.
This code was interactive – you could swap out two nodes and see how that changed both the configuration variable values as well as the resultant entropy.
Here’s what happened when I swapped the “on” node at (0,1) with the “off” node at (0,3). (The rows are numbered “0” (top) and “1” (bottom). The columns start with “0” at the left and go through “11” at the right, even though every other one is staggered; that’s an artifact of constructing the CVM grid.)
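For readers who want to see the (row, column) indexing in action, here is a minimal sketch of that kind of swap. The grid below is a made-up 2 x 12 pattern chosen only so that (0,1) is "on" and (0,3) is "off"; it is NOT the actual base pattern from the repository, and `swap_nodes` is a hypothetical helper name, not a function from the released code.

```python
def swap_nodes(grid, pos_a, pos_b):
    """Return a copy of grid with the values at two (row, col) positions exchanged."""
    new_grid = [row[:] for row in grid]  # copy each row so the original is untouched
    (row_a, col_a), (row_b, col_b) = pos_a, pos_b
    new_grid[row_a][col_a], new_grid[row_b][col_b] = (
        new_grid[row_b][col_b],
        new_grid[row_a][col_a],
    )
    return new_grid


# Hypothetical pattern (an assumption for illustration): 1 = "on", 0 = "off".
grid = [
    [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0],  # row 0 (top), columns 0..11
    [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1],  # row 1 (bottom), columns 0..11
]

swapped = swap_nodes(grid, (0, 1), (0, 3))
print(swapped[0])
```

The swap leaves the total number of "on" nodes unchanged; only their arrangement (and therefore the configuration variables) shifts.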
We have a full set of 1D CVM RESOURCES, and we encourage those new to the cluster variation method to access and use these resources.
The Backstory – Why Brigitte Bardot and “Je t’Aime”?
I was looking for a set of song lyrics that would be a slightly longer, yet natural, follow-on to the previous worked text string example for the 1D cluster variation method (CVM), for which interactive code is available (with a simple, different example). (Now that’s a cold-splash come-down after that steamy moment with Brigitte, isn’t it?)
No matter. Equations can be sexy.
Especially equations embedded into code; especially equations embedded into interactive code that is SO EASY TO RUN … all that you have to do is swipe the Python file, stick it into your favorite Python IDE, and hit the “Run” button.
You could do that with the last example. (So why not, while listening to Brigitte all over again? Or Jane Birkin. Depends on which version of the song you’re listening to. Both do a thoroughly convincing job of moaning their way to the top!)
So … we’ve had two examples so far:
- The first example – with NO code – was the politely banal “Please like me on Facebook thank you” example, and was chosen because it had (if I tweaked the letters right) an equal number of consonants and vowels.
- The second example – with INTERACTIVE code – was the case where we had not only equal numbers of “on” and “off” nodes, but equal distributions across the different kinds of configuration variables.
Equal numbers of consonants and vowels gave us an equal number of nodes in the “on” and “off” states, which was what we needed to experiment with a text-string example for the SIMPLEST 1D CVM, and we did that. And the point was … you could interchange any two “nodes” in that 1D CVM zigzag chain, and come up with a different entropy.
Just a toy, but a fun little toy, and a starting place.
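To make the "interchange two nodes, get a different result" point concrete, here is a rough sketch that counts triplet patterns along a chain before and after a swap. Several things here are assumptions for illustration: the zigzag chain is flattened into a simple list, the chain wraps around at the ends, the raw ordered triplets are counted without grouping them into the CVM's named configuration variables, and the helper names are hypothetical. The entropy itself is not computed here; that lives in the released code.

```python
from collections import Counter


def triplet_fractions(chain):
    """Fraction of each ordered triplet (a, b, c) of consecutive nodes.

    Assumption: the chain wraps around, so every node starts one triplet.
    """
    n = len(chain)
    counts = Counter(
        (chain[i], chain[(i + 1) % n], chain[(i + 2) % n]) for i in range(n)
    )
    return {triplet: count / n for triplet, count in counts.items()}


def swap(chain, i, j):
    """Return a copy of chain with the nodes at positions i and j exchanged."""
    new_chain = list(chain)
    new_chain[i], new_chain[j] = new_chain[j], new_chain[i]
    return new_chain


# Hypothetical 12-node chain with equal numbers of "on" (1) and "off" (0) nodes.
chain = [1, 1, 0, 0] * 3
swapped_chain = swap(chain, 1, 3)

before = triplet_fractions(chain)
after = triplet_fractions(swapped_chain)

print("before:", before)
print("after: ", after)
```

The number of "on" and "off" nodes is the same in both chains, but the triplet fractions differ, and it is those fractions (together with the pair fractions) that feed the entropy calculation.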
Next Step – a Slightly More Complex 1D Text String
I thought it would be fun to compare a similar text string in both French and English. (French, because it is fairly vowel-rich, making it likely that we can find a text string with equal numbers of vowels and consonants. German, on the other hand, is a bit too consonant-heavy.)
The point of this whole exercise is that we want to identify the parameters (epsilon0, epsilon1) characterizing the (essentially same) text string in two different languages … just to see if there’s a difference.
And for our example, we are choosing to stay with text strings that have equal numbers of vowels and consonants (equal numbers of “off” and “on” nodes, and when we do illustrations, we’ll have the “on” nodes be black, and the “off” nodes be white).
The reason to have equal numbers of “on” and “off” is that when this is the case, one of our parameters (epsilon0) is zero, and we have an analytic solution.
Because we have an analytic solution, we can compare our computational results to the analytic prediction.
This is a useful thing, when we’re getting started.
At this point … it will REALLY HELP you if you read the two previous blogposts (see links above), and even look at the full 1D Cluster Variation Method (1D CVM) YouTube playlist. Three YouTube vids there right now. Just to get grounded.
So for our next text string, I looked for something along the lines of “Je t’aime” on our dear friend Google – and definitely found something.
The Data Set (Song Lyrics)
Here are the lyrics – adapted for what I could use:
Lyrics (in French)
Lyrics (in English)
I had to make a few “command decisions” to get this to work out – bear in mind, this is not an effort to do a definitive analysis – it’s a demo-playtoy to illustrate the method. So, with that in mind:
- The letters “y” and “w” were typically counted as consonants, as was “h.” (There was one case where I needed to treat a “y” as a vowel.)
- I had to add a little “buffer” at the end of the English translation to make the numbers of letters in both versions come to the same value (188); otherwise the code would have been just a bit more complex than desired. So, I added in the word “between,” which was already in the text – and it had the right number of vowels (3) and consonants (4) to make the totals come out to the desired value of 94 in both cases.
- We have 94 vowels and 94 consonants for BOTH the French and the English versions; our “research question” is: Is there any difference in the “clustering” of the vowels-with-vowels and the consonants-with-consonants that is observed with just these two short text sequences?
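If you want to check counts like these yourself, a small sketch along the following lines should do. The helper names are hypothetical, and the accent handling is an assumption on my part (accented vowels such as "é" are folded to their base letter before counting). The one-off case where a "y" was treated as a vowel is not handled here, and ligatures such as "œ" would be counted as consonants, so those would need adjusting by hand.

```python
import unicodedata

VOWELS = set("aeiou")  # "y", "w", and "h" fall through to the consonant count


def letters_only(text):
    """Lowercase the text, split accents off their base letters, keep letters only."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    # Combining accent marks, apostrophes, spaces, and punctuation are not
    # alphabetic, so this filter drops them and keeps the base letters.
    return [ch for ch in decomposed if ch.isalpha()]


def count_vowels_consonants(text):
    """Return (number of vowels, number of consonants) in the text."""
    letters = letters_only(text)
    n_vowels = sum(1 for ch in letters if ch in VOWELS)
    return n_vowels, len(letters) - n_vowels


print(count_vowels_consonants("between"))    # (3, 4)
print(count_vowels_consonants("Je t'aime"))  # (4, 3)
```

The "between" result matches the 3 vowels and 4 consonants quoted above: seven letters of padding in total.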
A Little More Data Detail
In case you’re wanting to check my work (or construct something similar), here’s how the consonant-vowel breakdown works for encoding into the dataset.
You can access the MS PPTX(TM) with all the data details through our GitHub repository for this project.
The French Lyrics (Breakdown, Part 1)
The French Lyrics (Breakdown, Part 2)
(Still working on the breakdown for the English version of the lyrics.)
A Side Note on Representing the Data (“On” and “Off”)
EARLIER … in the previous example where we used “Please like me on Facebook thank you” as our text string, we had visually represented the consonants as black and the vowels as white.
That also meant that the consonants were encoded as “1’s” in the code, and the vowels were assigned to be “0” values.
For this example, we reversed the visual representation and the encoding:
- Consonants are encoded as “0’s” and are shown as white nodes (in the PPTX data visualizations), and
- Vowels are encoded as “1’s” and are shown as black nodes.
The reason is that even though we are still requiring that there be an equal number of “on” and “off” nodes in the system, we envision a possible future where we just might want to experiment with other data sets that have unequal numbers of consonants and vowels.
If we continue to work with the English language, we then have a language (data set) where there are typically more consonants than vowels. (Imagine encoding “The knave took fright at the sight of the knight.” How’s that for pulling from Anglo-Saxon origins?)
We think it is visually easier to see “islands” of black floating against a “sea” of white nodes.
Thus, we made the vowels the “on” or state “A” nodes, shown in black, so they would stand out more if we ever moved to experiments with more consonant-rich vocabularies.
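As a sketch of the two encoding conventions, here is one way the mapping could be written. This is an illustration only, not the encoding routine from the repository: `encode` and its `vowel_value` parameter are hypothetical names, and the function assumes unaccented text.

```python
VOWELS = set("aeiou")  # "y", "w", and "h" are treated as consonants


def encode(text, vowel_value=1):
    """Map each letter to 0 or 1; non-letters are skipped.

    vowel_value=1 follows this post's convention (vowels "on", consonants "off").
    vowel_value=0 gives the earlier "Please like me on Facebook" convention.
    """
    consonant_value = 1 - vowel_value
    return [
        vowel_value if ch in VOWELS else consonant_value
        for ch in text.lower()
        if ch.isalpha()
    ]


sentence = "The knave took fright at the sight of the knight."
current = encode(sentence)                 # vowels = 1, consonants = 0
earlier = encode(sentence, vowel_value=0)  # consonants = 1, vowels = 0

print(sum(current), "on nodes out of", len(current))
print(sum(earlier), "on nodes out of", len(earlier))
```

With the current convention, the consonant-heavy "knave" sentence has far fewer "on" nodes than "off" nodes, which is exactly the "islands of black in a sea of white" picture described above.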
And While We’re Having Fun …
And by the way – this is not just a trivial little exercise.
We are still pointing our nose in the direction of artificial general intelligence (AGI).
The thing is – the keystone to hold AGI together is letting the signal-processing layer (whatever we have; it could be transformer-generated, lots of possibilities) interact with the ontology layer. To do that, we’re going to use a CORTECON(R), and to have a decent CORTECON, we need a 2D cluster variation method engine.
We don’t have that yet in object-oriented form, and we DO need to shift the code from simple, standard Python to object-oriented Python.
Just a very simple little shift, but it’s giving us a chance for careful code walkthroughs, cleaning up variable names and other notation – you know, the typical pre-release stuff.
Currently, we have some object-oriented Python for the 1D CVM.
That’s what we’re using for these worked examples.
That’s what’s freely available in our GitHub – public 1D CVM Repository. (See the last 1D CVM Code blog for the link.)
Our NEXT STEP (right after this example) is to go to the 2D CVM.
Not a trivial exercise.
Might take a few weeks.
But in the meantime, EVERYONE who is interested in CORTECONs(R) for AGI needs to come up to speed on CORTECONs(R) in general, and the cluster variation method in particular.
So we all have work to do.
“Live free or die!,”* my friend – AJM
* “Live free or die. Death is not the worst of evils.” Attrib. U.S. Rev. War General John Stark
Resources and References
GitHub Repository
You can go directly to the GitHub repository associated with this project, and find:
- A MS PPTX(TM) document describing how the data was generated from the original song lyrics (in French), and how a secondary set of lyrics (translated into English) was created to align closely with the French, and
- Code (object-oriented Python).
The Themesis GitHub repository is:
https://github.com/Themesis1/Simple-1D-CVM-Demos-Text-String
Data
Our GitHub Repository contains a MS PPTX(TM) slidedeck with the full details on the specific data items used to create this 1D CVM demo.
Two specific notes:
- We had to selectively cull some verses from the original lyrics in order to provide a data set where the numbers of vowels and consonants were equal, and
- The English translation of the song was seven characters shorter than the French original, and so we “filled in the blank” with the word “between,” which had already been used in the English translation. Fortunately, this word also allowed us to achieve the equal number of vowels and consonants needed.
Papers
Primary Reference on the 1D CVM
- Maren, A.J. (2016). The Cluster Variation Method: A Primer for Neuroscientists. Brain Sci. 6(4), 44. https://doi.org/10.3390/brainsci6040044 (accessed 2018/09/19).
2-D Cluster Variation Method: The Earliest Works (Theory Only)
- Kikuchi, R. (1951). A theory of cooperative phenomena. Phys. Rev. 81, 988-1003 (accessed 2018/09/17).
- Kikuchi, R., & Brush, S.G. (1967). Improvement of the Cluster-Variation Method. J. Chem. Phys. 47, 195. Available for purchase through the American Institute of Physics ($30.00 for non-members).