Schrier's Sudelbücher
I like to learn things by building small projects. In addition to my scholarly research in chemistry I am generally interested in probability, geometry, optimization, programming topics, machine-learning, and electronics problems. For several years I have kept private electronic wastebooks of my explorations, but these are increasingly useful for my students. I’ve also increasingly enjoyed reading other people’s blogs in this vein, so this is my attempt to share some of the fruits of these considerations. Nearly all code written in Mathematica, as I find it not as clumsy or random as Python. An elegant tool, for a more civilized age. No attempt is made at novelty or scholarly completeness; these are just haphazard notes, as an exercise in deviant interdisciplinarity.
-
Reading numpy arrays in Mathematica
One can read relatively simple numpy arrays that contain only numeric datatypes (NPY version 1.0) in Mathematica using a function written by Luca Robbiano:
-
Quantum Music
I stumbled upon a strange rabbit hole: quantum music, wherein one uses a (simulated) quantum computer to sequence your tones. Music of the Bloch spheres, indeed. The Goethe Institut sponsored a festival on this in 2023, there is a 2022 collected volume on the topic, and the CTM 2024 conference had a few talks on it, but I haven’t heard anything too compelling yet.
-
Guitar Pedals
Going to the BK Synth and Pedal Expo with Tardio last weekend reminded me how neat guitar pedals are. A few resources for DIY guitar pedal building…
-
Imaginary Cinema: Top Gun, but with Chiguiros
-
Musical Statistical Thermodynamics
While wasting time reading Phys Rev E, I came across a recent article by Jesse Berezovsky on Renormalization-group approach to ordered phases in music (2024), which in turn led me to the author’s earlier work The structure of musical harmony as an ordered phase of sound: A statistical mechanics approach to music theory (2019). The core idea of the latter is to use a Sethares-style dissonance function as the internal energy and then minimize the Helmholtz free energy. This is plausible because it describes an optimization between pattern (low energy) and variety (entropy) in music. Berezovsky’s website has example compositions generated by this approach. And poking around on his Google Scholar, he has a 2024 March Meeting abstract on rhythm…having not seen the talk, I would speculate that it treats this as something like an Ising model/step-sequencer with some interaction.
-
Llama plays the blues
I was comping some blues the other day with a shell voicing (major third and dominant 7th of the root). And I wondered: Can Llama3.1-70b identify the root?…
-
Assessing an LLM's confidence in its own explanations
Background: We can ask an LLM to generate scientific explanations, but why should we believe them? (Case study here is for inorganic synthesizability predictions like those in our recent JACS paper, but strategy should be general). Our goal here is to show a process by which we can dissect the reasons and try to determine the model’s confidence. Empirically, language models (mostly) know what they know, that is to say, pre-trained LLMs provide well-calibrated true/false self-evaluation on factual questions. We will thus use a prompt chaining approach, in which we first use the LLM to extract an underlying rule for a statement, and then use the LLM to assess the veracity of that rule….
-
3d-printed RC boats
Why fly, when you can rule the sea? I love a good maritime theme, and always thought the model sailboats in Central Park were cool (in fact, they even have their own yacht club that races Saturdays 10am - 1pm). Possible resources for 3d-printing radio-controlled boats…
-
Henrik Neugeboren's Bach Sculpture
Henrik Neugeboren (1901-1959) spent time at Bauhaus Dessau and made a sculpture of Bach’s Fugue in E flat minor. See article by Probst (2020) Pen, Paper, Steel: Visualizing Bach’s Polyphony at the Bauhaus. It is in Leverkusen (outside Köln). It would 3d print very nicely.
-
Imaginary Syllabi: Additive Manufacturing Lab
Premise: Move beyond mere autodidacticism and equip a comprehensive, but minimal teaching lab for a course on material science / additive manufacturing processes…
-
Intersecting cones
I wanted to make one of those intersecting cone fidget toys. So I followed the instructions to make one in Fusion360. Printing at 0.2 mm layer height (outer perimeters first for better dimensional tolerance), with a 0.4 mm face offset tolerance and 0.3 mm chamfers, worked just fine. No doubt you could print in place, but I made two separate prints so that they could be contrasting colors.
-
Detecting LLM confabulations with semantic entropy
Farquhar et al. recently described a method for Detecting hallucinations in large language models using semantic entropy (Nature 2024). The core idea is to use an LLM to cluster statements together if they have a bidirectional entailment (A entails B and B entails A must both be true). If the answers form only one cluster, then there is very little entropy and the LLM is confident; if there are many clusters each with a few examples, then the entropy is high and the LLM is likely to be confabulating. Let’s write a minimal implementation from scratch with worked examples…
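To make the clustering-then-entropy idea concrete, here is a minimal Python sketch (not the implementation from the post, and not the authors' code). The `toy_entails` function is a hypothetical stand-in for the LLM entailment call, and the greedy comparison against one representative per cluster is an assumption made to keep the example short.

```python
import math

def cluster_by_entailment(answers, entails):
    """Greedily group answers; two answers share a cluster only if each entails the other."""
    clusters = []
    for answer in answers:
        for cluster in clusters:
            representative = cluster[0]
            if entails(answer, representative) and entails(representative, answer):
                cluster.append(answer)
                break
        else:
            # No existing cluster matched, so this answer starts a new one
            clusters.append([answer])
    return clusters

def semantic_entropy(answers, entails):
    """Shannon entropy (in nats) of the distribution of answers over clusters."""
    clusters = cluster_by_entailment(answers, entails)
    probabilities = [len(cluster) / len(answers) for cluster in clusters]
    return sum(-p * math.log(p) for p in probabilities)

def toy_entails(a, b):
    # Hypothetical stand-in for an LLM entailment check: case-insensitive match
    return a.strip().lower() == b.strip().lower()

confident = ["Paris", "paris", " Paris "]
confabulating = ["Paris", "Lyon", "Nice", "Lille"]
print(semantic_entropy(confident, toy_entails))
print(semantic_entropy(confabulating, toy_entails))
```

One cluster gives zero entropy, while four singleton clusters give the maximum for four samples, log(4). In practice `entails` would be a prompt to a language model, and the number of samples and the entropy threshold are choices left to the user.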
-
Imaginary Book: AI Agents with Plato
Premise: Learn how to develop agentic LLM systems by building characters in Platonic dialogues…
-
Eurorack case building
This spring I came to the conclusion that I needed to expand my Eurorack case. I had decided to buy one, but then again I don’t need to expand it that much, and so naturally I am going to yak shave by building a case…
-
Hacking the Korg Volca Beats
The Korg Volca Beats is a cool little portable drum machine, but how can you make it cooler by voiding your warranty?…
-
Glicol Beatz School
My name is DJ Borscht, I’m full of beatz… Basic rhythm patterns from Pocket Operations implemented in Glicol…
-
Hacking Meta Rayban Glasses
Confession: I bought a pair of Meta Rayban smart glasses. Built-in functionality is cool, but of course, a hacker’s gotta hack. So how can you run other things on them?…
-
I (don't) got rhythm
(With apologies to the Gershwin brothers) In my quest to build a drum machine, I don’t think I really understand musical rhythm and “beats” well enough. Some resources and random thoughts…
-
Charles Fort, Son of the Bronx
According to Wikipedia, Charles Fort lived on Ryer Ave in the Bronx (runs parallel-ish between Grand Concourse and Webster Ave, east of Arthur Ave) not far from the Fordham campus. (Inspired by reading a book review of Think to New Worlds: The Cultural History of Charles Fort and His Followers.) Side quest: put up a plaque?
-
Storm-trooper helmet tiki torch
Premise: Use cement casting to make a storm-trooper helmet tiki torch…
-
LLM Finetuning Notes
I’ve been doing some fine-tuning experiments lately, and while fine-tuning commercial LLMs like GPT-3.5 is pretty easy and cheap, the restrictions on model sharing and control of the algorithm, along with the cost, are a bit annoying. An alternative is to fine-tune open-weight models like Llama-3-8B, etc. It is relatively easy and cheap to do it on the cloud. Some notes on the ecosystem for fine-tuning your own LLMs…with an emphasis on doing this remotely for the GPU poor…
-
The Persistence of Nachos
Premise: Use cookie cutters to cut watch-shapes out of American cheese slices, then melt them on nachos, in the style of Dali’s The Persistence of Memory…
-
Electrical Engineering for Music
Prof. Aaron Lanterman (GATech) has some wonderful YouTube videos for his electrical engineering courses on Guitar Amplification and Effects and Analog Circuits for Music Synthesis…
-
Alan Saret
Saw an epic Alan Saret show at Karma today. I really dig his geometric works. Skip down to “Ring 1” to see the works in the exhibition (the calligraphy is alright, but I wasn’t into any of the figurative drawings). Octaspiral and Special Relationship With Joyful Incircles are some of my favorites…
-
Eliciting priors from LLMs by Gibbs Sampling
In a recent pre-print, Zhu and Griffiths describe a method for Eliciting the Priors of Large Language Models using Iterated In-Context Learning. The core idea is to adapt a procedure used to perform Markov Chain Monte Carlo with People. Effectively, this implements a Gibbs sampler for the joint distribution $p(d, h) = p(d \mid h)\,p(h)$, and the stationary distribution on hypotheses is the prior $p(h)$. Samples from the prior are obtained by running the iterated learning process long enough to converge to the distribution. Let’s replicate their result and compare to direct sampling…
-
Imaginary Syllabi: Glazed and Confused
Premise: An interdisciplinary course in ceramic glazes as art/chemistry…
-
Great Ideas from Germany
Project ideas from my recent trip to Berlin/Heidelberg…
-
Imaginary Syllabi: Major in Techno
Premise: An interdisciplinary four-year undergraduate degree in electronic dance music practice and culture. A US undergraduate major is typically 10-12 courses, so degree requirements include…
-
Analog Computers
I periodically remember how much I enjoy analog computing as an idea. Time to start a post…
-
Great Ideas from Portugal
Project ideas from my recent trip to Portugal, to speak at the University of Aveiro CICECO…
-
Great Ideas from Korea, part 2
A continuation of ideas from last time…
-
Victor Vasarely
I saw an epic retrospective on Vasarely in Seoul last month. A few Stack-exchange threads on recreating his art in Mathematica…
-
Musical chess
Could you use a chess game as a music sequencer? I have in mind live generative pieces generated while you play, perhaps using a smart chess board as the interface. Some experiments at how this could sound…
-
Generating molecule descriptions with GPT-4-vision
I have had a few conversations in the past week about how one might build RAG-for-molecules to chat with molecular datasets. An idea that I find appealing is to have a text representation of the molecule, motivated by a paper in which Robocrystallographer was used to generate a text description of solids as an input for LLM use. If you could generate text descriptions of molecules, perhaps this would serve as an alternative to SMILES as inputs to LLMs, which might help handle the problems that LLM-based molecule property regressors have with handling 3d structure. There is a Familienähnlichkeit with the problem of molecule captioning, but that is typically presented in terms of properties, whereas here we want just a structural description. Here we try to see what gpt-4-vision can do for generating text descriptions from molecular images…
-
Electric tongue
AlphaMOS sells a FET-based electric tongue, which you can use for wine-quality determination (advance prediction of spoilage). What can it do?…
-
Tuning, Timbre, Spectrum, Scale
Awesome book exploring the relationship between timbre and musical consonance/dissonance. The core idea is that dissonance is not about “scales”, but more about timbral relationships. All sound examples are online. The author, Prof. William Sethares, is an electrical engineer, who has also written about rhythm. Some notes…
-
Marvin Minsky On LLMs
2007 Discover Magazine interview with Marvin Minsky “[Reasoning by analogy] is important because the way people solve problems is first by having an enormous amount of commonsense knowledge, like maybe 50 million little anecdotes or entries, and then having some unknown system for finding among those 50 million old stories the 5 or 10 that seem most relevant to the situation. This is reasoning by analogy.”
-
Herbert Franke
Digging the artwork of Herbert Franke, who started making analog electronic art in the 1950s then migrated to video and digital. I learned about him from a post on stack exchange
-
Cura personalis versus Cura individualis
Cura personalis (often translated as “care for the entire person”) is a catchphrase of Ignatian spirituality and Jesuit higher-education. (Although, as Geger notes, not actually used by Ignatius or his companions, but originated in the 20th century.) After reading a recent blog post by Edward Feser on Jacques Maritain’s distinction between persons and individuals in Thomistic philosophy, I got to thinking: This is the right phrase, but maybe it is often mistaken in practice for cura individualis…
-
Doron Zeilberger: An Appreciation
Jay Hineman brought to my attention the mathematician Doron Zeilberger. Zeilberger is an ultrafinitist working in experimental mathematics. He has some interesting opinions…among them, ones that make me want to read more about Sylvester…and about the pernicious Greek tradition of mathematics.
-
Automated Bibliometrics with GoogleScholar and OpenAlex
GregH asks: “How can I save all citations from a Google Scholar search? For instance, in a search for “Radon transformation”, there are about 35,500 results. I want to download all citations into a string - is there a simple way to get Mathematica to do that? (And sorry, no, I haven’t tried anything yet. Not sure where to start.)”. A demonstration of automating Google Scholar…and a better way using OpenAlex…
-
State Space Models
So, you want to learn about state space models like Mamba as an alternative to Transformers? A few resources:
- Structured State Spaces: Combining Continuous-Time, Recurrent, and Convolutional Models – explanation and math
- The Annotated S4 — Build your own SSM in Jax
-
Gold leaf for 3d-prints?
Can you apply gold-leaf to 3d-prints? Yes…
-
3d-printed RC Airplanes
Was turned on to the idea of 3d-printed model airplanes a year back. Periodically get reminded and still think it is cool…
-
Fluid Analogies with LLMs
Richard Halpern’s Leibnizing: A Philosopher in Motion (2023) in Chapter 22 discusses Douglas Hofstadter’s “Leibnizian” modes of thought. Some notes on fluid concepts and how they might relate to LLM use…
-
Ask ChatGPT: Can you read this?
-
How to make a Monad
“Monad” is probably the best-known entry in the Leibnizian lexicon. But what is a monad? A monad is a mind, but mind as understood in a particular way. Or rather, it is mind as understood in a multiplicity of distinct though related ways. In the spirit of Leibnizian tinkering, we shall learn to grasp the concept of the monad through the practical process of trying to build one. What follows, then, is a set of assembly instructions. Fortunately, monads can be constructed with materials easily found around the home. In fact, all you need is pencil, paper, and infinite, godlike intelligence. Let’s get started! – Richard Halpern, Leibnizing: A Philosopher in Motion (2023), p.88
-
Semisupervised Twin Regression
Pairwise difference regression (aka twin regression) is an underappreciated meta-ML approach. The idea is to take pairs of inputs $(x_1, x_2)$ and train your model to predict $y_1 - y_2$. One advantage is that this gives you $N^2$ training points (handy for those small-data science problems). At inference time, you select a sample of known reference $x \to y$ data, and generate an estimate of the distribution of pairwise distances from those references for the point of interest. Thus, the second advantage is that you get an approximate form of uncertainty quantification without having to train multiple models. Wetzel et al. described this using neural networks (which they denote as twin neural networks) in 10.1002/ail2.78. Tynes et al. describe a similar strategy using tree methods in 10.1021/acs.jcim.1c00670. But going beyond this, Wetzel et al. also describe a semisupervised version which trains on triples (doi:10.1088/2632-2153/ac9885) –the supervised examples have the typical MSE loss and the unsupervised examples are evaluated for the internal consistency of their predictions (they should sum to zero). This gives you more data to train on and effectively regularizes the network for transductive (try to predict the labels of the unsupervised examples seen in training) or inductive (predict labels of examples that have not been seen at all) prediction, giving better performance. A minimal working implementation and demonstration of the idea…
-
Differentiable Moog Model (DiffMoog)
Combining two of the interests of this blog—analog synthesizers and machine learning—Uzrad et al DiffMoog: a Differentiable Modular Synthesizer for Sound Matching is a differentiable model of all the parts in a typical analog synthesizer. Because it is differentiable, it can be incorporated into a neural network allowing for automatic sound matching/replicating an audio input. Code on Github. Via Christian Steinmetz
-
DIY DLS towards Organic Nanoparticle SDL (notes)
Recently, Young et al. Nano Lett 2024 described the use of Vittorio Saggiomo’s 3d-printed+ Ender Flow pumps to make organic nanoparticles (liposome, polymer nanoparticle, and solid lipid nanoparticle). But how do you characterize them? Let’s build a DIY dynamic light scattering device…
-
GPT-4 does acid (base chemistry exam questions)
A recent paper (Clark et al. “Comparing the Performance of College Chemistry Students with ChatGPT for Calculations Involving Acids and Bases”, J. Chem. Educ. 2023, 100, 3934-3944) evaluated 10 acid-base equilibrium questions on students and on ChatGPT 3.5, performing both qualitative assessments of the methodology and quantitative assessment of whether the correct answer was reached. ChatGPT questions were evaluated 20 times to see the distribution of answers. Simple questions about strong acids had high success rates, but some questions, like those about salts, did poorly. For example, the question “Calculate the pH of 0.25 M NH4Cl. Kb for NH3 = 1.8 *10^-5.” returned the correct answer only 10% of the time. Another challenging question involved titrations. This raises two natural questions: (i) Would they get better results by using GPT-4? (ii) What about using modern prompting strategies (COT, ReAct) and tools?
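For reference, the NH4Cl example can be worked by hand or in a few lines of code. The Python sketch below is only an illustration (it is not taken from the paper or the post): it treats NH4+ as a weak acid with Ka = Kw/Kb, assumes Kw = 1.0e-14 (25 °C), and ignores activity corrections and the autoionization of water.

```python
import math

KW = 1.0e-14  # assumed water autoionization constant at 25 °C

def ph_of_weak_acid_salt(concentration, kb_of_conjugate_base):
    """pH of a solution of the conjugate acid of a weak base (e.g. NH4+ from NH4Cl)."""
    ka = KW / kb_of_conjugate_base
    # Equilibrium: x^2 / (C - x) = Ka, i.e. x^2 + Ka*x - Ka*C = 0 with x = [H3O+]
    h3o = (-ka + math.sqrt(ka * ka + 4.0 * ka * concentration)) / 2.0
    return -math.log10(h3o)

ph = ph_of_weak_acid_salt(0.25, 1.8e-5)
print(round(ph, 2))  # 4.93
```

The usual shortcut [H3O+] ≈ sqrt(Ka·C) gives the same value to two decimal places here, since Ka is tiny compared with the concentration.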
-
Implementing the ReAct LLM Agent pattern the hard way and the easy way
ReAct prompting is a way to have large language models (LLM) combine reasoning traces and task-specific actions in an interleaved manner to use external programs and solve problems. In effect, the LLM acts as an agent that combines tools to solve a problem. Let’s implement it from scratch…
-
Accurate and safe LLM numerical calculations using Interpreter and LLMTool
In a previous post, we discussed the use of LLMTool, to provide a way for remote LLMs to execute code on your local Wolfram kernel. A natural place to use this is in performing numerical calculations–mathematical calculations are a well-known weakness of LLMs. We can solve this by having the LLM write the proposed calculation as a string, and then call a tool that Evaluate-s it on the local kernel. But Evaluate-ing any arbitrary string provided is a recipe for trouble–a malicious actor would be able to use this to perform arbitrary calculations, such as deleting files, etc. (This is not unique to Mathematica: we all know python eval is dangerous, and the modern best practice is to sandbox an entire python install.) Interpreter provides a safe and easy way to provide LLMs with the ability to perform only numerical calculations with Mathematica…
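The same "only allow numerical expressions" idea can be sketched in Python, which the paragraph above mentions as having the same `eval` problem. This is an analogy only, not the Interpreter-based Mathematica approach the post describes: the function name `safe_numeric_eval` and the choice of allowed operators are assumptions for illustration. The string is parsed with the standard `ast` module and anything other than numbers and basic arithmetic is rejected.

```python
import ast
import operator

# Whitelist of permitted operators; everything else is rejected
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv,
           ast.Pow: operator.pow}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_numeric_eval(expression):
    """Evaluate a string containing only numbers and the + - * / ** operators."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError("disallowed expression element: " + type(node).__name__)
    return walk(ast.parse(expression, mode="eval"))

print(safe_numeric_eval("2 * (3 + 4) ** 2"))  # 98
try:
    safe_numeric_eval("__import__('os').remove('x')")
except ValueError as err:
    print("rejected:", err)
```

A whitelist like this blocks function calls, attribute access, and names, but it does not bound running time (a very large exponent can still tie up the interpreter), so it is a sketch of the principle and not a complete sandbox.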
-
What to do with a chocolate 3d-printer?
Premise: Relatively affordable chocolate 3d-printers (see video) are now available (some assembly required). What types of business could you start with this?…
-
Holograms
I saw some work by August Muth in the Vladem Contemporary, which turned me on to holography. Resources…
-
Finally getting tags working
I finally got post tagging working in Jekyll…made more difficult by some non-standard path choices. Possible footguns to consider…
-
3d printed Triboelectric generator and applications
Saw an article with youtube video on Hackaday about a 3d-printed triboelectric generator, which describes the work in this paper Fully enclosed microbeads structured TENG arrays for omnidirectional wind energy harvesting with a portable galloping oscillator from Zhong Lin Wang’s group at GATech. Use your standard FDM printer, conductive filament, and insert some PTFE beads during a pause, so it is easy to do at home. Triboelectrics give you high voltage, but low current; you can always play some electronics tricks. In the paper they charge a 220 mF capacitor to 3 V in under 1 min, and under linear shaking at 3 Hz, the maximum peak power is 2.1 mW, the maximum average power is 1.2 mW. How might this be applied?…
-
Imaginary Cinema: Titanic, but Tiki
Premise: Remake Titanic, but as a Tiki-themed animated movie (inspired over Mai Tais at Bahi Hut, which alleges to be Florida’s oldest tiki bar)…
-
Imaginary Syllabi: Scholar's Rocks
Premise: (Inspired by a visit to the Ringling Museum’s collection…) A study of the geochemistry, cultural history, and aesthetics of Scholar’s Rocks (Gongshi) (also Korean suseok) and Japanese suiseki, with a dose of 3d-printing and sculptural practice. A framework to think about issues of generative/AI artwork, the boundary between human practice, nature, and chance in art, etc. Topics include…
-
Imaginary Cinema: Titanic, but with Manatees
Premise: Remake Titanic but with all the characters played by manatees…
-
Sound of a Cantor Set Square Wave
While reading Mandelbrot’s Fractal Geometry of Nature, I wondered about what would be the sound of a square wave defined by the intervals of each iteration of a Cantor set. One can imagine the first iteration is just a square wave (with a non-50% duty cycle), and that other iterations will potentially be waves with higher frequency. (This would define a niche genre of 1-bit music.) What will it sound like?…
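A small Python sketch of the construction (an illustration, not the code from the post): it builds the intervals kept at each iteration of the middle-thirds Cantor set and samples one period of a 1-bit waveform that is high inside a kept interval and low in a gap. Mapping kept intervals to "high" and sampling at bin midpoints are assumptions made here.

```python
from fractions import Fraction

def cantor_intervals(iterations):
    """Closed intervals kept after removing middle thirds `iterations` times."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(iterations):
        next_intervals = []
        for left, right in intervals:
            third = (right - left) / 3
            next_intervals.append((left, left + third))
            next_intervals.append((right - third, right))
        intervals = next_intervals
    return intervals

def one_bit_wave(iterations, samples_per_period):
    """One period of a 1-bit waveform: 1 inside a kept interval, 0 in a gap."""
    intervals = cantor_intervals(iterations)
    wave = []
    for i in range(samples_per_period):
        t = Fraction(2 * i + 1, 2 * samples_per_period)  # midpoint of sample i
        wave.append(1 if any(a <= t <= b for a, b in intervals) else 0)
    return wave

print(cantor_intervals(1))
print("".join(str(bit) for bit in one_bit_wave(2, 27)))
```

The first iteration is high for two thirds of the period, matching the non-50% duty cycle noted above; each further iteration doubles the number of pulses while the total high time shrinks by a factor of 2/3. Repeating the sampled period at an audio rate and writing it to a sound file is left out of the sketch.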
-
For a Few Pixels More: A Page-Oriented Perspective on GPT-4-Vision for Scientific Data Extraction
In a previous post, we explored some of the limitations of gpt-4-vision-preview for interpreting scientific figures in papers. A recent pre-print by Zhiling Zheng et al. demonstrated a useful strategy: Ask questions about images of the entire page. Surprisingly, this does much better because it incorporates a variety of contextual information (figure captions, surrounding text, etc.). Here we demonstrate and adapt their strategy to solvent-solvent separation data…
-
Computational Painting: After Charles Demuth
Eldo asks: How can we reproduce Charles Demuth’s “Figure 5 in Gold”? Some fiddling with Dall-E-3/GPT-4/(V)…
-
Feuerzangenbowle Zuckerhut Designs
I love to make a good Feuerzangenbowle for my annual holiday party, but in the USA, zuckerhüte are expensive. Fortunately, it is easy to make your own (1 tsp water + 1 cup of sugar…pack into a conical glass). But cone shapes are boring. If you are going to make your own, why not come up with other designs? Ideas for hipster zuckerhut designs…
-
Computational Sculpture: After Damon Hamm, part 1
Damon Hamm writes: I’m WAAAAAY into the idea of making my own custom agent that ingests my previous writings, proposals, and sculpture descriptions to write more like I would. … Let’s make something cool! Let’s investigate taking photos of one of his sculptures and evolve them by using GPT-4V(ision) to describe the sculpture and Dall-E-3 to generate a new image of a sculpture based on that description…
-
Imaginary Cinema: The Rock, but with Chiguiros
Premise: Remake The Rock, but with all the characters played by chiguiros…
-
Computational Sculpture: After Herbert Bayer, part 1
During an afternoon stroll down Canyon Road, I stopped into Peyton-Wright Gallery…mostly because of the striking outdoor sculpture, which I would learn was part of the works of Herbert Bayer (1900-1985). I enjoy the geometrical structure of his sculptures, inspiring a few investigations…
-
GPT-4-vision-preview for scientific image processing: The Bad, The Mediocre, and the Tolerable
The new GPT-4-vision-preview API has many exciting use cases. But can it help us with interpreting figures in scientific papers? tl;dr–we can sometimes verify approximate values and trends of data, but the current version does not handle quantitative data extraction from scientific figures… **UPDATE (20 Dec 2023): We get better results by taking a page-level perspective…**
-
Visiting New York
A local’s guide to visiting New York City… (commonly offered advice to short term visitors, with an emphasis on places that are a bit more unusual and off the beaten path)
-
LLMTools demonstration
Large-language models (LLMs) struggle with precise calculations needed for chemistry-related tasks. In a recent paper, Bran et al. described ChemCrow–a series of computational tools that can be used by the LLM to perform intermediate calculations. Here we show how to implement a few ChemCrow-style tools from scratch and use them to get GPT-3.5/4 to correctly compute SMILES and molecular weight information given an input chemical name…
-
Non-octave-preserving tunings
While skimming a back issue of Mathematics Magazine, I came across Jordan Schettler’s article Wendy Carlos’s Xenharmonic Keyboard which describes a continued fraction derivation of an idea by Wendy Carlos: whereas “traditional” tuning ideas are based on preserving the octave (2:1 frequency) ratio, we might alternatively just try to find equal tempered scales that give a good approximation to the perfect fifth (3:2) and major third (5:4) and forget about this giving an octave…
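One way to see where such scales come from is a Python sketch under stated assumptions (it is not Schettler's derivation verbatim, nor the code from the post). If a single step size is to fit both intervals, the ratio of the fifth to the major third (in cents) should be close to a ratio of small integers, and the continued-fraction convergents of that ratio supply the candidates. The step size printed below is a simple average of the two implied steps, which is an arbitrary choice for illustration.

```python
import math

def continued_fraction_convergents(x, count):
    """First `count` convergents (numerator, denominator) of the continued fraction of x."""
    h_prev, h = 1, int(math.floor(x))
    k_prev, k = 0, 1
    convergents = [(h, k)]
    frac = x - math.floor(x)
    while len(convergents) < count and frac > 1e-12:
        x = 1.0 / frac
        a = int(math.floor(x))
        frac = x - a
        # Standard recurrence: h_n = a_n*h_(n-1) + h_(n-2), and likewise for k_n
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        convergents.append((h, k))
    return convergents

FIFTH_CENTS = 1200 * math.log2(3 / 2)   # perfect fifth, about 702 cents
THIRD_CENTS = 1200 * math.log2(5 / 4)   # major third, about 386 cents

for fifth_steps, third_steps in continued_fraction_convergents(FIFTH_CENTS / THIRD_CENTS, 5):
    step = 0.5 * (FIFTH_CENTS / fifth_steps + THIRD_CENTS / third_steps)
    print(fifth_steps, third_steps, round(step, 1))
```

The convergents 9/5 and 20/11 correspond to equal steps of roughly 78 and 35 cents, neither of which divides the 1200-cent octave evenly. A more careful fit would minimize the error over all target intervals instead of averaging two of them.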
-
Literature about National Laboratories
Campus novels about academic life are so common. But what about novels that are set at national laboratories (in which, the lab is itself a type of character)? A few good ones I have read…
-
Computational Sculpture: After Karen Bexfield, part 2
Our last attempt at this project resulted in a kind of lumpy potato from the lofting effect. This suggested another approach: start with inherently conical structures. This results in a much more pleasing effect…
-
Imaginary Syllabi: Money And Art
Premise: An exploration of money and art through history. Topics include…
-
Dall-E-3 image generation in Mathematica
By default (as of 24 Nov 2023, Mathematica 13.3), it appears that ImageSynthesize uses Dall-E-2 for image generation, as the results are kind of trash, but with some tricks you can get it to use Dall-E-3 instead…
-
Slip casting
I’ve never understood the fascination with pottery wheels: it seems labor intensive and imprecise. It also limits you to things that have cylindrical symmetry (although I suppose you could use a mechanical lathing mechanism like in wood turning…). Instead, I’m much more interested in slip casting…
-
Computational Sculpture: After Karen Bexfield, part 1
Inspired by a walk down Canyon Road on an October Saturday morning, I had many inspirations for sculptural works that could have a flavor of mathematics, computation, and advanced manufacturing. At Winterowd, I came across work by Karen Bexfield, whose schtick is kiln-formed glass truncated bi-cones with holes in the surface: How would you generate a family of these objects computationally, within the constraint of FDM?…
-
Cheminformatics: SMILES and InChI quirks for salts
Olivia Vanden Assem ‘25 asks: Why am I getting inconsistent SMILES, InChI, and InChIKey results for the salt and neutral acid-base representations of ammonium nitrate? There are some quirks about interconversion between SMILES and InChI in standard implementations that can result in neutral and salt forms of a pair of molecules being different…
-
The Metallic Scent of Old Coins
Lisa Cooperman (via Anthony Dutoi) asks: Weird but fun museum type question seeking your expertise. I’m curating an exhibition with four Powell scholars who are interpreting collection work by making various installations. One scholar exploring artist Wayne Thiebaud’s interest in nostalgia is making ‘scent books.’ She wants to include the metallic smell of old coins……do you know what chemical compound or interaction produces the smell and can be safely reproduced and exhibited? Each book will have a receptacle area and a lid that closes, but not airtight. Chemistry to the rescue…
-
Parsing Molecular Identifiers From the IDEaL Database, part 3
In our last episode, we were left with 25 cases where the inferred molecule did not agree with the reported formula or molar mass, in our quest to turn the IDEaL database into a comprehensive f-element separation database. Here we fix them by hand and generate the final result…
-
Parsing Molecular Identifiers From the IDEaL Database, part 2
In our last episode, we tried screen-scraping the IDEaL database to build an f-element separation database, but encountered a problem: We could not resolve 105 of the molecular structure entries (i.e., we could not parse the linked ChemDraw file or the name or run a pubchem query and get a result that matched the listed molecular formula and molecular weight information). In most cases, the problem is that alkyl sidechains are only presented implicitly as molecular formulas, rather than as explicit structures, so we cannot generate a proper Molecule. After initially thinking we might have to retrieve these by hand (ughh…), I had a better idea: Can the ChemDraw application correctly parse these implicit side chain specifications? It turns out that the answer is yes, and so this suggested a solution: Build the results by scripting the download, and running automated key commands in ChemDraw, to retrieve the data…
-
Parsing molecular identifiers from the IDEaL Database
Suppose you (not-so-hypothetically) want to screen-scrape the IDEaL database in order to build your own f-element separation database. While it is straightforward to task a student with scraping the data tables, extracting machine-readable chemical structures is more challenging, as they do not provide standard chemical identifiers such as SMILES, InChI, etc. However, they do provide ChemDraw files and IUPAC-ish names which we can try to parse. This also provides an opportunity to do some LLM-based screen scraping. (Spoiler alert: We uncover approximately 105/438 entries with unresolvable structures, including 6 cases where the stated molecular formula is inconsistent with the stated molecular mass.) Mathematica to the rescue…
-
Semantic Article Retrieval from CrossRef with LLM Embeddings
Suppose you wanted to gather all of the scholarly articles related to a topic (e.g., constructing a database of f-element solvent separations)? You could do a keyword search, but that might include many useless papers or fail to include relevant papers that lack the exact specified keywords in the title or abstract. A modern approach would be to try to find articles that are semantically close by generating an embedding vector that represents the document (using your favorite foundational language model) and then perform some sort of clustering or classification to retrieve documents of interest–this might be as simple as just finding documents where the vectors are close enough (below some threshold). A code sketch demonstration of how to conduct this process (and some of the complications that may arise)…
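The simplest version of the "close enough" test is sketched below in Python (an illustration only, not the code from the post). The three-dimensional vectors and document titles are made up; real embeddings would come from a language model and have hundreds or thousands of dimensions. The sketch keeps documents whose cosine similarity to the query is at or above a threshold, which is the same as requiring cosine distance (one minus similarity) to be below a threshold.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vector, documents, threshold):
    """Return (title, similarity) pairs at or above threshold, most similar first."""
    scored = [(title, cosine_similarity(query_vector, vector))
              for title, vector in documents.items()]
    hits = [(title, score) for title, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Made-up 3-dimensional "embeddings", purely for illustration
toy_documents = {
    "Solvent extraction of lanthanides": [0.9, 0.1, 0.0],
    "Actinide separation chemistry": [0.8, 0.3, 0.1],
    "Medieval poetry": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]
print(retrieve(toy_query, toy_documents, threshold=0.8))
```

Choosing the threshold is the hard part in practice: too strict and relevant papers are missed, too loose and the useless ones return.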
-
Ask ChatGPT: What should I name my f-element separation database?
HEY GPT-4: Propose 20 catchy acronyms for f-element, lanthanide, actinide solvent separation database project…
-
(Mostly) f-Element Separation Databases I Have Known And Loved (fESDIHKAL)
(with apologies to the Shulgins) A collection of references to (mostly) f-element hydrometallurgy / solvent extraction / liquid-liquid extraction databases (lanthanides, actinides, plus Sc and Y so we can do rare earth separation)…
-
Famous scientists with poor intro chemistry grades
To encourage the youth: A list of famous scientists who are “on the record” about receiving poor grades in general chemistry or organic chemistry…
-
Protein Language Models
A chat with Rudi Gunawan turned me on to the idea of using language models for peptide problems, which came up again in a chat with Yujia Xu about designer collagen. Some notes on protein language models…
-
Great Ideas from Italy
Business and art ideas inspired by my recent trip to northern Italy…
-
Comparing student success rates with the Beta distribution
Sarah Maurer asks: The DEI community uses a disparity index (which is the ratio of outcomes of a group of interest relative to the outcomes in the total population of students) as a way to measure disparate outcomes for students (for example, the DFW rate). But this seems like it does not handle errors associated with small numbers of events. And what is a meaningful effect? Is there a better way to do this? The Beta-distribution to the rescue…
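A rough Python sketch of why the Beta distribution helps with small counts (not the post's own code). It assumes a uniform Beta(1, 1) prior on each DFW rate and compares the group of interest against the remaining students rather than the whole population; the function name and all counts are hypothetical.

```python
import random

def prob_group_rate_higher(fail_g, n_g, fail_rest, n_rest, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_group > rate_rest).

    Each rate gets a Beta(1 + failures, 1 + successes) posterior,
    i.e. a uniform prior updated with the observed counts (an assumption).
    """
    rng = random.Random(seed)
    higher = 0
    for _ in range(draws):
        rate_g = rng.betavariate(1 + fail_g, 1 + n_g - fail_g)
        rate_rest = rng.betavariate(1 + fail_rest, 1 + n_rest - fail_rest)
        higher += rate_g > rate_rest
    return higher / draws

# Hypothetical counts: the observed rates are 40% vs 20% in both cases,
# so a plain ratio of rates is 2 either way.
small = prob_group_rate_higher(2, 5, 40, 200)        # only 5 students in the group
large = prob_group_rate_higher(80, 200, 400, 2000)   # 200 students in the group
```

With the same ratio of observed rates, the small group leaves noticeably more doubt about whether its underlying rate is really higher, which is exactly the information a bare ratio throws away.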
-
Imaginary Syllabi: Wine Across the Disciplines
Premise: A senior capstone course¹ in which wine is explored through the lens of a variety of natural science, social science, and humanities disciplines (along with a weekly wine tasting). Topics include…
-
Anthony Dutoi suggested that there is so much material here that it could be an entire degree program. Maybe it works best as an adult continuing education program… ↩
-
-
Imaginary Syllabi: AI for Liberal Art Students
A collection of reading lists for introducing AI concepts and practice to a general population of starting undergraduates…
-
Three LLM Summarization Strategies
User5601 asks: Are there any append/chunking solutions for getting OpenAI to summarize docs longer than the input limit through either one large or multiple ChatInput cells or some other programmatic way (using the new LLM functionality in Mathematica 13.3)? A summary of strategies and implementations…
-
Imaginary Cinema: In the Mood for Love, but with Chiguiros
Premise: Remake In the Mood for Love, but with all the characters played by chiguiros…
-
Retrieval Augmented Generation
User5601 asks: How can I implement retrieval augmented generation (RAG) using the new LLM functionality in Mathematica 13.3? There are many desirable reasons for using RAG: It allows you to provide your own domain-specific information and to make citations to sources. Let’s implement a RAG that can answer questions using the Catechism of the Catholic Church…
-
Imaginary Syllabi: ChatGPT and Addressative Magic
Premise: An introduction to prompt engineering by exploring analogies to magic (defined as “ritual meant to effect changes in the world”)…
-
Modest Proposal: Post Affirmative Action College Admissions
Following the recent US Supreme Court ruling on Affirmative Action, universities are scrambling to (re)define their admissions policies. Justice Thomas’s criticism of Affirmative Action is that it creates uncertainty detrimental to high-performing minority graduates. A July 2023 NY Times article describes this as: “At Yale, [Thomas] was one of only 12 Black students in his law school class, admitted the year the law school introduced an affirmative action plan. His white classmates viewed him as a token, he felt — a belief in the corrosive effects of affirmative action that was only deepened by his failure to win the law firm job he had dreamed of. “I’d graduated from one of America’s top law schools, but racial preference had robbed my achievement of its true value,” [Thomas] later wrote.” A neoliberal approach would let the market sort things out: Universities admit candidates as they wish, but must archive and disclose a nutrition facts-style label to students and employers about how they made the decision. Here’s a modest proposal of how it could be implemented…
-
Imaginary Syllabi: Athanasius Kircher and Generative AI
Premise: Consider a general intelligence capable of producing text and images on demand about topics as diverse as chemistry, mathematics, music making, ancient and foreign languages, biblical hermeneutics, the invention of gadgets and scientific instruments, and descriptions of far-away lands.¹ The output is voluminous, verbose, and presents arguments and sources about things that are true…and also about topics that are mythical or completely fabricated. Are we talking about ChatGPT or the 17th century Jesuit polymath Athanasius Kircher? How might a study of Kircher inform our approach to generative AI? A syllabus…
-
Any resemblance to the author of this blog is purely coincidental. ↩
-
-
Great Ideas from Korea
Business and art ideas inspired by my recent trip to Korea…
-
Reviving the Hammond XB-2
I’ve got a Hammond XB-2 that I want to pull out of storage. But from what I read, failures are common, as the caps age out, ROM chips fail, etc. Keyboard Partners sells an HX3 retrofit kit for about $700 that is essentially a complete gut of the internal electronics, just keeping the case and input/outputs. The upgrade is a DIY project.
-
FM synthesis
- John Chowning’s Stria (1977): a famous FM synthesis composition (from the inventor of the technique) that also uses a golden mean tuning
- FM synthesis in SC
- Freaq FM Digital Synthesizer – 8-bit digital synthesizer for your desktop featuring dual 2-op FM voice architecture with multiple waveforms, LFO and modulation envelopes with a 16-step generative sequencer. Also sold by Thonk
- 6 channel DX-9 style FM synthesis on the Raspberry Pi Pico…just add I2S DAC…in C
- Web-browser based FM synthesis (via Hacker News)
-
Logarithmic music scale
Screw Pythagoras! Interesting modwiggler thread on non-Pythagorean tunings. One that leaps out is Robert Schneider’s logarithmic tuning, which has some interesting-sounding beat frequencies; he made an album using it. There are also interesting properties in “macrotonal” scales, such as the 7EDO used in some African and early Chinese musics and 9EDO scales used in Indonesian music.
-
Resisting the Subharmonicon
The Moog Subharmonicon … who can resist an analog synth based on the subharmonic generative music theory of Schillinger. See it in action, with tutorial. Polyrhythms and subharmonics, oh my. A trip to Perfect Circuit this weekend? Or save $600 and do it in software…
-
Euclidean Rhythm
Euclidean rhythms are a way to space n onset events across m positions (essentially, pulses or beats) as evenly as possible. For example, 4 onsets across 16 positions will result in 4 evenly spaced onsets. However, if the number of onsets is relatively prime with respect to the number of pulses, the resulting pattern is more interesting. This was discovered somewhat recently by Godfried T. Toussaint. A nice interactive javascript example is online with 4 samples. Play with it and you’ll hear some interesting ideas, especially if you choose relative primes. But how do you implement it…
The best popular explanation I have found online (with some visualizations) is a Medium post by Jeff Holtzkener which introduced me to its implementation in terms of the Bresenham line algorithm for pixelating lines (plus a floor operation). This can be done simply in Mathematica as:
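For readers without Mathematica, the floor-based idea can also be sketched in a few lines of Python. This is my own minimal version, not the post's listing: it puts the k-th onset at position floor(k·m/n), which should give the same patterns as other Euclidean rhythm constructions only up to rotation. The function name is hypothetical.

```python
def euclidean_rhythm(onsets, steps):
    """Place onsets at floor(k * steps / onsets) for k = 0..onsets-1.

    Returns a list of length `steps` with 1 for an onset and 0 for a rest.
    """
    if not 0 <= onsets <= steps:
        raise ValueError("need 0 <= onsets <= steps")
    pattern = [0] * steps
    for k in range(onsets):
        pattern[(k * steps) // onsets] = 1
    return pattern

four_on_sixteen = euclidean_rhythm(4, 16)   # evenly spaced, every 4th pulse
three_on_eight = euclidean_rhythm(3, 8)     # relatively prime: uneven gaps of 2, 3, 3
```

Because the positions are just a rounded-down straight line, this is the same trick as deciding which pixel row a rasterized line falls in at each column.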
-
Oscilloscope music
Michael Taylor notes: There is a whole community of people who try to make music (or at least a company called Oscilloscope Movie.com that sells software that converts Blender models to sounds) that can be displayed on an analog oscilloscope. Video overview. Essentially it’s just the stereo input: X and Y map to the left and right speaker signals, so you’re almost free to set pitches as you wish (within psychoacoustic limits if listening on headphones). Underneath is an efficient path problem.
-
Autodidact guide to electronic music making on digital computers
In the beginning, the LORD created modular synthesizers, and they were good. He sent his Prophets, Bob and Don and Serge, who told all the peoples “Make a joyful noise unto the LORD all the earth, but thou shalt not eat of the fruit of the tree of the digital computer.” But the people hardened their hearts and did not listen. So the LORD sent his prophet Dieter, a voice crying in the wilderness, who said “Repent, repent! For the Eurorack is at hand.” But the people said: “We have no King but Gordon.” And so it came to pass that digital computers became numerous upon the face of the earth. Then the angel of the LORD came down and created a multitude of programs to confound their language, so that they may not understand one another’s code. “Now are you happy?” said the angel of the LORD. And the people said: “Do do do, do do do do, do do do do do…”. A list of resources…
-
Ragas Against the Machine
Premise: Electronic rendition of ragas (classical Indian music). Because: (i) interesting/programmable polyrhythmic patterns (taal); (ii) ragas are something like a combination of mode and melody and framework for improvisation, which might be well suited for randomization/quantizing; (iii) drone sounds and glissandos of the sitar and tone-varied thumping of tablas are well suited to analog synthesizers or programming; (iv) not totally heretical, as Raga rock/pop exists; (v) amusing pun potential. Resources include…
-
Bytebeat (music)
Bytebeat is a genre of electronic music defined by short C programs of the form:
main() { for (;; t++) putchar(EXPRESSION); }
where EXPRESSION consists of standard arithmetic and bitwise operators acting on integers; no function calls are allowed and the only variable is the time counter t, which is never modified within the expression. While the expression is typically evaluated with 32 or more bits of integer accuracy, only the eight lowest bits of each result show up in the output, and this low byte is interpreted as unsigned 8-bit PCM sample values with the rate of 8000 samples per second. Some examples, history and notes, and Mathematica implementations…
-
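As a quick way to play with the bytebeat form described above without a C compiler, here is a minimal Python sketch (not the post's own code). The helper name and the sample expression are purely illustrative. One caveat: Python integers never overflow, so for large t an expression that relies on 32-bit wraparound may give different samples than the C version.

```python
def bytebeat(expression, n_samples):
    """Evaluate expression(t) for t = 0..n_samples-1 and keep only the low 8 bits.

    The result is a bytes object of unsigned 8-bit samples, meant to be
    played back at 8000 samples per second.
    """
    return bytes(expression(t) & 0xFF for t in range(n_samples))

# One second of audio from an illustrative expression in the single variable t.
samples = bytebeat(lambda t: t * ((t >> 12 | t >> 8) & 63 & t >> 4), 8000)

# The simplest possible case: a sawtooth that wraps around every 256 samples.
sawtooth = bytebeat(lambda t: t, 300)
```

The standard-library `wave` module should be able to write these bytes to a playable file (one channel, sample width 1, frame rate 8000), since 8-bit WAV audio is unsigned.
-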
Sine wave speech
Sine wave speech is a research technique in which a human voice is reduced to 3 or 4 time-varying sinusoids. It is unintelligible when heard without prompting, but with prompting it is clearly distinguishable. A nice explanation (with examples) comes from Matt Davis (Cambridge), and Robert Remez (Barnard) has a very artistic example setting a Robert Frost poem to sine wave, with experiments on note duration and pitch quantization. Art project: You could take Johnny Cash reading the Bible and turn this into a four voice analog oscillator…
-
Birth of the Cool
“The Italian Renaissance author and diplomat Baldassare Castiglione was an early champion of cool, or what he thought of as studied carelessness, calling it ‘sprezzatura’. And his Book of the Courtier (1528) delineates in exquisite detail how to ‘practise in all things a certain nonchalance which conceals all artistry and makes whatever one says or does seem uncontrived and effortless’. As Castiglione understood, we value cool because we assume that those who act effortlessly must have plenty of resources in reserve for when bigger challenges arise. They exhibit a superfluity of fitness. Creating this appearance of ease is so valuable, Castiglione thought, that one should devote considerable effort to making it happen.” (source) Some relate sprezzatura to the Japanese concept of shibui (deliberate Miles Davis ref).
-
BachScratcher and ModularMonk
Premise: a Bach chorale sequencer with CV output or alternatively a Gregorian chant sequencer…
-
DIY drum machine ideas
My analog synthesizer lacks a drum machine. I suppose you could just buy a Moog DFAM (or a clone), but… A few thoughts towards a project, assuming you are not an analog purist and have a Raspberry Pi Pico floating around…
-
1-bit music
What is 1-bit music?: “produced by repeatedly switching the current that goes to the built-in speaker on and off, or in other words, they are produced by toggling a signal between two states”. Depending on how you look at it, this is the most digital (it’s only ones and zeros with no pretense of waveforms) or least digital (it’s only 1-bit of digital precision) you can get. The challenge (as stated in Victor Adan’s PhD thesis) is that the 1-bit music composer is “directly confronted with the problem of creating pitches, timbres and polyphony (parallel perceptual streams), all from a single train of pulses. With only two symbols (0 and 1) there is no ‘vertical’ information, no subtlety in the degree of push and pull. Thus, in the confines of a 1-bit music, one is forced to use time to convey all information; time is the only carrier of information. i.e., everything must be created from rhythm.” Examples and technical guidance…
-
Minimal diffusion model from scratch for generative machine learning
Inspired by François Fleuret’s post on a toy diffusion model, I decided to also explore the Ho et al. 2020 “Denoising Diffusion Probabilistic Models” paper and try to write this up as a tutorial, based on Prof. Fleuret’s code. (Recently there was a post on the Mathematica Stack Exchange that tackled an MNIST digit generation problem using a complicated UNet model–for pedagogical purposes I will stick with simple 1D and 2D probability distributions to illustrate the core ideas without complicating the code.) Lilian Weng has a nice tutorial explanation of the mathematics of diffusion models that we can use as a resource, and Hugging Face has an annotated implementation tutorial in PyTorch. Diffusion models (like other generative models) convert noise into a data sample. The setup consists of two processes: (i) a fixed forward diffusion process q that gradually adds Gaussian noise to an image until it becomes pure noise; (ii) a learned reverse diffusion process p_\theta where a neural network is trained to gradually denoise an image starting from pure noise. This process is conducted over t steps. Let’s implement it!…
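The fixed forward process q is the easy half, and a minimal Python sketch for a single 1D data point shows it in action (this is not the post's code). It uses the standard closed form for jumping straight to step t, x_t = sqrt(abar_t)·x_0 + sqrt(1 − abar_t)·noise, where abar_t is the running product of (1 − beta_s). The linear beta schedule from 1e-4 to 0.02 over 1000 steps is an assumption on my part, as are the function names.

```python
import math
import random

def alpha_bar(betas):
    """Running products of (1 - beta_s): the share of signal variance left after each step."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out

def forward_sample(x0, t, abar, rng):
    """Draw x_t from q(x_t | x_0) = Normal(sqrt(abar_t) * x0, 1 - abar_t) in one shot."""
    noise = rng.gauss(0.0, 1.0)
    return math.sqrt(abar[t]) * x0 + math.sqrt(1.0 - abar[t]) * noise

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * s / (T - 1) for s in range(T)]  # linear schedule (assumed)
abar = alpha_bar(betas)

rng = random.Random(0)
x_late = [forward_sample(2.0, T - 1, abar, rng) for _ in range(5000)]
mean_late = sum(x_late) / len(x_late)
var_late = sum((x - mean_late) ** 2 for x in x_late) / len(x_late)
```

By the last step almost none of the starting value 2.0 survives, so the samples look like draws from a standard normal. That is the "pure noise" the learned reverse process has to start from.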
-
Flask Learning Resources
I may have to bite the bullet and develop a CRUD application. Flask doesn’t seem so bad. Some reasonable resources like:
- Flask official documentation
- Miguel Grinberg’s Megatutorial website also available as a printed book. He wrote a 2018 O’Reilly book, but his self-published Megatutorial book is updated more frequently, so it seems like it will be better
- (Microdot seems like a cool little Flask-influenced framework, esp. on a smaller microcontroller like the Pico or ESP32)
-
A Few of My Favorite Transformer Tutorials
So, you want to learn about transformer models? Here are some of my favorite learning resources to get started:
- Peter Bloem Transformers from Scratch: I like how he builds up attention from simple vector dot products (without invoking the query/key/value notation until late in the game after he has given you a firm mathematical understanding of what is going on and why it makes sense). Code examples in pytorch, but easily understandable without it
- Brandon Rohrer Transformers from Scratch: I like how he motivates the idea of Markov models for text generation, and then motivates the idea of attention in terms of having variable lookback (without too many parameters). Also a lucid explanation of positional encoding tricks
- Jay Alammar The Illustrated Transformer: This seems to be perennially popular on hackernews…I guess if you are into pictures and youtube videos it would be your thing. Included here for completeness, but I prefer the two above
- My favorite My Favorite Things is Coltrane, naturally, but that’s a story for another post.
- Addendum: 15 Mar 2023: FastGPT–GPT-2 in 300 lines of Fortran. ‘Nuff said.
- Addendum: 15 Jan 2024: Random Transformer Nicely worked simple example, multiply the matrices yourself for concreteness.
- Addendum: 24 Feb 2024: Implement GPT-2 in an Excel Spreadsheet. This is just the thing you need to do to build your skills for the Excel World Championships
- 14 Mar 2024: Built a simple GPT-2 type model in Mathematica on the Shakespeare corpus, with step-by-step instructions and custom tokenization. Can be trained on a laptop in about an hour.
- 20 April 2024 3blue1brown has a nice video series on transformers as part of a series on deep learning
- 16 Jun 2024: Recent review: Application of Transformers in Cheminformatics, JCIM 06/2024
-
Book Review: The (Mis)behavior of Markets
Some notes on Benoit Mandelbrot & Richard Hudson, The (Mis)behavior of Markets: A Fractal View of Financial Turbulence (2004). Premise: The (geometric) Brownian motion model of asset prices, and the house built upon its Gaussian foundation (i.e., Bachelier, Markowitz efficient portfolios, Sharpe’s Capital Asset Pricing Model, Black-Scholes) fails to capture the extreme variations present in real data. Power law distributions, fractional Brownian motion (which has positive or negative time-correlations) and multi-fractal regime changes better capture observed variations…
-
Making an Electric Guitar
Joseph Tardio asks: I want to make a custom electric guitar… I’m always in for a Fusion360 and CNC project…
-
Imaginary Syllabi: War and Logistics
Inspired by a discussion with Zachary Jones ‘25: An interdisciplinary course on the philosophy and history of logistics and war (as distinct from economic optimization and war proposed earlier). Students will focus on the philosophy/criticism of Paul Virilio, historical examples of logistics, and elementary aspects of mathematical modeling of logistics, with a special focus on linear programming and Monte Carlo simulations (performing relevant calculations in Mathematica). Reading lists to include…
-
Imaginary Cinema: Titanic, but with Chiguiros
Premise: Remake Titanic, but with all of the characters played by chiguiros….
-
Imaginary Books: Schopenhauer für Kinder
Premise: A children’s picture book about the philosophy of Arthur Schopenhauer, loosely organized on Die Welt als Wille und Vorstellung. The trick is to have a proper mix of depressing text/worldview and cheerful pictures. I think this will play better in the German market. Each text line has a full-page figure (italicized text description). Some quotations from Schopy to be typeset in Fraktur as part of graphic. Outline and draft of text and images… Figure courtesy of Crayonfou
-
Imaginary Books: Famous Hungarians from A to Z
Premise: A children’s picture book with short biographies of famous Hungarians, with an emphasis on mathematics, science, engineering (but with occasional pop-culture figures to liven things up). Inspired by discussions with Chris Görog about the Erziehung of his children. Entries to include…
-
Predicting Rare Earth Element Separation Chemistry
For a new project on f-element (rare earths and actinide) separation, I am trying to wrap my head around the literature. There is a very nice recent result from De-en Jiang and co. where they created a dataset of 1600 lanthanide rare-earth separation experiments, featurizing the extractant molecules with fingerprints and RDKit style features, featurizing the solvents with some properties, and featurizing the metals themselves with a few periodic properties. (Liu et al, “Advancing Rare Earth Separations” JACS Au 2022 https://doi.org/10.1021/jacsau.2c00122 ). The supporting information contains the complete dataset (yay!) but not the neural network that they trained (although it describes it in the text). Let’s reproduce their result by training our own neural network…
-
Imaginary Cinema: The Sting, but with Chiguiros
Premise: Remake The Sting, but with all of the characters played by chiguiros….
-
Imaginary Cinema: Doctor Zhivago, but with Chiguiros
Premise: Remake Doctor Zhivago, but with all of the characters played by chiguiros….
-
Imaginary Cinema: Ben Hur 1980
Premise: Remake Ben-Hur but set in 1980s New York City…
-
Probabilities of Mineral Formation
I saw a fascinating talk by Daniel Hummer on “Data Mining the Past: Using Large Mineral Datasets to Trace Earth’s Geochemical History” A few gleanings below…
-
MQTT and ROS2 integration
My previous failure to successfully compile micro-ros-agent illustrates some of the challenges of setting up ROS2 in constrained environments. First, ROS2 is really complicated to install and restricts us to running on Ubuntu, which might make it hard to do fancy hardware stuff on the Raspberry Pi host that would be easier in Raspbian. Second, if we want to use micro-ros on our microcontroller (e.g., Pico), we’re back to programming in C/C++, which increases the barrier for students. Third, you need more than 1GB of RAM on the Pi 3B+ to compile the micro-ros-agent that serves as a bridge between these functionalities. How can we keep the advantages of the ROS Publish/Subscribe model, but simplify our microcontroller development and interfacing? In this post, we’ll look at using MQTT as a lightweight publish/subscribe service and how it can be interfaced with ROS2…
-
Properties of Random Peptides
A seminar talk by David Eliezer got me thinking about the properties of random peptides. Do they fold? If so, into what? Are they intrinsically disordered? As it turns out, there is some literature on this…
-
Embedding PDFs in Jekyll Pages
Jonathan Kinlay’s blog had some slick embedded PDFs of Mathematica notebooks. Looking at the source code (he uses WordPress), they are just PDFs with a paginator. Apparently you can embed PDFs in Jekyll pages too, so I might try that sometime. Note to self.
-
Reading AS726X and AS7265X Spectral Sensors in Micropython
The AS726X and AS7265X spectral sensors give you the ability to read visible, UV, and IR over an I2C bus, and are relatively cheap ($27-$60 USD). The former focuses on just visible or IR, and the latter is a triad sensor that spans the entire spectrum. In this post, we’ll walk you through setting this up and reading the sensor values…
-
Three mathematical models for machine learning and high-throughput experimentation
Since the seminal 2018 Mexico City report, there has been increased interest in developing autonomous research systems–aka, self-driving laboratories–that combine machine learning (ML) with robotic experimentation. (For recent perspectives on this, see Stach et al. 2021 and Yano et al. 2022.) At least historically, automated experimentation implied high-throughput experimentation (HTE) with the goal of achieving large data scale. An alternative is continuous systems, where experiments are performed one at a time–these seem like they are optimal for incorporating learning into the process. How does HTE enable ML? Or vice versa? The goal is to get some intuitions that can be used to inform a provocative perspective article on ML for exceptional materials. It is also inspired by some ideas from a blog post on High-Variance Management (and corresponding Julia notebook), with examples from show business. In this post, we explore a few simple mathematical models for this interplay and limits of ML and HTE…
-
Imaginary Syllabi: War and Peace and Optimization
A course for non-math majors on optimization and its role in hot- and cold-wars. Students will learn elementary aspects of mathematical optimization, with a special focus on linear programming, as viewed through the historical developments of WWII and the Cold War, and do some relevant calculations in Mathematica. Reading lists to include…
-
Imaginary Syllabi: Games of Skill and Chance
An interdisciplinary capstone course on luck, skill, and the games of everyday life, combining mathematics, literature, and philosophy. Practical activities to include playing blackjack, chess, and writing Monte Carlo simulations. Reading lists to include…
-
Building a MIDI to CV converter with the RP2040 (part 2)
In a previous post, we defined some of the ideas and resources for the system. Along the way, we’ll learn a bit about I2C programming on the Pico and MCP4728 DAC. In this post, we’ll actually build the MIDI to CV gadget…
-
Building the mki x es.EDU analog synthesizer
Seeing a Suzanne Ciani performance at Ambient.Church inspired me to get into analog synthesizers. To learn more about the electronics aspects, I built the mki x es.EDU analog synthesizer kit. Some notes, recommendations, and reflections on the process…
-
ROS Setup for Raspberry Pi and Pico
Last time, we collected some resources about the process. In this post we’ll take you step-by-step on installing ROS2 on a Raspberry Pi 3B+, configuring the ROS and MicroROS development environment, and getting to ‘hello world’… STATUS: Successfully installed ROS2, but configuring the micro-ros-agent on Raspberry Pi 3B+ has some problems.
-
Autodidact guide to physical chemistry
Ruslan Khafizov asks: I’ve come across your book and I’m thinking of attempting to self-study physical chemistry “as if computers exist”. What would you recommend for a first study of physical chemistry? Here is my recommendation…tl;dr McQuarrie and Simon’s Physical Chemistry: A Molecular Approach is a solid choice…
-
Online Chaos Learning Resources
Learning resources about chaos, especially chaotic electrical circuits and synthesizers…
-
MicroROS on the Raspberry Pi Pico
Goal: Attach small actuators and sensors to a Raspberry Pi Pico (RP2040) and have it present itself to a ROS2 network running on some other device. Build this into a larger project. This Post: Some collected notes on resources. Status: See how-to-guide post
-
Reading Thermo Fisher Xcalibur RAW Files
Sarah Maurer asks: I’m using a GC/MS to analyze some samples, but the Thermo Fisher Xcalibur software on the instrument only allows us to export the data one CSV at a time, which is an error-prone process. Is there any way to automate this? Here are some notes on how to do this in 2023…
-
Building a MIDI to CV converter with the RP2040 (part 1)
Goal: I have a few MIDI keyboards mouldering in the house and a newly built (by me) Erica/MKI modular synthesizer, but no way to get them to talk to one another. You could buy a gadget to do it, but where’s the fun in that? So let’s build a device which will generate output control voltages. The premise is to read in MIDI signals, use a RP2040 microcontroller to interpret the commands, and then use a D2A converter to output the relevant control voltages. Discussion of precedent, parts, and design goals…
-
Autodidact guide to advanced manufacturing
Interested in teaching yourself about advanced manufacturing, but don’t want to commit to a full-time apprenticeship? Here is my autodidact’s guide to learning some basic skills at home in your spare time…
-
Publication quality figures in Mathematica
Some things that I have found useful in generating figures for scientific publications (single column width, inset labelling, etc.)…
-
Object-oriented programming in Mathematica
At its heart, Mathematica is a term replacement language, and includes primitives for both functional programming and procedural programming. But what about object oriented programming (OOP)? An interesting blog post by Hirokazu Kobayashi showed me how to get Mathematica down with OOP…
-
Do carbon-carbon quadruple bonds exist?
Ethan Saunders ‘26 asks: “We learned about carbon-carbon single, double, and triple bonds…but do carbon-carbon quadruple bonds exist?” The short answer is: “maybe”….
-
Linear discriminant analysis
Bob LeSuer (aka BobTheChemist) asked: “Because it’s Friday and I don’t want to grade, I’m doing multivariate analysis with #Mathematica (yes, grading ranks that low). PCA is working nicely. @JoshuaSchrier do you have strategies for doing LDA with cross-validation?” This set me off on a Saturday morning project to learn Linear Discriminant Analysis (not the local density approximation, although you can read about implementing that in my book). The best article I found was Sebastian Raschka’s “Linear Discriminant Analysis–Bit by Bit” which describes the basic idea, and then explains the LDA process in five steps, using the Fisher Iris dataset and an implementation in Python. (Gabriel Peyré has a nice animation, though.) LDA is similar to Principal Component Analysis (PCA), in that the goal is to project a dataset onto a lower-dimensional space, but with an additional goal of finding axes that maximize the separation between different classes of outcomes. After reducing your data in this way, you can then use your favorite machine classifier method. Let’s see how to implement LDA in Mathematica…
-
Parrondo and Portfolio Diversification
Michael Stutzer, a professor of finance at University of Colorado, Boulder has written about how portfolio diversification can be thought of as a Parrondo-like problem. Essentially, one can think about rebalancing a portfolio to include some assets that have negative expected real returns (for example, treasury bonds) as being analogous to the Parrondo paradox, and he constructs a simple binomial model to explain this in a pedagogically simple way. The results will not be surprising to any Boglehead…
-
Parrondo's Paradox
Parrondo’s Paradox (discovered in 1996) is a curious scenario in which by alternating between two losing (negative expected value) games a player can win (achieve positive expected value). Paul Nahin in Digital Dice describes the following variant: Consider games in which flipping heads results in winning $1, and tails results in losing $1. Game A has you flip a biased coin that returns heads with probability 1/2-epsilon. Clearly this is a losing game (negative expected value) for epsilon >0. Game B has you flip a coin that depends on your current wealth, M. If M is divisible by 3 then you flip a coin that shows heads with probability 1/10-epsilon; otherwise you flip a coin that shows heads with probability 3/4-epsilon. This too is a losing game (negative expected value). The “paradox” is that by alternating between these games, you can achieve a positive expected value over time…
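Since game B only cares about whether M is divisible by 3, the expected winnings can be computed exactly by tracking the probability of each value of M mod 3, with no random simulation needed. The Python sketch below is my own illustration, not the post's code. It uses epsilon = 0.005 and models "alternating" as picking A or B with probability 1/2 at every step; both are assumptions, and other alternation patterns are possible.

```python
def expected_gain(game, n_steps, eps=0.005):
    """Exact expected winnings after n_steps, starting from wealth M = 0.

    game is 'A', 'B', or 'mix' (each step plays A or B with probability 1/2;
    this mixing rule is an assumption made for the sketch).
    """
    def p_heads(state):
        p_a = 0.5 - eps
        p_b = (0.1 - eps) if state == 0 else (0.75 - eps)
        return {"A": p_a, "B": p_b, "mix": 0.5 * (p_a + p_b)}[game]

    dist = [1.0, 0.0, 0.0]   # probability that M mod 3 equals 0, 1, 2
    total = 0.0
    for _ in range(n_steps):
        new = [0.0, 0.0, 0.0]
        for state, weight in enumerate(dist):
            p = p_heads(state)
            total += weight * (2 * p - 1)            # win $1 with prob p, lose $1 otherwise
            new[(state + 1) % 3] += weight * p       # heads: M goes up by one
            new[(state - 1) % 3] += weight * (1 - p) # tails: M goes down by one
        dist = new
    return total

gain_a = expected_gain("A", 1000)
gain_b = expected_gain("B", 1000)
gain_mix = expected_gain("mix", 1000)
```

The mechanism is visible in `dist`: playing A some of the time changes how often M lands on a multiple of 3, so the bad coin of game B gets flipped less often than it would if B were played alone.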
-
Knuth's Toilet Paper Problem
One of my innocent pleasures is reading probability puzzle books, a habit I started with Paul Nahin’s Digital Dice. Chapter 8 in Nahin’s book presents Donald Knuth’s toilet paper problem (American Mathematical Monthly, 1984). Knuth’s paper begins as follows: “The toilet paper dispensers in a certain building are designed to hold two rolls of tissues, and a person can use either roll. There are two kinds of people who use the rest rooms in the building: big-choosers and little-choosers. A big-chooser always takes a piece of toilet paper from the roll that is currently larger, and a little-chooser always does the opposite. However, when the two rolls are the same size, or only one roll is nonempty, everybody chooses the nearest nonempty roll. When both rolls are empty, everybody has a problem. Let us assume that people enter the toilet stalls independently at random, with probability p that they are big choosers … If the janitor supplies a particular stall with two fresh rolls of toilet paper, both of length n, let M_n(p) be the number of portions left on one roll when the other roll first empties.” This problem can be solved by introducing a recurrence relationship, with some fun use of memoization for efficiency…
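Here is one way the recurrence might look in Python, as a sketch only: it is my reading of the quoted rules, not necessarily Knuth's or Nahin's notation, and it returns the expected number of portions left. The state is (smaller roll, larger roll), and `functools.lru_cache` does the memoization so each state is evaluated once.

```python
from functools import lru_cache

def leftover(n, p):
    """Expected portions left on the other roll when one of two n-portion rolls first empties."""
    q = 1.0 - p   # probability of a little-chooser

    @lru_cache(maxsize=None)
    def m(small, big):   # invariant: small <= big
        if small == 0:
            return float(big)            # one roll just emptied; `big` portions remain
        if small == big:
            return m(small - 1, big)     # equal rolls: everybody takes from the nearest one
        # a big-chooser (prob p) shrinks the larger roll, a little-chooser the smaller
        return p * m(small, big - 1) + q * m(small - 1, big)

    return m(n, n)

half_and_half = leftover(2, 0.5)
```

Two sanity checks fall out of the rules: if everyone is a big-chooser the rolls shrink together and exactly one portion is left, and if everyone is a little-chooser one roll is used up first and all n portions of the other remain.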
-
Conformal prediction example
Conformal prediction computes confidence intervals associated to any black box prediction method, without assuming any prior model on the sample in the dataset. More formally, conformal prediction bounds the miscoverage, $P(Y \notin \text{set}) \le \alpha$, by computing the interval as a quantile of runs of the method over the points in the dataset. I was inspired to think through this by Gabriel Peyré’s post, and the notes below closely follow his notebook…
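To make the quantile step concrete, here is a minimal Python sketch of one common variant, split conformal prediction with absolute residuals as the score. It is my own illustration and may differ from the notebook the post follows. The function names are hypothetical, and `predict` stands for any already-fitted black box model.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))   # 1-based order statistic
    if rank > n:
        return math.inf                        # too few calibration points for this alpha
    return sorted(scores)[rank - 1]

def prediction_interval(predict, x_new, calib_x, calib_y, alpha=0.1):
    """Interval predict(x_new) +/- q, with q taken from absolute calibration residuals."""
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q

# Hypothetical model and held-out calibration data with residuals 1, 2, ..., 20.
model = lambda x: 2 * x
calib_x = list(range(20))
calib_y = [2 * x + (x + 1) for x in calib_x]
interval = prediction_interval(model, 10, calib_x, calib_y, alpha=0.1)
```

The (n + 1) in the rank is what buys the coverage guarantee when the calibration points and the new point are exchangeable; the model itself never needs to be correct, only held fixed.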
-
Maximum entropy coin toss, revisited
Suppose that you roll a die many times and learn that the average value is 5–what is the most likely (maximum entropy) distribution of the probabilities? Last week we solved this by brute force…today we’ll solve it by considering the Gibbs Distribution (aka Boltzmann’s Law)…
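In the Gibbs form the whole problem collapses to one unknown: the probabilities are p_i ∝ exp(λ·i), and λ is whatever value makes the mean come out to 5. A minimal Python sketch (not the post's code) that finds λ by bisection is below. The sign convention for λ and the bracketing interval are my choices.

```python
import math

def gibbs_die(target_mean, faces=(1, 2, 3, 4, 5, 6), tol=1e-12):
    """Maximum-entropy probabilities p_i proportional to exp(lam * i) with the given mean."""
    def mean_for(lam):
        weights = [math.exp(lam * f) for f in faces]
        return sum(f * w for f, w in zip(faces, weights)) / sum(weights)

    lo, hi = -50.0, 50.0           # mean_for increases with lam, so bisection works
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    weights = [math.exp(lam * f) for f in faces]
    z = sum(weights)               # the partition function
    return lam, [w / z for w in weights]

lam, probs = gibbs_die(5.0)
mean = sum(face * p for face, p in zip(range(1, 7), probs))
```

A mean of 5 is above the fair-die mean of 3.5, so λ comes out positive and the probabilities rise geometrically from face 1 to face 6.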
-
Jai Alai Monte Carlo
I recently read Steven Skiena’s Calculated Bets: Computers, Gambling, and Modeling to Win which describes Jai Alai and his efforts to develop an effective betting system. I enjoyed reading his Algorithm Design Manual, and this one doesn’t disappoint–Calculated Bets is a thoroughly enjoyable book, with a wry sense of humor. Prof. Skiena has a summary slide-deck on his website. Alas, betting on Jai Alai probably isn’t viable any more–according to Wikipedia, as of July 2022 there is only one active professional fronton in the USA. But there’s an interesting probability problem that can be solved by simple Monte Carlo simulations…
-
What's the value of an option with a long time to expiry?
Chris Görög asks: “I knew there was a bug in the options pricing code I downloaded. I can’t quite figure out exactly where they f–ed it up but they swapped the spot and strike prices. At infinite time or volatility the call option should be the same as the current stock price NOT the strike price. It just didn’t make sense to get an option value greater than the current price since you should just buy the stock in that case. Feel good that I at least understand the limits of options. Also interesting that really options strategies should focus on buying options for stocks that will have an expected increase in volatility and marketing making for stocks that have high volatility. Real volatility tends to be lower than implied volatility to allow for risk to the market maker BUT the point is to identify the outliers on each side and trade as such.” All of this makes intuitive sense, but let’s crunch the numbers together…
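A quick numerical check of the limits in the quote, as a Python sketch (not the post's code). It assumes the plain Black-Scholes formula for a European call on a stock that pays no dividends; the spot, strike, rate, and volatility values are arbitrary.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(spot, strike, rate, sigma, t):
    """Black-Scholes price of a European call on a non-dividend-paying stock."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)

one_year = bs_call(100.0, 120.0, 0.02, 0.2, 1.0)       # ordinary case
very_long = bs_call(100.0, 120.0, 0.02, 0.2, 1.0e4)    # time to expiry -> infinity
very_volatile = bs_call(100.0, 120.0, 0.0, 50.0, 1.0)  # volatility -> infinity
```

In both extreme cases d1 becomes large and positive while the strike term vanishes, so the price approaches the spot price of 100 and not the strike of 120. Swapping `spot` and `strike` in the call would reproduce the bug described in the quote.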
-
Maximum entropy coin toss problem
Inspired by a tweet from John Carlos Baez: Suppose that you roll a die many times and learn that the average value is 5. What is the most likely (i.e., maximum entropy) distribution of the probabilities? This can be expressed as a simple constrained optimization…
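Written out, and assuming a six-sided die with face probabilities $p_1, \dots, p_6$ (notation chosen here for illustration), the constrained optimization reads:

$$
\max_{p_1,\dots,p_6} \; -\sum_{i=1}^{6} p_i \ln p_i
\quad \text{subject to} \quad
\sum_{i=1}^{6} p_i = 1, \qquad \sum_{i=1}^{6} i\,p_i = 5, \qquad p_i \ge 0 .
$$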
-
Generating 3d-designs with OpenCASCADE Link
During the COVID-19 pandemic, I built a Prusa MK3S+ FDM 3d-printer from a kit. After printing the requisite pre-made objects, I started designing my own objects. Although I’ve more recently moved to doing a lot of work in Autodesk Fusion360 (I cranked through Kevin Kennedy’s youtube tutorial over Christmas break), it is always fun to do algorithmically-designed objects–especially since you are printing each object on-demand. My project here was to design a generative-designed mail caddy, where the size and placement of holes is completely random. By using circles, it can be printed without supports, while reducing the amount of filament. Also it looks cool. To do this, I used the OpenCASCADE Link in Mathematica…
-
Kelly Optimal Betting With Discrete Data
I was intrigued by an article on John Parkhill’s blog about the discrete version of the Kelly Criterion. The core idea is that one uses a series of samples of returns and then determines a Kelly-optimal strategy for those samples; the advantage of working with discrete samples (rather than “probabilities” deduced from such data) is that it captures realistic correlations. The Kelly optimal bet vector, $c_k$, (i.e., the fraction of your wealth to wager on each asset k) is found by minimizing the following loss function…
-
Bezier the Bookie
Two gamblers go to a bookie to make a bet: Each pays $1 to play a game in which they flip two coins. If both are tails, then player A wins, and receives $1.9 (his net return is +$0.9). If both are heads, then player B wins the $1.9. In the case of one head and one tail, there’s no clear winner, so both are out $1. How do the payoffs behave as the probability of heads goes from p(Heads) = 0 to 1? This straightforward probability problem has a nice (and perhaps unexpected) interpretation in terms of Bézier curves, which are more familiar in the context of computer graphics…
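One way to see the connection, offered as a sketch rather than as the post's own derivation: the probabilities of 0, 1, or 2 heads are the quadratic Bernstein polynomials in p, so each party's expected net return is a quadratic Bézier curve whose control values are the net payoffs for each outcome. The Python below (the post itself uses Mathematica) uses hypothetical names and assumes the two flips are independent with the same p.

```python
from math import comb


def bernstein(n, k, p):
    """Bernstein basis polynomial: probability of k heads in n flips."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_payoff(control_values, p):
    """Bezier curve value: control values weighted by the Bernstein basis."""
    n = len(control_values) - 1
    return sum(c * bernstein(n, k, p) for k, c in enumerate(control_values))


# Net returns indexed by number of heads (0 = both tails, 2 = both heads).
PLAYER_A = (0.9, -1.0, -1.0)
PLAYER_B = (-1.0, -1.0, 0.9)
BOOKIE = (0.1, 2.0, 0.1)  # collects $2, pays out $1.9 only when someone wins

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, [expected_payoff(c, p) for c in (PLAYER_A, PLAYER_B, BOOKIE)])
```

The three control vectors sum to zero outcome by outcome, so the three expected returns also sum to zero at every p.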
-
Von Neumann Poker
In Von Neumann poker, each player draws a hand (X for one player, Y for the other) uniformly at random from [0,1]. Each player pays an ante of $1. Whoever has the higher hand wins. Each player sees only his own hand. What’s the optimal betting (and bluffing) strategy?
-
Hello World, Or How I Started The Blog
Some useful tools and tricks learned while trying to make this blog:
- Creating this blog with Github pages
- Using M2MD to facilitate conversion of Mathematica notebooks into Markdown
- (Although tempting to use Wolfdown, there are a bunch of hardcoded options which were harder to parse)
- How to add tags? Ordinarily Jekyll requires some add ons, but this post describes how to do it with default github pages. I didn’t have much luck, but one day I’ll figure it out. Here’s a step by step instruction guide to adding this information that I will try one day
- Funny things with double squiggly-braces. You may get an
Error: Liquid syntax error (line 35): Variable \{\{-1, 0.9\} was not properly terminated
type error when exporting Mathematica lists. This is because Jekyll uses double squigglies for commands…even when these are inside verbatim code cells. Just add a space and you’ll be fine. Alternatively you can wrap these inside … to avoid this. It might be a good idea to do this for code cells in general. I’ll consider modifying the ToJekyll function to do this
- How to link between different posts in Jekyll. This article is also quite helpful in describing the differences between different linking styles and modifying the configuration.
- When including multi-line markdown code lines, it is useful to indicate the programming language after the triple-backtick that opens a code block. I’ve also found that the Jekyll processor can misinterpret # characters inside code blocks unless there is a newline before and after the start/end of the triple backtick. Easily resolved, but something to keep in mind.
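The "just add a space" fix for double squiggly-braces can be automated as a post-processing step on the exported Markdown. The Python sketch below is only an illustration of that idea; the function name `space_double_braces` is hypothetical and it is not part of M2MD or the ToJekyll function.

```python
import re


def space_double_braces(text):
    """Insert a space between adjacent braces so Liquid sees no double braces."""
    text = re.sub(r"\{(?=\{)", "{ ", text)  # opening brace followed by another
    text = re.sub(r"\}(?=\})", "} ", text)  # closing brace followed by another
    return text


print(space_double_braces("{{-1, 0.9}, {1, 2}}"))
```

The lookahead handles runs of three or more braces in a single pass, which a plain string replacement of pairs would miss.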
-
Demonstration of Jekyll Export from Mathematica
Our goal here is to demonstrate how to Export Mathematica notebooks to Jekyll markdown using MDExport. This requires a bit of custom code, demonstrated below, which we’ll use to define a function that automates the process.