This was something I posted in early March on Mamba, only because I'd picked up on Rudy at BRN playing with it.
Saw a couple of our employees "liking" Mamba and wondered what it was.
A couple of snips below; curious whether it was a general "like" of a great development, or whether it's something they are now working on too.
It's possible the PeaBrane/mamba-tiny repo on GitHub that was liked is something Rudy created from the full Mamba by the looks of it, but maybe I'm reading it wrong?
Interesting nonetheless, I think, given the "AI in 2024" video links Mamba, Transformers & neuromorphic all together.
AI in 2024 – On an Exponential Rise: Data, Mamba, and More | YouTube inside
Discover the transformative potential of AI in 2024. Dive into key drivers like data quality and the groundbreaking Mamba architecture.
meta-quantum.today
Mamba: This refers to the emergence of new, groundbreaking AI architectures like transformers and neuromorphic computing. These architectures mimic the human brain’s structure and function, allowing for significantly faster processing and deeper learning capabilities. Mamba-based models will revolutionize areas like natural language processing, image recognition, and robotics.
- Mamba Architecture: Revolutionizing Sequence Modelling
- Mamba, a groundbreaking architecture, represents a leap forward from Transformer models.
- It addresses the computational challenges of large-scale sequence processing.
- Albert Gu’s work on structured state spaces inspired Mamba’s development.
- The architecture’s potential lies in its ability to handle extensive sequences, as demonstrated in DNA classification tasks.
[Submitted on 1 Dec 2023]
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu, Tri Dao
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.
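For anyone who, like me, wondered what "letting the SSM parameters be functions of the input" actually means in practice, here's a very rough sketch of the idea in plain PyTorch. To be clear, this is just my own simplified illustration of the selection mechanism the abstract describes, not BrainChip's code or the paper's implementation, and the names (selective_scan, w_delta, w_B, w_C) are made up for the example:

```python
import torch
import torch.nn.functional as F

# Rough sketch of the "selective" idea: the SSM parameters (step size, B, C)
# are computed from the current token, then the hidden state is updated with
# a plain sequential recurrence. Shapes and names are my own simplification.
def selective_scan(x, A, w_delta, w_B, w_C):
    batch, length, dim = x.shape           # x: (batch, length, dim) input sequence
    state_dim = A.shape[-1]                # A: (dim, state_dim) transition params (negative for decay)
    h = torch.zeros(batch, dim, state_dim)
    ys = []
    for t in range(length):
        xt = x[:, t]                                   # (batch, dim), current token
        delta = F.softplus(xt @ w_delta)               # input-dependent step size
        B = xt @ w_B                                   # input-dependent input matrix
        C = xt @ w_C                                   # input-dependent output matrix
        A_bar = torch.exp(delta.unsqueeze(-1) * A)     # discretised transition
        h = A_bar * h + (delta * xt).unsqueeze(-1) * B.unsqueeze(1)
        ys.append((h * C.unsqueeze(1)).sum(-1))        # read the state back out
    return torch.stack(ys, dim=1)                      # (batch, length, dim)
```

The Python loop is only there for readability; the actual paper replaces it with the hardware-aware parallel scan mentioned in the abstract.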
Keith Johnson - BrainChip | LinkedIn
I have a passion for machine learning and artificial intelligence. I'm interested in… · Experience: BrainChip · Education: The University of Western Australia · Location: Greater Perth Area · 195 connections on LinkedIn.
au.linkedin.com
Rudy Pei - BrainChip | LinkedIn
I have a passion for ML research and engineering. In particular, I love efficient models… · Experience: BrainChip · Education: University of California San Diego · Location: San Diego · 500+ connections on LinkedIn.
www.linkedin.com
Rudy Pei
Physicist | ML researcher | quantum & neuromorphic computing | behavioral economics | composer
3w Edited
Mamba is a new state-space model out-performing transformers everywhere it has been tried. Originally, it was trained with associative scan, which pytorch does not support natively, hence the need for custom CUDA kernels. However, there is a simple math trick to express the associative scans used in mamba as a ratio of two cumulative sums. This makes an efficient native pytorch implementation of mamba possible. How? Check out my simple repo with a one-file implementation of this idea, forked from the mamba-minimal repo https://lnkd.in/g5QR7yHC #mamba #llm
GitHub - PeaBrane/mamba-tiny: Simple, minimal implementation of the Mamba SSM in one file of PyTorch. More efficient than the minimalist version but less efficient than the original mamba implementation.
github.com
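If I'm reading Rudy's post right, the "ratio of two cumulative sums" trick is about the linear recurrence at the heart of the scan: h_t = a_t·h_{t-1} + b_t can be unrolled so the whole sequence comes out of two cumsum calls, with the running products handled in log space. Here's a small sketch of what I understand the idea to be; it's my own toy version under that assumption, not code from the mamba-tiny repo:

```python
import torch

def scan_naive(a, b):
    # Reference: the recurrence h_t = a_t * h_{t-1} + b_t, computed step by step.
    h = torch.zeros_like(b[..., 0])
    out = []
    for t in range(b.shape[-1]):
        h = a[..., t] * h + b[..., t]
        out.append(h)
    return torch.stack(out, dim=-1)

def scan_cumsum(a, b):
    # Same recurrence as a "ratio" of cumulative sums:
    #   h_t = (prod_{i<=t} a_i) * sum_{j<=t} b_j / (prod_{i<=j} a_i)
    # The running product becomes a cumsum of logs, so only native pytorch ops are needed.
    log_a_star = torch.cumsum(torch.log(a), dim=-1)     # log of the running product
    b_scaled = b * torch.exp(-log_a_star)               # b_j / prod_{i<=j} a_i
    return torch.exp(log_a_star) * torch.cumsum(b_scaled, dim=-1)

a = torch.rand(2, 8) * 0.5 + 0.5   # decay factors in (0.5, 1) to keep the log-space trick well behaved
b = torch.randn(2, 8)
print(torch.allclose(scan_naive(a, b), scan_cumsum(a, b), atol=1e-5))   # True
```

The obvious trade-off, as far as I can tell, is numerical range: dividing by a very long running product can blow up, which would be one reason the original custom CUDA scan is still the more robust option for long sequences.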