Q&A
The Future of Applied Artificial Intelligence
Software and consulting expert Andreas Erben pulls back the curtain to imagine a world where generative AI technologies reach further than today's capabilities.
The arrival of large language models (LLMs) and gen AI has already changed how we work, play and learn in a relatively short time. With capabilities seemingly changing and unlocking on a daily basis, we're all learning as we go and adapting to what the next update to LLMs allows us to do.
While many in the space are looking at what will be possible next week, there are some, like Andreas Erben, CEO of daenet Americas, who have their eyes set on what the AI landscape will look like in years to come. He sees a future where generative AI and other AI-based technologies go far beyond today's capabilities, and he is ready to share some of his insight next month at his Live! 360 session, "Beyond ChatGPT - Imagine Your Mind Blown."
Ahead of his highly anticipated talk, Erben sat down with Redmond to give his perspective on how the landscape has evolved, and how artificial intelligence could empower you in new ways as we move forward.
Redmond: What kind of computational resources or technical expertise are needed to run these advanced AI models, and is it feasible for small businesses or individual developers?
Erben: Many gen AI models, specifically LLMs such as ChatGPT/GPT-4 by OpenAI, are made available exclusively as cloud services. That means the technical expertise needed to run such a model in those cases is very low. With a few lines of example code, almost anybody can start leveraging those models. The barrier to entry is extremely low.
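To illustrate just how low that barrier is, here is a minimal sketch of calling a hosted model through OpenAI's Python SDK (shown as an assumption for illustration; details vary by provider and SDK version, and it expects an OPENAI_API_KEY environment variable):

```python
# Minimal sketch: calling a hosted LLM as a cloud service.
# Assumes `pip install openai` (v1-style client) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up the API key from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize the benefits of cloud-hosted LLMs in two sentences."},
    ],
)

print(response.choices[0].message.content)
```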
That said, when using those services at scale, the technical expertise needed is similar to other areas of cloud development -- for example, accounting for availability or bandwidth constraints of cloud resources when implementing parallel multi-user access.
Some models are available as downloadable model weights, so theoretically anybody could run them on their own infrastructure. That is relevant for businesses that face restrictions or requirements that do not allow them to send possibly sensitive data to a cloud service. In those cases, it is relevant to understand GPU infrastructure, from the necessary hardware to the operating system and the development tool chain -- typically Linux-based systems are a given here. This can be a daunting task when shooting for the most capable models. Due to advancements in model optimization and quantization, however, it is possible to run relatively capable models even on a powerful laptop computer or on a computer with a single powerful gaming GPU.
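As a rough sketch of what running such a quantized model locally can look like, the example below loads an open-weights model in 4-bit form with Hugging Face transformers and bitsandbytes (the model name is illustrative only; exact choices depend on licensing, hardware and memory):

```python
# Sketch: running a downloadable, quantized open-weights model on local hardware.
# Assumes a CUDA-capable GPU plus `pip install transformers accelerate bitsandbytes`.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # illustrative open-weights model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # spread layers across available GPU/CPU memory
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # 4-bit quantization to fit consumer GPUs
)

prompt = "Explain in one sentence why quantization helps run models on smaller hardware."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```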
With the increasing capabilities of these AI models, what are some of the ethical challenges you still see enterprises wrestling with?
On a higher level, enterprises need to be open to thinking about novel types of ethical challenges as an area to be continuously mindful of. There will be many potentially disruptive and unanticipated challenges coming up.
On a more concrete level, it's ethically challenging to balance your responsibilities to your shareholders, your customers, your employees and society. From a pure shareholder-value perspective, it may sound appealing to replace some types of information-worker jobs with AI assistants managed by significantly less head count, or even with fully AI-driven agents. But doing this with disregard for the affected employees could shake up the work environment in negative ways.
Our customers are also very concerned about so-called "hallucinations," when the model generates output that is simply untrue but reads as very convincing and expresses confidence.
Could you discuss real-world examples or case studies where the deployment of AI models has led to significant business advantages?
The most obvious real-world case is the increased developer productivity from using GitHub Copilot.
We are implementing improvements to discover information based on context and semantics in our own organization. It helps our organization break through historically grown silos, as employees can more easily access and process information. Think of it as your own librarian/archivist/information researcher, able to compile complex information at the push of a button. Some information-based tasks that would cost hours, or would be almost impossible to complete, can now be completed in minutes -- in some cases, seconds.
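A rough sketch of how such semantic discovery can work under the hood: embed documents and a question into vectors, then rank by similarity. The model name and documents below are hypothetical placeholders, not daenet's actual implementation:

```python
# Sketch: context/semantics-based information discovery via embeddings.
# Assumes `pip install openai numpy` and OPENAI_API_KEY in the environment.
from openai import OpenAI
import numpy as np

client = OpenAI()

documents = [
    "2022 travel expense policy for the sales organization",
    "Release notes for the internal reporting tool, version 3.1",
    "Onboarding checklist for new engineering hires",
]

def embed(texts):
    # Returns one embedding vector per input text.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)
query_vector = embed(["How do I get a laptop as a new developer?"])[0]

# Cosine similarity against every document; the highest score is the best match.
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
print(documents[int(np.argmax(scores))])
```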
What are multi-modal models, and in what industries or specific products have you seen multi-modal models making a significant impact?
A "modality" is a type of input data -- text, images or audio, for example. An LLM only knows text in a defined alphabet. A multi-modal model mixes different types of input data in one model. Microsoft's "Florence" model is one example, but more publicly known now is GPT-4V (GPT-4 with Vision capabilities). What you can do precisely with a multi-modal model depends on many factors.
One of the most common use cases for a vision model (text and images) is to be able to ask questions about an image in order to semantically better understand it. Specialty tasks such as background removal, edge detection, object detection or image recognition -- each of which used to require an individual specialty model for its so-called "task" -- are now just another way of utilizing a multi-modal model.
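Asking a question about an image with GPT-4V through the OpenAI API might look roughly like the sketch below (the image URL is a placeholder, and the model name reflects the preview naming at the time of writing; both are assumptions subject to change):

```python
# Sketch: asking a multi-modal (vision) model a question about an image.
# Assumes `pip install openai` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # preview model name; likely to change over time
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "How many people are in this picture, and what are they doing?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/warehouse.jpg"}},  # placeholder URL
        ],
    }],
    max_tokens=200,
)

print(response.choices[0].message.content)
```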
But you could also imagine manipulating an image based on textual instructions. Prompting is currently mostly a verbal task, but you already see "image prompting," where someone would, for example, circle a relevant part of an image and then ask a question about the circled part.
Multi-modal models are already used today by cloud-scale companies to provide specific services, such as computer vision capabilities, where the purpose-built backend has been replaced or augmented by a multi-modal model. End customers are only just now starting to learn about GPT-4V's capabilities. It is relevant to keep an eye on what the research community reports about new capabilities, as those now often make it into applicable solutions within weeks, if they are not already being used by select groups.
Big picture, what do you think is the next big step for AI in terms of capabilities and applications?
On the adoption side, the next big step in terms of applications is prevalent integration into existing products and solutions. Most companies will add AI capabilities. As for the capabilities of AI itself, we will see both larger AI models and smaller, optimized AI models. Multi-modal models are one of those trends where models will get larger, possibly include Mixture-of-Experts architectures for scaling purposes, but also find novel ways to make different media types interact.
What AI providers have not yet unlocked for end users are full multi-modal capabilities, with access to the base model, to let anybody discover new capabilities or skills that may be "hidden" in the model. What I mean by this: You can prompt an LLM to write poetry, to summarize, to organize information -- all of those are skills and capabilities. We do not know yet what new capabilities may be found when you combine, say, text, vision, speech and video. Would you be able to generate descriptive audio matching the cuts in a video? Could you modify an existing video, moving a dialogue between two individuals at a cocktail bar in New York to a different setting, such as ice-fishing, adjusting the conversation while preserving the context? Or possibly do things that may even seem completely alien to us? The precise capabilities that will be found are difficult to guess. A few months ago, I would have thought we were years away from generating realistic-sounding music with a matching rap or singing voice.
But at the same time, we see a movement to "shrink" models, as computational resources are limited, demand for AI is extensive, and there are indications that you can create capable AI models when you have better training data. Extrapolate a bit and you will see some capable models running even on mobile devices such as smartphones with optimized processors.
Finally, we will see more and more efforts to give AI models agency to do things instead of only creating output that does not interact with the wider environment. How much autonomy should an AI agent have? What does it mean if an agent can create program code and execute it? Could it potentially change its own parameters? What if it is connected to a robot and a sensing system?