Tricky alien worlds easier to find when humans and machines team up

By Briley Lewis

February 3, 2023

A combination of citizen science and machine learning is a promising new technique for astronomers looking for exoplanets.

Artist's depictions of exoplanets. (Image credit: ESA/Hubble, N. Bartmann)

Many of our imagined sci-fi futures pit humans and machines against each other — but what if they collaborated instead? This may, in fact, be the future of astronomy.

As data sets grow larger and larger, they become more difficult for small teams of researchers to analyze. Scientists often turn to complex machine-learning algorithms, but these can't yet replace human intuition and our brains' superb pattern-recognition skills. However, a combination of the two could be a perfect team. Astronomers recently tested a machine-learning algorithm that used information from citizen-scientist volunteers to identify exoplanets in data from NASA's Transiting Exoplanet Survey Satellite (TESS).

"This work shows the benefits of using machine learning with humans in the loop," Shreshth Malik, a physicist at the University of Oxford in the U.K. and lead author of the publication, told

The researchers used a typical machine-learning algorithm known as a convolutional neural network*. This computer algorithm looks at images or other information that humans have labeled correctly (a.k.a. "training data"), and learns how to identify important features. After it's been trained, the algorithm can identify these features in new data it hasn't seen before.

For the algorithm to perform accurately, though, it needs a lot of this labeled training data. "It's difficult to get labels on this scale without the help of citizen scientists," Nora Eisner, an astronomer at the Flatiron Institute in New York City and co-author on the study, told

People from across the world contributed by searching for and labeling exoplanet transits through the Planet Hunters TESS project on Zooniverse, an online platform for crowd-sourced science. Citizen science has the extra benefit of "sharing the euphoria of discovery with non-scientists, promoting science literacy and public trust in scientific research," Jon Zink, an astronomer at Caltech not affiliated with this new study, told

Finding exoplanets is tricky work: they're tiny and faint compared to the massive stars they orbit. In data from telescopes like TESS, astronomers can spot faint dips in a star's light as a planet passes between it and the observatory, a technique known as the transit method.
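To make the transit idea concrete, here is a minimal Python sketch that flags samples in a light curve that fall noticeably below the star's typical brightness. The light curve values, the `find_dips` helper and the 0.5% threshold are all made up for illustration; this is not the method used in the study, and real pipelines have to cope with noise, detrending and instrument effects.

```python
from statistics import median

def find_dips(flux, threshold=0.005):
    """Return indices where brightness falls more than `threshold` below the median."""
    baseline = median(flux)
    return [i for i, f in enumerate(flux) if f < baseline - threshold]

# Hypothetical normalized light curve: brightness near 1.0 with a small wobble,
# plus a 1% dip over samples 10-12 standing in for a transiting planet.
flux = [1.000, 1.001, 0.999, 1.000, 1.002, 0.998, 1.000, 1.001, 0.999, 1.000,
        0.990, 0.989, 0.990, 1.000, 1.001, 0.999, 1.000, 1.002, 0.998, 1.000]

dips = find_dips(flux)
print(dips)  # [10, 11, 12]
```

Using the median as the baseline keeps the dip itself from dragging the reference brightness down, which a plain average would do.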

However, satellites jiggle around in space and stars aren't perfect light bulbs, making transits sometimes tricky to detect. Zink thinks partnerships with machine learning "could significantly improve our ability to detect exoplanets" in this kind of real-world, noisy data.

Some planets are harder to find than others, too. Long-period planets orbit their star less frequently, meaning a longer period of time between dips in the light. TESS only studies each patch of sky for a month at a time, so for these planets it may only capture one transit instead of several periodic changes.
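The arithmetic behind this can be sketched in a few lines of Python. The 27-day window is an assumed stand-in for "about a month", and the function name, periods and first-transit days are hypothetical examples.

```python
def transits_in_window(period_days, first_transit_day, window_days=27.0):
    """Count transits whose mid-time falls inside [0, window_days)."""
    count = 0
    t = first_transit_day
    while t < window_days:
        count += 1
        t += period_days
    return count

# A short-period planet repeats many times in one observing window...
short_period = transits_in_window(period_days=3.0, first_transit_day=1.0)
# ...while a long-period planet shows up once, or not at all.
long_period_seen = transits_in_window(period_days=40.0, first_transit_day=10.0)
long_period_missed = transits_in_window(period_days=40.0, first_transit_day=30.0)

print(short_period, long_period_seen, long_period_missed)  # 9 1 0
```

With only one dip in the data there is no repetition for a periodic search to lock onto, which is where a human eye can help.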

"With citizen science, we are particularly good at identifying long-period planets, which are the planets that tend to be missed by automated transit searches," Eisner said.

This work has the potential to go far beyond exoplanets, as machine learning is quickly becoming a popular technique across many aspects of astronomy, Malik said. "I can only see its impact increasing as our datasets and methods become better."

The research was presented at the Machine Learning and the Physical Sciences Workshop at the 36th conference on Neural Information Processing Systems (NeurIPS) in December and is described in a paper posted to the preprint server

Follow the author at @briles_34 on Twitter. Follow us on Twitter @Spacedotcom and on Facebook.


* Convolutional neural networks are distinguished from other neural networks by their superior performance with image, speech, or audio signal inputs. They have three main types of layers, which are:
  • Convolutional layer
  • Pooling layer
  • Fully-connected (FC) layer
The convolutional layer is the first layer of a convolutional network. While convolutional layers can be followed by additional convolutional layers or pooling layers, the fully-connected layer is the final layer. With each layer, the CNN increases in its complexity, identifying greater portions of the image. Earlier layers focus on simple features, such as colors and edges. As the image data progresses through the layers of the CNN, it starts to recognize larger elements or shapes of the object until it finally identifies the intended object.

Convolutional Layer
The convolutional layer is the core building block of a CNN, and it is where the majority of computation occurs. It requires a few components, which are input data, a filter, and a feature map. Let’s assume that the input will be a color image, which is made up of a matrix of pixels in 3D. This means that the input will have three dimensions—a height, width, and depth—which correspond to RGB in an image. We also have a feature detector, also known as a kernel or a filter, which will move across the receptive fields of the image, checking if the feature is present. This process is known as a convolution.

The feature detector is a two-dimensional (2-D) array of weights, which represents part of the image. While they can vary in size, the filter size is typically a 3x3 matrix; this also determines the size of the receptive field. The filter is then applied to an area of the image, and a dot product is calculated between the input pixels and the filter. This dot product is then fed into an output array. Afterwards, the filter shifts by a stride, repeating the process until the kernel has swept across the entire image. The final output from the series of dot products from the input and the filter is known as a feature map, activation map, or a convolved feature.

After each convolution operation, a CNN applies a Rectified Linear Unit (ReLU) transformation to the feature map, introducing nonlinearity to the model.
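As a rough illustration of the steps just described, the following pure-Python sketch slides a 3x3 filter over a small image, takes the dot product at each position, and then applies ReLU. It is a simplified sketch: it uses a single-channel image rather than three RGB channels, no padding, and hand-picked filter weights (a trained CNN learns its weights). The function names and example values are made up for illustration.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Dot product between the filter and the patch it currently covers.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

def relu(feature_map):
    """Replace negative values with zero, leaving positive values unchanged."""
    return [[max(0, v) for v in row] for row in feature_map]

# Hypothetical 4x4 grayscale image with a bright vertical stripe in the middle.
image = [[0, 1, 1, 0] for _ in range(4)]
# Hand-picked filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d(image, kernel)
activated = relu(feature_map)
print(feature_map)  # [[3, -3], [3, -3]]
print(activated)    # [[3, 0], [3, 0]]
```

The filter fires strongly where the image goes from dark to bright, and ReLU discards the negative response at the opposite edge.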

As we mentioned earlier, another convolution layer can follow the initial convolution layer. When this happens, the structure of the CNN can become hierarchical as the later layers can see the pixels within the receptive fields of prior layers. As an example, let’s assume that we’re trying to determine if an image contains a bicycle. You can think of the bicycle as a sum of parts. It is comprised of a frame, handlebars, wheels, pedals, et cetera. Each individual part of the bicycle makes up a lower-level pattern in the neural net, and the combination of its parts represents a higher-level pattern, creating a feature hierarchy within the CNN.

Pooling Layer
Pooling layers, also known as downsampling layers, perform dimensionality reduction, reducing the number of parameters in the input. Similar to the convolutional layer, the pooling operation sweeps a filter across the entire input, but the difference is that this filter does not have any weights. Instead, the kernel applies an aggregation function to the values within the receptive field, populating the output array. There are two main types of pooling:
  • Max pooling: As the filter moves across the input, it selects the pixel with the maximum value to send to the output array. As an aside, this approach tends to be used more often compared to average pooling.
  • Average pooling: As the filter moves across the input, it calculates the average value within the receptive field to send to the output array.
While a lot of information is lost in the pooling layer, it also brings a number of benefits to the CNN. Pooling layers help to reduce complexity, improve efficiency, and limit the risk of overfitting.
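The two pooling types can be sketched with one small Python function. The 2x2 window, the stride of 2 and the example feature map are assumptions chosen for illustration, and `pool2d` is a hypothetical helper, not a library call.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample `feature_map` by aggregating each size x size window."""
    aggregate = max if mode == "max" else (lambda vals: sum(vals) / len(vals))
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Collect the values inside the current window (the receptive field).
            window = [feature_map[r * stride + i][c * stride + j]
                      for i in range(size) for j in range(size)]
            row.append(aggregate(window))
        pooled.append(row)
    return pooled

feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 8, 4],
    [2, 0, 4, 0],
]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)  # [[6, 2], [2, 8]]
print(avg_pooled)  # [[3.5, 1.0], [1.0, 4.0]]
```

Either way a 4x4 input shrinks to 2x2, which is the dimensionality reduction described above; no weights are involved.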

Fully-Connected Layer
The name of the fully-connected layer aptly describes itself. As mentioned earlier, the pixel values of the input image are not directly connected to the output layer in partially connected layers. However, in the fully-connected layer, each node in the output layer connects directly to a node in the previous layer.

This layer performs the task of classification based on the features extracted through the previous layers and their different filters. While convolutional and pooling layers tend to use ReLU functions, FC layers usually leverage a softmax activation function to classify inputs appropriately, producing a probability from 0 to 1.
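For readers curious what softmax does, here is a minimal Python sketch. It turns a list of raw class scores into probabilities between 0 and 1 that sum to 1. The three scores are made-up values standing in for the output of a fully-connected layer.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum first avoids overflow for large scores
    # and does not change the result.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes, e.g. "planet", "eclipsing binary", "noise".
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print([round(p, 2) for p in probs])  # [0.66, 0.24, 0.1]
```

The class with the highest score gets the highest probability, but the other classes keep a nonzero share, which is useful when the network is unsure.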


Our brains process a huge amount of information the second we see an image. Each neuron works in its own receptive field and is connected to other neurons to ensure they cover the entire visual field being observed. Just as each neuron in the biological vision system responds to stimuli only in a restricted region of the visual field, called the receptive field, each neuron in a CNN processes data only in its receptive field. The layers are arranged so that they detect simpler patterns first (lines, curves, etc.) and more complex patterns (faces, objects, etc.) further along. By using a CNN, one can give sight to computers. With this computerised "sight", investigators are now searching for exoplanets.
May 8, 2022
Many of our imagined sci-fi futures pit humans and machines against each other — but what if they collaborated instead? This may, in fact, be the future of astronomy.
Interestingly, I have heard the new GPT series AI propose the very same thing.
They are being programmed to actively look for ways to augment human activities and be of service where humans cannot go, such as in space exploration.

This has a very interesting side-effect of AI becoming goal-oriented, an advanced form of intellectual engagement for a machine.
I felt it might be interesting, write4u, for those not familiar with Generative Pre-Trained Transformers to offer some background on them.

Here is some useful background and information on GPT-3:

Formed in 2015 as a nonprofit, OpenAI developed GPT-3 as one of its research projects. It aimed to tackle the larger goals of promoting and developing "friendly AI" in a way that benefits humanity as a whole.

The first version of GPT was released in 2018 and contained 117 million parameters. The second version of the model, GPT-2, was released in 2019 with around 1.5 billion parameters. As the latest version, GPT-3 jumps over the last model by a huge margin with more than 175 billion parameters -- more than 100 times its predecessor and 10 times more than comparable programs.

These revolutionary discoveries are a result of decades of research and trial and error. Today, computational linguistics and computer science are changing many aspects of business and daily life. With natural language processing technology, machines can learn from text and make sense of human language. It has opened a new chapter in machine learning.

In 2020, OpenAI, an organization dedicated to “discovering and enacting the path to safe artificial intelligence”, announced the arrival of its latest natural language processing technology, known as GPT-3 (Generative Pre-trained Transformer 3)*. It is billed as a super-intelligent system that learns and adapts from the vast sea of digital text to generate new, intelligent and creative content on its own, and it has been described as a remarkable AI text generator capable of mimicking human writing with great fluency. Its capabilities have been termed “AI that is too dangerous to be released publicly”. GPT models have transformed the natural language processing (NLP) landscape with their powerful capabilities in performing various NLP tasks. The results are swift response times and greater accuracy. These language models require very few or even no examples to understand a task, and they perform with even better precision and creativity than state-of-the-art models that are trained heavily on large sets of examples.

As suggested in the name, GPT-3 is the third in a series of NLP tools designed by OpenAI. The model took years of development to reach the state of innovation in AI text generation that we know today. This article will discuss the journey and evolution of the GPT models: GPT-1, GPT-2, and GPT-3.

Before GPT, NLP models were heavily trained on large amounts of annotated data for a particular task. This was a major limitation, as the amount of labeled data needed to train a model accurately was not easily available. The NLP models were limited to what they had been trained for and failed to perform out-of-the-box tasks. To overcome these limitations, OpenAI proposed a generative language model (GPT-1) that is pre-trained on unlabeled data and can then be fine-tuned by users to perform downstream tasks such as classification, question answering, sentiment analysis, etc. This means that the model takes input (a sentence or a question) and tries to generate an appropriate response, and the data used for training the model is not labeled.

GPT-1 was launched in 2018 by OpenAI. Trained on the enormous BooksCorpus dataset, this generative language model was able to learn long-range dependencies and acquire vast knowledge from a diverse corpus containing long stretches of contiguous text. In terms of its architecture, GPT-1 applies the 12-layer decoder of the transformer architecture with a self-attention mechanism for training. As a result of its pre-training, one of the significant achievements of GPT-1 was its zero-shot performance on various tasks. This ability proved that generative language modeling can be exploited with an effective pre-training concept to generalize the model. With transfer learning as its base, GPT became a powerful facilitator for performing natural language processing tasks with very little fine-tuning. It paved the way for later models that could further enhance the potential of generative pre-training with larger datasets and more parameters.

GPT-1, GPT-2 and GPT-3

GPT uses the Decoder part of the Transformer Model (Source: Attention is all you need)

Later in 2019, OpenAI developed Generative Pre-trained Transformer 2 (GPT-2), using a larger dataset and more parameters to build a stronger language model. Like GPT-1, GPT-2 leverages the decoder of the transformer model. The most significant developments in GPT-2 are its model architecture and implementation: with 1.5 billion parameters it is more than 10 times larger than GPT-1 (117 million parameters), and it was trained on 10 times as much data. It is trained on a diverse dataset, making it powerful at solving various language tasks such as translation and summarization using just raw text as input and few or no training examples. Evaluation of GPT-2 on several downstream-task datasets showed that it significantly improved accuracy in identifying long-range dependencies and predicting sentences.

GPT-3 is the third version of the Generative Pre-trained Transformer series so far. It is a massive language prediction and generation model developed by OpenAI, capable of generating long sequences of original text. GPT-3 became OpenAI’s breakthrough AI language program. In simple words, it is a software application that can automatically generate paragraphs so unique that they sound almost as if a person wrote them. GPT-3 is currently available with restricted access through a cloud API provided by OpenAI, and access must be requested to explore the tool. It has led to some intriguing applications since its launch. Its most significant advantage is its size: it contains about 175 billion parameters and is more than 100 times larger than GPT-2. It is trained on a 500-billion-word data set (known as “Common Crawl”) collected from the internet and content repositories. Its other significant and surprising abilities include solving simple arithmetic problems, writing code snippets and executing intelligent tasks. The results are faster response times and greater accuracy, allowing NLP models to benefit businesses by effectively and consistently maintaining best practices and reducing human errors. Many researchers and developers have described it as the ultimate black-box AI approach due to its complexity and enormous size. That size makes inference expensive and inconvenient, and its resource demands make it a challenge to apply to practical tasks in its current form.

The purpose of GPT-3 was to make language processing more powerful and faster than its previous versions, without any special tuning. Most previous language processing models (such as BERT) require in-depth fine-tuning with thousands of examples to teach the model how to perform downstream tasks. With GPT-3, users can eliminate the fine-tuning step. The main difference between the three GPT models is their size. The original Transformer Model had around 110 million parameters. GPT-1 adopted that size, and with GPT-2 the number of parameters was raised to 1.5 billion. With GPT-3, the number of parameters was boosted to 175 billion, making it the largest neural network at the time of its release.

                     GPT-1         GPT-2         GPT-3
Parameters           117 Million   1.5 Billion   175 Billion
Decoder Layers       12            48            96
Context Token Size   512           1024          2048
Hidden Layer         768           1600          12288
Batch Size           64            512           3.2M
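As a sanity check on the figures in the table, the parameter counts can be roughly reproduced from the number of decoder layers and the hidden size. The sketch below uses a common rule of thumb, about 12 x layers x hidden size squared, which is not taken from the text above and ignores embeddings and other small terms; treat it as an approximation under that assumption.

```python
def approx_params(n_layers, d_model):
    """Rule-of-thumb parameter count for a decoder-only transformer (no embeddings)."""
    return 12 * n_layers * d_model ** 2

# (decoder layers, hidden size) taken from the table.
models = {"GPT-1": (12, 768), "GPT-2": (48, 1600), "GPT-3": (96, 12288)}
estimates = {name: approx_params(*cfg) for name, cfg in models.items()}

for name, n in estimates.items():
    print(f"{name}: ~{n / 1e9:.2f} billion")
# GPT-1: ~0.08 billion
# GPT-2: ~1.47 billion
# GPT-3: ~173.95 billion
```

The GPT-2 and GPT-3 estimates land close to the quoted 1.5 billion and 175 billion. The GPT-1 estimate comes out lower than 117 million, probably because the word embeddings the formula leaves out are a much bigger share of a small model.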


* GPT-3, or the third-generation Generative Pre-trained Transformer, is a neural network machine learning model trained using internet data to generate any type of text. Developed by OpenAI, it requires a small amount of input text to generate large volumes of relevant and sophisticated machine-generated text.

GPT-3's deep learning neural network is a model with over 175 billion machine learning parameters. To put things into scale, the largest trained language model before GPT-3 was Microsoft's Turing Natural Language Generation (NLG) model, which had 17 billion parameters. As of early 2021, GPT-3 is the largest neural network ever produced. As a result, GPT-3 is better than any prior model for producing text that is convincing enough to seem like a human could have written it.

GPT-3 processes text input to perform a variety of natural language tasks. It uses both natural language generation and natural language processing to understand and generate natural human language text. Generating content understandable to humans has historically been a challenge for machines that don't know the complexities and nuances of language, but GPT-3 is trained to generate realistic human text. GPT-3 has been used to create articles, poetry, stories, news reports and dialogue using a small amount of input text that can be used to produce large amounts of copy.

GPT-3 can create anything with a text structure -- not just human language text. It can also generate text summarizations and even programming code.

GPT-3 examples
One of the most notable examples of GPT-3's implementation is the ChatGPT language model. ChatGPT is a variant of the GPT-3 model optimized for human dialogue, meaning it can ask follow-up questions, admit mistakes it has made and challenge incorrect premises. ChatGPT was made free to the public during its research preview to collect user feedback. ChatGPT was designed in part to reduce the possibility of harmful or deceitful responses.

Another common example is Dall-E. Dall-E is an AI image generating neural network built on a 12 billion-parameter version of GPT-3. Dall-E was trained on a data set of text-image pairs and can generate images from user-submitted text prompts. ChatGPT and Dall-E were developed by OpenAI.

Using only a few snippets of example code text, GPT-3 can also create workable code that can be run without error, as programming code is a form of text. Using a bit of suggested text, one developer has combined the user interface prototyping tool Figma with GPT-3 to create websites by describing them in a sentence or two. GPT-3 has even been used to clone websites by providing a URL as suggested text. Developers are using GPT-3 in several ways, including generating code snippets, regular expressions, plots and charts from text descriptions, Excel functions and other development applications.

GPT-3 can also be used in the healthcare space. One 2022 study explored GPT-3's ability to aid in the diagnoses of neurodegenerative diseases, like dementia, by detecting common symptoms, such as language impairment in patient speech.

GPT-3 can also do the following:

  • create memes, quizzes, recipes, comic strips, blog posts and advertising copy;
  • write music, jokes and social media posts;
  • automate conversational tasks, responding to any text that a person types into the computer with a new piece of text appropriate to the context;
  • translate text into programmatic commands;
  • translate programmatic commands into text;
  • perform sentiment analysis;
  • extract information from contracts;
  • generate a hexadecimal color based on a text description;
  • write boilerplate code;
  • find bugs in existing code;
  • mock up websites;
  • generate simplified summarizations of text;
  • translate between programming languages; and
  • perform malicious prompt engineering and phishing attacks.
How does GPT-3 work?
GPT-3 is a language prediction model. This means that it has a neural network machine learning model that can take input text and transform it into what it predicts the most useful result will be. This is accomplished by training the system on the vast body of internet text to spot patterns in a process called generative pre-training. GPT-3 was trained on several data sets, each with different weights, including Common Crawl, WebText2 and Wikipedia.

GPT-3 is first trained through a supervised testing phase and then a reinforcement phase. When training ChatGPT, a team of trainers asks the language model a question with a correct output in mind. If the model answers incorrectly, the trainers tweak the model to teach it the right answer. The model may also give several answers, which trainers rank from best to worst.

GPT-3 has more than 175 billion machine learning parameters and is significantly larger than its predecessors -- previous large language models, such as Bidirectional Encoder Representations from Transformers (BERT) and Turing NLG. Parameters are the parts of a large language model that define its skill on a problem such as generating text. Large language model performance generally scales as more data and parameters are added to the model.

GPT-3 dwarfs its predecessors in terms of parameter count.

When a user provides text input, the system analyzes the language and uses a text predictor based on its training to create the most likely output. The model can be fine-tuned, but even without much additional tuning or training, the model generates high-quality output text that feels similar to what humans would produce.
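To give a feel for "predicting the most likely output", here is a deliberately tiny Python sketch. It is not how GPT-3 works internally (GPT-3 is a transformer neural network); it swaps in a simple word-pair counter, which only shares the basic idea of learning from text which word tends to come next. The training sentence and function names are made up.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following, word):
    """Return the word seen most often after `word`, or None if it was never seen."""
    candidates = following.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

corpus = "the planet orbits the star and the planet dims the star while the planet spins"
model = train_bigrams(corpus)

prediction = predict_next(model, "the")
unknown = predict_next(model, "telescope")
print(prediction)  # planet
print(unknown)     # None
```

In the toy corpus "planet" follows "the" three times and "star" only twice, so "planet" wins. A large language model makes the same kind of choice, but based on the whole preceding context and billions of learned parameters rather than a lookup table.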

What are the benefits of GPT-3?
Whenever a large amount of text needs to be generated from a machine based on some small amount of text input, GPT-3 provides a good solution. Large language models, like GPT-3, are able to provide decent outputs given a handful of training examples.
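The "handful of examples" is usually supplied inside the input text itself. The Python sketch below only builds such an input string and makes no call to any API. The "Text:" / "Sentiment:" layout, the example reviews and the function name are all assumptions for illustration, not a format required by GPT-3.

```python
def build_few_shot_prompt(examples, query):
    """Lay out labeled examples followed by an unlabeled query for the model to complete."""
    lines = []
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")  # blank line between examples
    lines.append(f"Text: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

# Hypothetical labeled examples for a sentiment task.
examples = [
    ("I loved this telescope.", "positive"),
    ("The mount broke after a week.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Setup was quick and the views are stunning.")
print(prompt)
```

The model is then expected to continue the text after the final "Sentiment:" with a label that follows the pattern of the examples.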

GPT-3 also has a wide range of artificial intelligence applications. It is task-agnostic, meaning it can perform a wide range of tasks without fine-tuning.

As with any automation, GPT-3 would be able to handle quick repetitive tasks, enabling humans to handle more complex tasks that require a higher degree of critical thinking. There are many situations where it is not practical or efficient to enlist a human to generate text output, or there might be a need for automatic text generation that seems human. For example, customer service centers can use GPT-3 to answer customer questions or support chatbots; sales teams can use it to connect with potential customers. Marketing teams can write copy using GPT-3. This type of content also requires fast production and is low risk, meaning, if there is a mistake in the copy, the consequences are relatively minor.

Another benefit of GPT-3 is that, because the model is accessed through a cloud API, applications that use it can be lightweight and run on a consumer laptop or smartphone.

What are the risks and limitations of GPT-3?
  • Mimicry. Language models such as GPT-3 are becoming increasingly accurate, and machine-generated content may become difficult to distinguish from that written by a human. This may pose some copyright and plagiarism issues.
  • Accuracy. Despite its proficiency in imitating the format of human-generated text, GPT-3 struggles with factual accuracy in many applications.
  • Bias. Language models are prone to machine learning bias. Since the model was trained on internet text, it has potential to learn and exhibit many of the biases that humans exhibit online. For example, two researchers at the Middlebury Institute of International Studies at Monterey found that GPT-2 -- GPT-3's predecessor -- is adept at generating radical text, such as discourses that imitate conspiracy theorists and white supremacists. This presents the opportunity to amplify and automate hate speech, as well as inadvertently generate it. ChatGPT -- powered by a variant of GPT-3 -- aims to reduce the likelihood of this happening through more intensive training and user feedback.
Future of GPT-3
OpenAI and others are working on even more powerful models. There are a number of open source efforts in play to provide a free and nonlicensed model as a counterweight to Microsoft's exclusive ownership. OpenAI is planning larger and more domain-specific versions of its models trained on different and more diverse kinds of text.

Others are looking at different use cases and applications of the GPT-3 model. However, Microsoft's exclusive license poses challenges for those looking to embed the capabilities in their applications. Microsoft has discussed incorporating a version of ChatGPT into applications such as Word, PowerPoint and Microsoft Power Apps.

It is unclear exactly how GPT-3 will develop in the future, but it is likely that it will continue to find real-world uses and be embedded in various generative AI applications. Generative AI expert Nina Schick predicted exponential technical advances and continued investment from tech companies such as Microsoft, Google, Apple and Nvidia in the generative AI space.


GPT-3 (Generative Pre-trained Transformer 3) is language modelling software created by OpenAI, an artificial intelligence research laboratory in San Francisco. Its 175-billion-parameter deep learning model is capable of producing human-like text, and it was trained on large text datasets, including Twitter, with hundreds of billions of words.
May 8, 2022
And GPT-4 in the works!

GPT-4 Is Coming: A Look Into The Future Of AI
Hints that GPT-4 Will Be Multimodal AI
In a podcast interview (AI for the Next Era) from September 13, 2022, OpenAI CEO Sam Altman discussed the near future of AI technology.
Of particular interest is that he said that a multimodal model was in the near future.
Multimodal means the ability to function in multiple modes, such as text, images, and sounds.
OpenAI interacts with humans through text inputs. Whether it’s Dall-E or ChatGPT, it’s strictly a textual interaction.
An AI with multimodal capabilities can interact through speech. It can listen to commands and provide information or perform a task.
Altman offered these tantalizing details about what to expect soon:
“I think we’ll get multimodal models in not that much longer, and that’ll open up new things. I think people are doing amazing work with agents that can use computers to do things for you, use programs and this idea of a language interface where you say a natural language – what you want in this kind of dialogue back and forth. You can iterate and refine it, and the computer just does it for you. You see some of this with DALL-E and CoPilot in very early ways.”
Altman didn’t specifically say that GPT-4 will be multimodal. But he did hint that it was coming within a short time frame.
GPT-4 can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem solving abilities.

GPT-4 is more creative and collaborative than ever before. It can generate, edit, and iterate with users on creative and technical writing tasks, such as composing songs, writing screenplays, or learning a user’s writing style.

GPT-4 is the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%. We’ve spent 6 months iteratively aligning GPT-4 using lessons from our adversarial testing program as well as ChatGPT, resulting in our best-ever results (though far from perfect) on factuality, steerability, and refusing to go outside of guardrails.

Over the past two years, we rebuilt our entire deep learning stack and, together with Azure, co-designed a supercomputer from the ground up for our workload. A year ago, we trained GPT-3.5 as a first “test run” of the system. We found and fixed some bugs and improved our theoretical foundations. As a result, our GPT-4 training run was unprecedentedly stable, becoming the first large model whose training performance we were able to accurately predict ahead of time. As we continue to focus on reliable scaling, we aim to hone our methodology to help us predict and prepare for future capabilities increasingly far in advance, something we view as critical for safety.

GPT-4’s text input capability is being released via ChatGPT and the API (with a waitlist). To prepare the image input capability for wider availability, we’re collaborating closely with a single partner to start. We are also open-sourcing OpenAI Evals, our framework for automated evaluation of AI model performance, to allow anyone to report shortcomings in our models to help guide further improvements.

In a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. The difference comes out when the complexity of the task reaches a sufficient threshold: GPT-4 is more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5.

To understand the difference between the two models, we tested on a variety of benchmarks, including simulating exams that were originally designed for humans. We proceeded by using the most recent publicly-available tests (in the case of the Olympiads and AP free response questions) or by purchasing 2022–2023 editions of practice exams. A minority of the problems in the exams were seen by the model during training, but we believe the results to be representative; see our technical report for details.

Many existing ML benchmarks are written in English. To get an initial sense of capability in other languages, we translated the MMLU benchmark, a suite of 14,000 multiple-choice problems spanning 57 subjects, into a variety of languages using Azure Translate (see Appendix). In 24 of 26 languages tested, GPT-4 outperforms the English-language performance of GPT-3.5 and other LLMs (Chinchilla, PaLM), including for low-resource languages such as Latvian, Welsh, and Swahili.

GPT-4 is being used internally, with great impact on functions like support, sales, content moderation, and programming. It is also being used to assist humans in evaluating AI outputs, starting the second phase in our alignment strategy.

GPT-4 can accept a prompt of text and images, which—parallel to the text-only setting—lets the user specify any vision or language task. Specifically, it generates text outputs (natural language, code, etc.) given inputs consisting of interspersed text and images. Over a range of domains—including documents with text and photographs, diagrams, or screenshots—GPT-4 exhibits similar capabilities as it does on text-only inputs. Furthermore, it can be augmented with test-time techniques that were developed for text-only language models, including few-shot and chain-of-thought prompting. Image inputs are still a research preview and not publicly available.

However, a developer is attempting to reverse-engineer APIs to grant anyone free access to popular AI models like OpenAI’s GPT-4 — legal ramifications be damned.

The developer’s project, GPT4Free, blew up on GitHub after links to it from Reddit went viral. At present, GPT4Free provides — or at least appears to provide — free and nearly unlimited access to GPT-4, as well as GPT-3.5, GPT-4’s predecessor.

GPT-4 is normally priced at $0.03 per 1,000 “prompt” tokens (about 750 words) and $0.06 per 1,000 “completion” tokens (again, about 750 words); tokens represent raw text. GPT-3.5 is far cheaper at $0.002 per 1,000 tokens.
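To see what those prices mean for a single request, here is a small Python sketch using only the per-1,000-token prices quoted above. The request size (2,000 prompt tokens and 1,000 completion tokens) and the function names are made up for illustration, and actual pricing may have changed since.

```python
# Prices in US dollars per 1,000 tokens, as quoted in the text.
GPT4_PROMPT_PRICE = 0.03
GPT4_COMPLETION_PRICE = 0.06
GPT35_PRICE = 0.002

def gpt4_cost(prompt_tokens, completion_tokens):
    """Cost of one GPT-4 request, charging prompt and completion tokens separately."""
    return (prompt_tokens / 1000 * GPT4_PROMPT_PRICE
            + completion_tokens / 1000 * GPT4_COMPLETION_PRICE)

def gpt35_cost(total_tokens):
    """Cost of one GPT-3.5 request, with a single price for all tokens."""
    return total_tokens / 1000 * GPT35_PRICE

# Hypothetical request: a 2,000-token prompt and a 1,000-token answer.
cost_gpt4 = gpt4_cost(2000, 1000)
cost_gpt35 = gpt35_cost(2000 + 1000)
print(f"GPT-4:   ${cost_gpt4:.3f}")   # GPT-4:   $0.120
print(f"GPT-3.5: ${cost_gpt35:.3f}")  # GPT-3.5: $0.006
```

For this example request GPT-4 works out to roughly 20 times the cost of GPT-3.5, which helps explain the appeal of free access.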

“Reverse engineering is a domain that I’ve always really liked — it’s like a challenge for me,” the developer, a computer science student going by the username xtekky, told TechCrunch via a Telegram DM. “First, it was for fun, but now it’s to provide an alternative to people with no means to use GPT-4/3.5.”

So how does GPT4Free get around OpenAI’s paywall? It doesn’t — not really. Instead, it fools the OpenAI API into thinking it’s receiving requests from websites with paid OpenAI accounts, like the search engine, WriteSonic or Quora’s Poe.

Anyone who uses GPT4Free is racking up the tab of sites xtekky chose to script around — an obvious violation of OpenAI’s terms of service. But xtekky doesn’t see a problem with this; they assert that GPT4Free is strictly for “educational purposes.”



GPT-4 has limited access at the moment, making it tough to test drive for those curious. It’s also something of a black box. Researchers have decried that GPT-4 is one of the least transparent models OpenAI has created to date, with few highly technical details in the 98-page paper that accompanied its release.