GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where we prime the model with an input and have it generate a lengthy continuation. In addition, GPT-2 outperforms other language models trained on specific domains (like Wikipedia, news, or books) without needing to use these domain-specific training datasets. On language tasks like question answering, reading comprehension, summarization, and translation, GPT-2 begins to learn these tasks from the raw text, using no task-specific training data. While scores on these downstream tasks are far from state-of-the-art, they suggest that the tasks can benefit from unsupervised techniques, given sufficient (unlabeled) data and compute.
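The priming-and-continuation idea above can be illustrated without GPT-2 itself. The toy sketch below trains a character-level bigram model on an invented corpus and greedily extends a prompt one character at a time; the corpus, function names, and decoding choice are all assumptions made for the illustration, not part of GPT-2.

```python
from collections import Counter, defaultdict

# Toy illustration (not GPT-2): a character-level bigram model is
# "primed" with an input string and extended autoregressively.
corpus = "the cat sat on the mat. the cat ate. "

# Count bigram transitions: for each character, the frequencies of
# the characters that follow it in the corpus.
transitions = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    transitions[a][b] += 1

def generate(prompt, length=20):
    """Greedily extend `prompt` one character at a time, always
    picking the most frequent successor of the last character."""
    out = prompt
    for _ in range(length):
        nxt = transitions[out[-1]].most_common(1)
        if not nxt:  # last character never appeared mid-corpus
            break
        out += nxt[0][0]
    return out

print(generate("the c"))
```

Real language models differ in scale (billions of parameters, subword tokens, attention over long contexts) and usually sample rather than decode greedily, but the loop structure — condition on everything generated so far, emit one token, repeat — is the same.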
The evolution of emoji is impressive and fascinating, but it makes for an uncomfortable contrast when other pictorial writing systems – the most commonly-used writing systems on the planet – are on the chopping block. We have an unambiguous, cross-platform way to represent “PILE OF POO” (💩), while we’re still debating which of the 1.2 billion native Chinese speakers deserve to spell their own names correctly.
Text summarization has many useful applications. If you run a website, you can create titles and short summaries for user-generated content. If you want to read many articles but don't have the time, a virtual assistant can summarize the main points for you. It is not an easy problem to solve. There are multiple approaches, including various supervised and unsupervised algorithms. Some algorithms rank the importance of sentences within the text and then construct a summary from the most important ones; others are end-to-end generative models. End-to-end machine learning algorithms are interesting to try: they have demonstrated good results in other areas, such as image recognition, speech recognition, language translation, and even question answering.
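The sentence-ranking (extractive) approach mentioned above can be sketched in a few lines. This is a deliberately minimal, assumed scoring scheme — average document-wide word frequency per sentence — not any particular published algorithm; the function name and regexes are choices made for the example.

```python
import re
from collections import Counter

def summarize(text, n_sentences=1):
    """Pick the n top-scoring sentences as an extractive summary."""
    # Split into sentences at terminal punctuation followed by space.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Score each sentence by the average corpus frequency of its words,
    # so sentences built from the document's common words rank highest.
    freq = Counter(re.findall(r"\w+", text.lower()))
    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)
    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Emit the chosen sentences in their original document order.
    return " ".join(s for s in sentences if s in top)

print(summarize("Cats sleep a lot. Cats eat fish. Dogs bark loudly."))
```

Averaging the score (rather than summing) keeps long sentences from winning by length alone; production extractive systems refine this with stop-word removal, position features, or graph-based ranking such as TextRank.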
Archaeologists working in Serbia have discovered tiny scrolls of gold and silver inscribed with what appears to be a series of ancient curses. The curse tablets were found alongside human skeletons at an excavation site at the foot of a coal-fired power station in Kostolac in northeastern Serbia. Archaeologists led by Miomir Korać are currently scouring the area in preparation for further construction at the site, which was once home to the ancient Roman city of Viminacium. One of the newly discovered scrolls contains text written in Aramaic rather than Greek. That presents a mystery to the scientists, but it's also an important clue. The researchers have identified several demons associated with the territory of what is today Syria, including Baal, Yahweh, Thobarabau, Seneseilam, and Sesengenfaranges. Invoking the powers of both Baal and Yahweh on a single tablet is unprecedented.
Looking through the works, you see artists sifting through enormous accumulations of images and texts. They do it in various ways—hunting, grabbing, compiling, publishing. They enact a kind of performance with the data, between the web and the printed page, negotiating vast piles of existing material. Almost all of the artists here use the search engine, in one form or another, for navigation and discovery.