GPT-2 displays a broad set of capabilities, including the ability to generate conditional synthetic text samples of unprecedented quality, where we prime the model with an input and have it generate a lengthy continuation. In addition, GPT-2 outperforms other language models trained on specific domains (like Wikipedia, news, or books) without needing to use these domain-specific training datasets. On language tasks like question answering, reading comprehension, summarization, and translation, GPT-2 begins to learn these tasks from the raw text, using no task-specific training data. While scores on these downstream tasks are far from state-of-the-art, they suggest that the tasks can benefit from unsupervised techniques, given sufficient (unlabeled) data and compute.
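The prime-then-continue loop described above can be illustrated with a toy sketch (a hypothetical stand-in, not GPT-2's actual architecture): here a bigram lookup table plays the role of the neural next-token predictor, but the generation loop — seed with a prompt, repeatedly sample the next token, append — has the same shape.

```python
# Toy autoregressive continuation (illustrative only, not GPT-2):
# a bigram table stands in for the model's next-token distribution.
import random
from collections import defaultdict

def train_bigrams(corpus):
    """Record which word follows which in the training text."""
    table = defaultdict(list)
    words = corpus.split()
    for a, b in zip(words, words[1:]):
        table[a].append(b)
    return table

def generate(table, prompt, length=10, seed=0):
    """Prime with a prompt, then extend it one sampled word at a time."""
    rng = random.Random(seed)
    out = prompt.split()
    for _ in range(length):
        choices = table.get(out[-1])
        if not choices:          # dead end: no observed continuation
            break
        out.append(rng.choice(choices))
    return ' '.join(out)
```

A real language model replaces the table lookup with a learned probability distribution over a large vocabulary, but conditioning on the prompt works the same way.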
Distribution of body parts across narrative time in 30k novels.
The evolution of emoji is impressive and fascinating, but it makes for an uncomfortable contrast when other pictorial writing systems – the most commonly-used writing systems on the planet – are on the chopping block. We have an unambiguous, cross-platform way to represent “PILE OF POO” (💩), while we’re still debating which of the 1.2 billion native Chinese speakers deserve to spell their own names correctly.
L1027777 (via http://flic.kr/p/Nsg1v5 )
L1027543 (via http://flic.kr/p/NKc2iV )
The text summarization problem has many useful applications. If you run a website, you can create titles and short summaries for user-generated content. If you want to read a lot of articles and don’t have time to do that, your virtual assistant can summarize their main points for you. It is not an easy problem to solve. There are multiple approaches, including various supervised and unsupervised algorithms. Some algorithms rank the importance of sentences within the text and then construct a summary from the most important ones; others are end-to-end generative models. End-to-end machine learning algorithms are interesting to try. After all, they have demonstrated good results in other areas, like image recognition, speech recognition, language translation, and even question answering.
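The sentence-ranking approach mentioned above can be sketched in a few lines. This is a naive frequency heuristic of my own (assuming simple end-of-sentence punctuation), not any particular library's method — real extractive systems use much stronger features.

```python
# Minimal extractive summarization sketch: score each sentence by the
# average corpus frequency of its words, keep the top-scoring ones.
import re
from collections import Counter

def summarize(text, n_sentences=2):
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))

    def score(sentence):
        tokens = re.findall(r'\w+', sentence.lower())
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    # Pick the highest-scoring sentences, then emit them in the
    # order they appeared in the original text.
    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return ' '.join(s for s in sentences if s in top)
```

Averaging by sentence length keeps long sentences from winning purely by having more words; generative (abstractive) models instead produce new sentences rather than reusing the original ones.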
Purposefully illegible text from the 12th & 21st centuries: A is unreadable by humans; B is unreadable by digital scanners
Archaeologists working in Serbia have discovered tiny parchments of gold and silver inscribed with what appears to be a series of ancient curses. The curse tablets were found alongside human skeletons at an excavation site at the foot of a coal-fired power station in Kostolac in northeastern Serbia. Archaeologists led by Miomir Korać are currently scouring the area in preparation for further construction at the site, which was once home to the ancient Roman city of Viminacium. One of the newly discovered scrolls contains text written in ancient Aramaic rather than Greek. That presents a mystery to the scientists, but it’s also an important clue. The researchers have identified several demons associated with the territory of what is today Syria, including Baal, Yahweh, Thobarabau, Seneseilam, and Sesengenfaranges. Invoking the powers of both Baal and Yahweh on a single tablet is unprecedented.
NEVIR (via http://flic.kr/p/y73FFP )
a Guest + a Host = Ghost
(via http://flic.kr/p/ttGi2j )
Editing Finnegans Wake
Looking through the works, you see artists sifting through enormous accumulations of images and texts. They do it in various ways—hunting, grabbing, compiling, publishing. They enact a kind of performance with the data, between the web and the printed page, negotiating vast piles of existing material. Almost all of the artists here use the search engine, in one form or another, for navigation and discovery.
The Groningen Meaning Bank consists of public domain English texts with corresponding syntactic and semantic representations.
The Descriptive Camera works a lot like a regular camera—point it at a subject and press the shutter button to capture the scene. However, instead of producing an image, this prototype outputs a text description of the scene. Modern digital cameras capture gobs of parsable metadata about photos, such as the camera’s settings, the location of the photo, and the date and time, but they don’t output any information about the content of the photo. The Descriptive Camera outputs only the metadata about the content.
joyce_150110_0016.jpg (967×721) (via http://www.houyhnhnmpress.com/wp-content/gallery/prospectus/joyce_150110_0016.jpg)