Saturday, July 4, 2009

Oracal

So back in Dec of 2005 I had an idea for a program that would pick random sentences from the internet and use them to attempt a conversation with someone online. There were a number of emails discussing it; here are all of them:


Sun, Dec 11, 2005 at 5:32 AM

I came up with a crazy program during the course of the week. Basically you give it a search string and it does a Google search, then it downloads the first 100 results and enters each word from each page into a dictionary, entering a word multiple times if it's used multiple times, to weight them. Then it creates a random string of those words the length of the average page. Basically to create a crazy article about some jargon-filled industry thing. I thought about maybe refining it to use whole sentences instead of words, which would get around it not making sense. Although the whole thing is too crazy and pointless for me to actually do.
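A minimal Python sketch of the weighted word idea described above. It is not the author's code (the program that eventually got written was in Perl); the function names and the sample pages are made up for illustration, and the downloading step is left out entirely.

```python
import random

def build_word_bag(pages):
    """Collect every word from every page. Repeats are kept on purpose,
    so a word used often is proportionally more likely to be drawn."""
    bag = []
    for text in pages:
        bag.extend(text.split())
    return bag

def random_article(pages, rng=None):
    """Return a random string of words as long as the average page."""
    rng = rng or random.Random()
    bag = build_word_bag(pages)
    if not bag:
        return ""
    avg_len = len(bag) // len(pages)
    return " ".join(rng.choice(bag) for _ in range(avg_len))

# Hypothetical stand-ins for downloaded page text.
pages = [
    "HD DVD uses VC-1 compression",
    "Blu-ray discs use a blue laser",
    "HD DVD holds 15 GB",
]
print(random_article(pages, random.Random(0)))
```

Keeping duplicates in a flat list is the simplest way to get the weighting; a word-to-count dictionary plus `random.choices` with weights would do the same job with less memory.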

Here is a sample I did manually, going to the first 10 pages and picking a word at random from each. The search string was hd-dvd blu ray compression techniques:

this Intel HD DVDs HD DVD however Television Scientific VC-1 the current

And here is the same one with random sentences picked:

Sure people will buy PS3 for gaming. In April 2005, Apple Computer, a member of the DVD Forum, updated DVD Studio Pro to support authoring HD content. Above all else, it's their ability to shoot ordinary 8cm discs that can be immediately taken to almost any DVD player or computer for viewing or editing that makes them so endearing. The answer is pretty simple, money. HD-DVD has a single layer capacity of 15 GB and a dual-layer capacity of 30 GB. MPEG-2 is used in DVD-Video and the first generation of the rewritable version of the Blu-ray Disc format. I'm so tired of watching things in the wrong aspect ratio. They will also include MPEG-2 support for playback of HDTV recordings and DVDs. Like HD-DVD, Blue laser discs don't require a caddy and the players and recorders will be able to play current DVD discs. Toshiba and NEC are the only big consumer electronics names behind HD-DVD, but from the content side they have the support of Universal, Paramount and Warner Bros.

That paragraph almost makes sense. I do kind of like it; it's about as coherent as any post on a forum. Which gives me an idea: what about using this program to search for whatever the topic starter writes and then reply with a paragraph generated like this? I'm going to do this right now with the first topic in Tech at TOTSE.

The winning topic is https://www.totse.com/bbs/Forum11/HTML/015674.html Topic: Computer Running Sloooooow.

Here is the paragraph:

O23 - Service: Ati HotKey Poller - ATI Technologies Inc. - C:\WINDOWS\system32\Ati2evxx.exe With our team in place we really needed someone to maintain a reasonable level of sanity in a male-based, tech-crazy work environment. If I have to spend hours just to submit a few auction`s I will just quit selling on eBay. Different background in workspaces? i was once told that the longer u keep ur computer on the more chache memory gets used, and eventually the computer crashes, not sure if its true or not I agree with all the information given by previous users, except that Zone Alarm is a total waste of money and time. The raging seas are equally impressive, but more so the rain spatter effect that strikes the camera - or the occasional spurt of blood from a fractured skull that splats across it instead. After that I tried to install some truetype fonts, but the same problem was happening. Have a look at the live CD and if it likes the video and sound, go for an install. XP Device manager items w/ yellow exclamation marks

Not quite as coherent. But I think I could improve the system a bit. For instance, only using sites with > x KB of text; really only the pages with large amounts of text gave me good stuff. Next I would do the same for the sentence: it must be > x KB. If it was less, I'm unsure if I would continue to include the next sentence or pick a new one. Now I think most smart people would realize that it was gibberish, but if you were dumb and didn't know about what you were asking about you might not really understand what was going on. From there I could make the program scan future posts for my username, then again do this using the first sentence from that post. I think while some would ignore me, some people would definitely argue with it endlessly. Let me also add that I have laughed out loud here in the lounge a few times reading that paragraph, mainly the raging seas line.
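The size cutoffs could look something like the Python sketch below. The email never gives a value for x, so the two default thresholds here are placeholders picked only for illustration, and the function names are hypothetical.

```python
def byte_len(text):
    """Size of the text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))

def filter_pages(pages, min_page_bytes=2048):
    """Keep only pages with more text than the cutoff (placeholder value)."""
    return [p for p in pages if byte_len(p) > min_page_bytes]

def filter_sentences(sentences, min_sentence_bytes=40):
    """Keep only sentences longer than the cutoff (placeholder value)."""
    return [s for s in sentences if byte_len(s) > min_sentence_bytes]

print(filter_sentences([
    "Different background in workspaces?",
    "Have a look at the live CD and if it likes the video and sound, go for an install.",
]))
```

This version simply drops short sentences. The other option mentioned above, gluing a short sentence to the one that follows it, would need the sentences kept in page order.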

I could probably make a pretty decent AIM bot with this too. The problem would be what to do the first search for, since I'd have to IM them to start it. I guess I could do their screenname, that would be interesting. Hmmm, screenname didn't work so well, but profile is good. Then I could give them a random sentence from a random site, and keep at it until they give me a line > x Bytes, so that I would have something to search for, and just rinse and repeat. Problem is I can't really test it, it would be hard to manually do the searches as it would be slow, plus I've never really been one for sending AIM bots to people I know.


Fri, Dec 16, 2005 at 6:06 AM

I originally thought of that program during the week as a crazy way to study words. What it really started as was a version of the million monkeys typing Shakespeare thing. I thought about using Word or one of the free Words to grammar check what it output and see what kind of sentences it came up with. Then I thought about how it would be pretty rare for it to come up with a word, let alone multiple words that flow with each other to give meaning. That's when I came up with using Google to get words that all would have the same idea. The rest I came up with as I was typing it out.

The main problem I can see with it would be the length of time to download 100 webpages; even on broadband, and even if you were just downloading the HTML, it would take a while. Then you would need it to strip the HTML tags. That really wouldn't be that hard. I just looked and you could really easily have it ignore everything between < and >. I guess the main problem I see now is that I used a bit of intelligence when randomly picking the sentences. Like I said, just using a > x bytes cutoff for both the page and the sentence. But I'd run into problems with abbreviations. I'll think about it more though, now that you've expressed interest in investing in Wetzel Ind's Software R&D Department. I don't program on the laptop and don't have internet on my PC, so that's a bit of a problem.
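For the tag stripping, here is a hedged Python sketch using the standard library's `html.parser` instead of hand-scanning for < and >. It is an illustration, not what the author ended up using; the class and function names are invented. It also skips script and style contents, which a plain "ignore everything inside angle brackets" pass would leave in the text.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text that sits outside tags, skipping <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def strip_tags(html):
    """Return the visible text of an HTML string with whitespace collapsed.
    Pieces are joined with spaces, so an inline tag right before punctuation
    can leave a stray space."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())

print(strip_tags("<p>Hello <b>world</b></p><script>var x = 1;</script><p>Bye</p>"))
```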


April 09, 2007 19:50

I've been reading a lot of Usenet/Newsgroup, whatever that is called. Google runs it now or something; either way it's a lot of technical discussion of every subject. I was briefly thinking of using it as the forum for the forum program. I was also doing a lot of thinking about the forum program. It would definitely be done if I could program on a computer with internet access. There are tons of Perl libraries for stripping HTML of the tags. Plus there's stuff for downloading HTML pages. Those were my two main concerns. Once I have text in a certain format, finding the sentences will be easy. The next hardest part will be posting to a forum; I may end up having to manually do that. I'm sure there are ways to do it automatically, but for some reason most forums have protection against bots posting to them.


Wed, Dec 24, 2008 at 7:54 AM

So I was reading a Slashdot article about how a randomly generated paper got accepted to some peer reviewed conference.
http://entertainment.slashdot.org/article.pl?sid=08/12/23/2321242
And along the way someone mentioned how this should qualify as passing the Turing test (an AI fooling a person into thinking it was also a person, http://en.wikipedia.org/wiki/Turing_test), and there was a reply that in order to pass the test there needs to be a 2 way conversation. This made me think about how my old forum posting program idea would have a pretty good chance of passing a Turing test. I thought some more about the main problem of getting formatted text from webpages and came up with the fairly obvious solution of using a command line web browser.

I'm not really sure why this never occurred to me; it's the perfect solution to my problem. So I downloaded lynx, the main text web browser, and played around with it. After an hour or so of enjoying finding the source over and over but being unable to get a binary precompiled and configured for Windows, I finally got something that worked. I now had the ability to enter commands like "lynx -dump http://www.google.com/search?q=monkeys >goog.txt" and get output in a text file like this:
...
2. [33]Monkey - Wikipedia, the free encyclopedia
A monkey is any member of either the New World monkeys or Old
World monkeys, two of the three groupings of simian primates, the
third group being the apes. ...
en.wikipedia.org/wiki/Monkey - 54k - [34]Cached - [35]Similar
pages
3. [36]San Diego Zoo's Animal Bytes: Monkey
Get fun and interesting monkey facts in an easy-to-read style from
the San Diego Zoo's Animal Bytes. Buy tickets online and plan a
visit to the Zoo or Wild ...
www.sandiegozoo.org/animalbytes/t-monkey.html - 32k - [37]Cached -
[38]Similar pages
...

Note that lynx renders all the HTML, so the URLs of the links aren't there. Luckily Google lists the URL of each result (also I later learned that at the bottom of the file each of those bracketed numbers is listed with the URL of its link).
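The numbered list at the bottom of a dump can be read back into a lookup table. The Python sketch below assumes the dump ends with a line reading "References" followed by lines of the form "33. http://...", which is how lynx -dump normally lays it out. The function name is invented, and the second URL in the sample is a hypothetical placeholder, not real output.

```python
import re

# One reference line: optional indent, a number, a dot, then the URL.
REF_LINE = re.compile(r"^\s*(\d+)\.\s+(\S+)\s*$")

def parse_references(dump_text):
    """Map each bracketed link number to its URL, using the list that
    follows the last 'References' heading in the dump."""
    refs = {}
    in_refs = False
    for line in dump_text.splitlines():
        if line.strip() == "References":
            in_refs = True
            refs = {}  # start over if the heading shows up more than once
            continue
        if in_refs:
            match = REF_LINE.match(line)
            if match:
                refs[int(match.group(1))] = match.group(2)
    return refs

sample = """   2. [33]Monkey - Wikipedia, the free encyclopedia
      A monkey is any member of either the New World monkeys or Old
      World monkeys ...

References

  33. http://en.wikipedia.org/wiki/Monkey
  34. http://example.com/cached-copy
"""
print(parse_references(sample)[33])
```

With that table, the [33] in front of a result title can be turned back into the address of the page to download next.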

Now it's getting late, and I haven't actually done any work on the program, but I think there's a pretty decent chance it'll get made. Here's my outline:
A Perl program accepts as input a line of text (to be copied by a human from some discussion), possibly taking in a whole paragraph from an input file. Perl creates a Google search URL for that line, and sends that command to lynx to dump the web results to a file. Then it opens the web results and sends each of the pages to lynx for dumping to files. Once it has the files with the content of the web pages it opens each and takes anything that looks like a logical sentence and creates an array of them. Then it randomly selects about 10 sentences and outputs them to a final output file. A human has to copy and paste the reply into the forum.
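The outline above can be sketched end to end in Python. This is an illustration under several assumptions, not the author's Perl program: every function name is invented, the sentence splitter is a naive regex, results are held in memory instead of files, and the only lynx feature relied on is the `lynx -dump URL` form quoted earlier. Google's result pages and its tolerance for scripted queries have also changed over time, so the search step may not work as written today.

```python
import random
import re
import subprocess
from urllib.parse import urlencode

def search_url(question):
    """Build a Google search URL for the question."""
    return "http://www.google.com/search?" + urlencode({"q": question})

def lynx_dump(url):
    """Render a URL to plain text with `lynx -dump` (lynx must be installed)."""
    result = subprocess.run(["lynx", "-dump", url],
                            capture_output=True, text=True, check=True)
    return result.stdout

def reference_urls(dump_text):
    """URLs from the numbered list after the last 'References' heading,
    leaving out Google's own links (a crude filter, assumed good enough)."""
    _, found, tail = dump_text.rpartition("References")
    if not found:
        return []
    urls = re.findall(r"^\s*\d+\.\s+(https?://\S+)", tail, flags=re.MULTILINE)
    return [u for u in urls if "google." not in u]

def split_sentences(text):
    """Naive split after ., ! or ? followed by whitespace."""
    flat = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]

def oracle(question, fetch=lynx_dump, count=10, rng=None):
    """Search for the question, dump each result page, and return
    `count` randomly chosen sentences joined into one paragraph."""
    rng = rng or random.Random()
    results = fetch(search_url(question))
    sentences = []
    for url in reference_urls(results):
        sentences.extend(split_sentences(fetch(url)))
    return " ".join(rng.sample(sentences, min(count, len(sentences))))
```

Calling `oracle("Computer Running Sloooooow.")` would need lynx on the PATH and a network connection. The `fetch` parameter exists so the pipeline can be tried with canned text instead.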

The reasons for the user interaction are: 1. A lot of forums now have bot protection (CAPTCHA). 2. I don't want to flood any forum. 3. The required copy and pasting isn't very hard. 4. Any automated posting would require forum specific code; this base program will be flexible to any situation. Just input a question (or anything) and it outputs a response. If I find a good forum that I like I can work on automating the replying in a separate script.

Nothing in the basic outline is unfeasible for me now, so I don't really see any problems with this happening (other than laziness).


Tue, Jan 6, 2009 at 2:49 PM

Well the forum program is done, I must say the second half went much easier than I thought it would. The bulk of the time on the whole thing has been spent making lynx work right (or figuring out the exact commands it needs to do what I want). I was surprised how well simply using a range for the lines/sentences worked at getting good results. I had a bunch of filters I planned on using but may not bother with now. Currently the biggest problem is getting too few Google results (resulting from a whole sentence being too specific). For now I'm just manually cutting the questions short, but I could foresee a solution involving taking the 5 longest words.



So here are some results:
Slashdot:
Question:
S3 (a developer platform) coupled with a consumer front
Answer:
The Scribd API is the most powerful and customizable way to upload documents to Scribd, convert documents to iPaper, manage existing documents on your website, and find the content you need. While the latest syndication format, Atom, might not be RDF, it's good Web-friendly data that can be mapped to RDF (work is in progress on conventions for that). IVA's API allows you to tie our massive catalog of over 20,000 movie trailers, 14,000 music videos, and 2,000 game trailers into your website!. Convert between force units such as Dyne, gram-force, poundals, newtons, pounds, kgm-force, etc. In conversation with members of the Social CRM team (shown at right), it was obvious this group loves what they're doing. ParallelFX runs on .Net FX 3.5 and relies on features available in C# 3.0 and Visual Basic 9.0. Beehive is being used to build Web applciations and for access to databases, Java messaging and EJBs, he said. This access to information, coupled with the lack of offices, created a flat structure where any idea could win the day. Charles' technology interests range broadly across Web Technology to allow for more interesting development to be brought to more people. The SGI InfiniteStorage 4000 continues the tradition of industry-leading storage technology from SGI's InfiniteStorage product line. I expect we'll see refinements and enhancements based on market feedback.

Yahoo Answers:
Question:
Do retail stores close (go out of business) typically in January or February?
Answer:
A link is provided about each store mentioned to verify the information. Com at the moment Costco faces a problem that about 2324%. I don't think it's new, nor do I think its "American envy" (well, that's not new, either). Don't solicit customers and push them, integrate the offering of the service into your advertising and marketing and pull them. Many visitors will already know me as Vice President of Sales and Marketing for Encore Studios www.encorestudios.com. CompUSA (CLOSED) clarifies details on store closings Any extended warranties purchased for products through CompUSA will be honored by a third-party provider, Assurant Solutions. I've seen this report several times and I think it may be overestimating or even over-hyping the ramifications. Some consumers just enjoy going sending out a beautiful holiday card and are willing to spend the money. I flatly refused, and ended the meeting," he says. "One is not only to regard the devices, but also", said net curtain managers Ron Johnson try out two days before the opening. "This fell right into Roundy's hands.

Question:
If I get a new iPod, will I be able to sync it to my already existing iTunes?
Answer:
de.li.cious add to de.li.cious | digg digg this! | technorati add to technorati | email email this post. Their fans skew younger, and are more comfortable on line; Many of them are quite international, and domestic US sales matter less. "The Federal Reserve reported its monthly G.19 Consumer Credit statistics today for the month of May 2007. As usual no one went on record stating that WM put them out of business. Missus Big Picture has long ago made her preferences known for Lowes over the Depot. January is typically the time of year when people resolve to exercise more, eat more healthily or make more time for themselves. Where can you go from there? Where?. Its too bad I only had the Razr with me, and not the digital camera. I just find it funny when people are so pro union but so against buying products that are expensive becasue they are made in progressive first world countries rather than third world hell holes. The company, which is based in Hingham, Ma., said today that those businesses haven't been doing well, and sales haven't been great at its stores for women either. If Q2 slows even more, that gets spent from then thru Q4, and so on.

Question:
Whats the most pain youve ever been in?
Answer:
Despite being heavily medicated it still sucked so hard to make a wrong move when getting out of the bed, or coughing, or reaching for stuff and picking it up etc. Paul somehow walked away with NO INJURIES whatsoever...I think he might have gotten a few scratches. Weirdest part: I have NEVER, before of since, putted as well as I did the rest of that round. I flew back from Europe several years ago; actually, this involved four connecting flights (first to Rome, then to Amsterdam, then across the pond, etc.), which took a little over 24 hours. But instead of getting the needle in between the bones, where it is supposed to go, the doctor kept missing and jammed the needle time after time into my verterbrae (correct word?). Sounds like I'm not the only one who has experienced some pain on the course. Earlier this year I was in the hospital with severe food poisoning. An adult in a structure so unsteady as to vibrate because a kid hit a wall would probably scream as he dove out the window. The water in question came from a large glass jug with a peck pan on the bottom for the chickens to drink from and a fill fitting on the top to re-fill it with. * If you felt tyred and deflated would you bother to repair the puncture on your child,s bicycle?. But at least it wasn't a kidney stone.

Physics Forum:
Question:
I know that hertz defines the frequency
Answer:
The degree to which an oscillating signal produces the same frequency for a specified interval of time. Dictionary Resources Teacher's Corner Feedback. The change of potential energy experienced by an electron moving from a place where the potential has a value of V to a place where it has a value of (V+1 volt). Alan Smale (Director), within the Astrophysics Science Division (ASD) at NASA's Goddard Space Flight Center. Hertz effect. The hertz (symbol: Hz) is a measure of frequency per unit of time, or the number of cycles per second. Light is electromagnetic radiation that is even higher in frequency, and has frequencies in the range of tens (infrared) to thousands (ultraviolet) of terahertz. Phillips View in context. Because the energy is measured per time and area, flux measurements make it easy for astronomers to compare the relative energy output of objects with very different sizes or ages. In practice, the hertz simply replaced the older cycle per second. galaxy.

Search strings from original email:
Question:
hd-dvd blu ray compression techniques
Answer:
Reload this Page LG Black Blu-ray/HD-DVD Reader & Dual Layer DVD+/-RW Writer SATA OEM Drive $126. A: 600,000 players in the US -- 300,000 of which were Xbox 360 HD DVD drives. it's all there for anyone to research for free, but that doesn't mean he's going to do what's best for you. Obviously, the launch of the PlayStation 3, which has a Blu-ray Disc drive would have pushed sales. I think its funny that someone a few posts ago said that the human eye can't tell the difference in the data/bit rate. I urge everyone again... I'm sorry to say this but I have 20/15 vision (meaning that my eyes see better than normal) and I can't tell the diference between a good connection on my cable and a HD movie. I see where your going with this.. As far as any of the next generation DVD formats, time will tell how long their writable media will keep. You can easily extract a certain beautiful section of music or dialog from your DVD video file and save them as MP3 files. Back on topic, and in reply to Rusty, no the war is clearly just getting started.

Question:
Computer Running Sloooooow.
Answer:
I also set an Include filter for paths matching C:\Windows\System32\Dllhost.exe, minimized it, and let my wife have the system back. For example, the mouse moved and when I clicked on the start button the start menu opened after about 30 seconds. I have recently been getting these very very slow startups with almost no hard drive activity, after I log in the hard drive just has no activity for up to a minute, then everything comes up quick like usual. It is really a matter of personal preference, but I use Norton. things that have a non-minimized window, 3. Apprentice's Sorcerer. Perhaps someone more familiar with 'nix internals can fill me in on this. It is still a priviledge to have all this iNFO at our fingertips EVEN IF IT TAKES TIME TO GET IT!. Perhaps you meant the O9 - Extra button: (no name) - {CD67F990-D8E9-11d2-98FE-00C0F0318AFE} - (no file) entry instead?. O4 - Global Startup: Adobe Reader Speed Launch.lnk = C:\Program Files\Adobe\Acrobat 7.0\Reader\reader_sl.exe. Location: Scarborough, Ontario.

Dan A's random question:
Question:
what is the population of chicago
Answer:
The Chicago Fire association football (soccer) club are members of the MLS. Chicago is the world headquarters for United Airlines, the world's second-largest airline by revenue-passenger-kilometers while it's the second largest hub for American Airlines. If legal scholars are right that the fight over Rod Blagojevich's appointment of Roland Burris to replace President-elect Barack Obama in the Senate will be long, possibly ending up before the U.S. * Flag of Israel Petah Tikva (Israel) 1994. The Chicago Slaughter of the CIFL began in 2006 and play at the Sears Centre. For every 100 females age 18 and over, there were 91.1 males. It was during this wave that Chicago became a center for jazz, with King Oliver leading the way.^[13] In 1933, Mayor Anton Cermak was assassinated while in Miami with President Franklin D. Chicago also has a massive Irish American population, with many residing on its South Side. From top left: Chicago Theater, the Sears Tower, the University of Chicago, the skyline from Northerly Island, Navy Pier, the Field Museum, and Crown Fountain in Millenium Park. Total 2,896,016 2,783,726 3,005,072 227.2 12,747 12,252 13,227. showed off the aircraft manufacturer's new corporate home.


Ok, well those samples are pretty good. I'm going to add a few more filters though.


Tue, Jan 6, 2009 at 4:52 PM

Ok, after a bit of tinkering, I think it's pretty good. You can get it on my site. You'll need lynx if you want to actually run it, but you can always just marvel at the code. It's called oracle.

http://daleswanson.org/programs.htm#Perl


Wed, Jan 21, 2009 at 2:15 AM

So I decided to use my oracle program on TOTSE, but I was shocked to discover it had closed down. So I went to the forums run by one of the main moderators and created an account (creatively named 'monkeys') there. I've made 7 replies in threads on a variety of subjects. So far I got one response.
http://bbs.zoklet.net/search.php?searchid=27720
http://bbs.zoklet.net/showthread.php?p=28167#post28167


Mon, Feb 9, 2009 at 7:44 AM

So I wanted to find a new forum for my program. Keys were a lot of users and activity, general subject, not geared towards smart people, and nothing I actually use. For now the IGN boards seem to be working well.

monkeys83

http://boards.ign.com/mac_general_board/b5146/176579774/r176775110/

http://boards.ign.com/teh_vestibule/b5296/176773322/r176775151/

http://boards.ign.com/teh_vestibule/b5296/176775048/r176775190/ <- Edited to remove some nonsense from the middle.

http://boards.ign.com/gear_general_board/b5124/176755860/r176775216/

http://boards.ign.com/teh_vestibule/b5296/176775274/r176775302/

http://boards.ign.com/teh_vestibule/b5296/176775322/r176775367/

http://boards.ign.com/teh_vestibule/b5296/176874248/r176874377/

Sat, Sept 12, 2009

I've updated the program to work again with a redesigned Google layout.
