Notebook

1 Process Data
2 Pre-Process Data For Deep Learning
    2.0.0.1 Look at one example of processed issue bodies
    2.0.0.2 Look at one example of processed issue titles
3 Define Model Architecture
    3.0.1 Load the data from disk into variables
    3.0.2 Define Model Architecture
4 Train Model
5 See Results On Holdout Set
6 Feature Extraction Demo
    6.0.1 Example 1: Issues Installing Python Packages
    6.0.2 Example 2: Issues asking for feature improvements

In [1]:

import pandas as pd
import logging
import glob
from sklearn.model_selection import train_test_split
pd.set_option('display.max_colwidth', 500)
logger = logging.getLogger()
logger.setLevel(logging.WARNING)

Process Data¶

Check the filesystem for the file extracted from BigQuery (also available on Kaggle: https://www.kaggle.com/davidshinn/github-issues/)

In [9]:

!ls -lah | grep github_issues.csv

-rw-r--r-- 1 40294 40294 2.7G Jan 18  2018 github_issues.csv

Split data into train and test set and preview data

In [11]:

# Read the data and sample 2M rows (to keep the tutorial fast)
traindf, testdf = train_test_split(pd.read_csv('github_issues.csv').sample(n=2000000), 
                                   test_size=.10)


#print out stats about shape of data
print(f'Train: {traindf.shape[0]:,} rows {traindf.shape[1]:,} columns')
print(f'Test: {testdf.shape[0]:,} rows {testdf.shape[1]:,} columns')

# preview data
traindf.head(3)

Train: 1,800,000 rows 3 columns
Test: 200,000 rows 3 columns

Out[11]:

	issue_url	issue_title	body
3165423	"https://github.com/1000hz/bootstrap-validator/issues/574"	uncaught typeerror: f b is not a function when using $ ... .validator 'update'	the above error is being thrown when i try and run update via js to include some new fields that have been added dynamically. i'm using backbone.js rendering a script template element to add a new set up fields based on user interaction. the full error message is: uncaught typeerror: f b is not a function at htmlformelement.<anonymous> validator.min.js:9 at function.each jquery.min.js:2 at n.fn.init.each jquery.min.js:2 at n.fn.init.b as validator validator.min.js:9 at n.initskillgroup app.l...
2763145	"https://github.com/quasar-analytics/quasar/issues/2821"	invoke endpoint regression	problem accures in versions: 21.x.x , 23.x.x and 24.x.x didn't check 22.x.x first query is put to view mount sql select from /test-mount/testdb/flatviz the second one sql select row.seriesone as seriesone, row.seriestwo as seriestwo, min row.measureone as measureone from output_of_first_query as row group by row.seriesone, row.seriestwo order by row.seriesone asc, row.seriestwo asc the third one is sql select from output_of_second_query where seriesone = one-one in 20.14.13 this works as exp...
3882729	"https://github.com/msharov/ustl/issues/79"	build ustl with clang on linux	hi, on ubuntu 14.04 clang 3.4, gcc 4.8.4 and fedora 22 clang 3.5, gcc 5.3.1 : cc=clang cxx=clang++ ./configure --libdir=path/to/libsupc++.a without --libdir it searches for libcxxabi when cc=clang make works fine, make check however shows quite a few diffs. is such configuration supposed to work? thanks!

Convert to lists in preparation for modeling

In [9]:

train_body_raw = traindf.body.tolist()
train_title_raw = traindf.issue_title.tolist()
#preview output of first element
train_body_raw[0]

Out[9]:

'some of the sds alerts do not have clearing alerts. so it always present in alerting directory. these kinds of alerts should be stored in etcd under /alerting/notify, it never goes to alerting/alerts directory and it is not displayed under alerts in ui also. these kinds of alerts are notified via notification channel and deleted via ttl. node_agent should have a logic to handle this in alerting framework.'

Pre-Process Data For Deep Learning¶

See this repo for documentation on the ktext package

In [10]:

%reload_ext autoreload
%autoreload 2
from ktext.preprocess import processor

/opt/conda/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.

In [11]:

%%time
# Clean, tokenize, and pad / truncate so that every document has length 70.
#  Also keep only the top 8,000 words in the vocabulary and map all remaining
#  words to index 1, the shared index for rare words.
body_pp = processor(keep_n=8000, padding_maxlen=70)
train_body_vecs = body_pp.fit_transform(train_body_raw)

WARNING:root:....tokenizing data
WARNING:root:(1/3) done. 1738 sec
WARNING:root:....building corpus
WARNING:root:(2/3) done. 568 sec
WARNING:root:....consolidating corpus
WARNING:root:(3/3) done. 9 sec
WARNING:root:Finished parsing 1,800,000 documents.
WARNING:root:...fit is finished, beginning transform
WARNING:root:done. 733 sec

CPU times: user 22min 17s, sys: 1min 16s, total: 23min 34s
Wall time: 50min 53s

Look at one example of processed issue bodies¶

In [12]:

print('\noriginal string:\n', train_body_raw[0], '\n')
print('after pre-processing:\n', train_body_vecs[0], '\n')

original string:
 some of the sds alerts do not have clearing alerts. so it always present in alerting directory. these kinds of alerts should be stored in etcd under /alerting/notify, it never goes to alerting/alerts directory and it is not displayed under alerts in ui also. these kinds of alerts are notified via notification channel and deleted via ttl. node_agent should have a logic to handle this in alerting framework. 

after pre-processing:
 [37 33 39  1  6 17 29 22 13  6  3 36 25  8 34 23  1 15  3 40 26 33  6 35
 11 38 23 18 45  1  4 32  2 25 28 20 42  1  4  6 15  9 25 24 29 16 45  6
 23 44  7  3 40 26 33  6 10 31 46 30 12  9 14 46 43  3  1 35 22  5]
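
To make the two ideas in the preprocessing step concrete (a capped vocabulary with a shared rare-word index, and fixed-length padding), here is a minimal pure-Python sketch. It is not the ktext implementation: `build_vocab` and `vectorize` are hypothetical helpers, and reserving index 0 for padding and 1 for rare words, as well as truncating from the end, are assumptions made for illustration.

```python
from collections import Counter

PAD, RARE = 0, 1  # assumed reserved indices: 0 for padding, 1 for rare words


def build_vocab(tokenized_docs, keep_n):
    """Map the keep_n most frequent tokens to indices 2, 3, ... (0 and 1 are reserved)."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(keep_n))}


def vectorize(doc, vocab, maxlen):
    """Replace tokens with indices, keep the first maxlen, then left-pad with zeros."""
    ids = [vocab.get(tok, RARE) for tok in doc][:maxlen]
    return [PAD] * (maxlen - len(ids)) + ids


docs = [["fix", "the", "build"],
        ["the", "build", "fails", "on", "the", "server"]]
vocab = build_vocab(docs, keep_n=2)   # only "the" and "build" get their own index
vectors = [vectorize(d, vocab, maxlen=5) for d in docs]
print(vectors)  # [[0, 0, 1, 2, 3], [2, 3, 1, 1, 2]]
```

The short document is padded on the left ('pre' padding) and the long one is cut to five ids; every word outside the top two collapses to index 1.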

In [13]:

# Instantiate a text processor for the titles, with some different parameters
#  append_indicators = True appends the tokens '_start_' and '_end_' to each
#                      document
#  padding = 'post' means that zero padding is appended to the end of the
#             document (as opposed to the default, which is 'pre')
title_pp = processor(append_indicators=True, keep_n=4500, 
                     padding_maxlen=12, padding ='post')

# process the title data
train_title_vecs = title_pp.fit_transform(train_title_raw)

WARNING:root:....tokenizing data
WARNING:root:(1/3) done. 222 sec
WARNING:root:....building corpus
WARNING:root:(2/3) done. 35 sec
WARNING:root:....consolidating corpus
WARNING:root:(3/3) done. 2 sec
WARNING:root:Finished parsing 1,800,000 documents.
WARNING:root:...fit is finished, beginning transform
WARNING:root:done. 101 sec

Look at one example of processed issue titles¶

In [14]:

print('\noriginal string:\n', train_title_raw[0])
print('after pre-processing:\n', train_title_vecs[0])

original string:
 node_agent should handle sds native alerts also
after pre-processing:
 [3 1 8 6 1 7 4 5 2 0 0 0]
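
The output above has nine non-zero ids followed by three zeros, which is consistent with seven title words plus the two indicator tokens, padded at the end to length 12. A rough sketch of that layout on tokens rather than ids follows; `prepare_title` is a hypothetical helper and `'_pad_'` is a placeholder standing in for index 0, not something ktext emits.

```python
def prepare_title(tokens, maxlen):
    """Wrap a tokenized title in start/end markers, truncate, then right-pad."""
    wrapped = ["_start_"] + tokens + ["_end_"]
    wrapped = wrapped[:maxlen]
    return wrapped + ["_pad_"] * (maxlen - len(wrapped))


title = prepare_title("add logging to app".split(), maxlen=8)
print(title)
# ['_start_', 'add', 'logging', 'to', 'app', '_end_', '_pad_', '_pad_']
```

Padding at the end keeps `_start_` in the first position of every title, which is convenient when the decoder is later fed the title one step at a time.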

Serialize all of this to disk for later use

In [15]:

import dill as dpickle
import numpy as np

# Save the preprocessor
with open('body_pp.dpkl', 'wb') as f:
    dpickle.dump(body_pp, f)

with open('title_pp.dpkl', 'wb') as f:
    dpickle.dump(title_pp, f)

# Save the processed data
np.save('train_title_vecs.npy', train_title_vecs)
np.save('train_body_vecs.npy', train_body_vecs)

Define Model Architecture¶

Load the data from disk into variables¶

In [16]:

from seq2seq_utils import load_decoder_inputs, load_encoder_inputs, load_text_processor

In [17]:

encoder_input_data, doc_length = load_encoder_inputs('train_body_vecs.npy')
decoder_input_data, decoder_target_data = load_decoder_inputs('train_title_vecs.npy')

Shape of encoder input: (1800000, 70)
Shape of decoder input: (1800000, 11)
Shape of decoder target: (1800000, 11)
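
The decoder arrays are 11 wide although the titles were padded to 12. One plausible explanation, assumed here because the body of `load_decoder_inputs` is not shown, is the usual teacher-forcing shift: the decoder input drops the last position and the target drops the first, so at every step the model sees the tokens so far and must predict the next one. A sketch with plain lists (`split_for_teacher_forcing` is a hypothetical name):

```python
def split_for_teacher_forcing(title_vecs):
    """Return (decoder_input, decoder_target), each one step shorter than the input rows."""
    decoder_input = [row[:-1] for row in title_vecs]   # drop the last position
    decoder_target = [row[1:] for row in title_vecs]   # drop the first position
    return decoder_input, decoder_target


# The processed title shown earlier in this notebook.
title_vecs = [[3, 1, 8, 6, 1, 7, 4, 5, 2, 0, 0, 0]]
dec_in, dec_target = split_for_teacher_forcing(title_vecs)

print(len(dec_in[0]), len(dec_target[0]))  # 11 11
print(dec_in[0])      # [3, 1, 8, 6, 1, 7, 4, 5, 2, 0, 0]
print(dec_target[0])  # [1, 8, 6, 1, 7, 4, 5, 2, 0, 0, 0]
```

The target at position t is the input at position t + 1, which is exactly the "predict the next token" setup.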

In [18]:

num_encoder_tokens, body_pp = load_text_processor('body_pp.dpkl')
num_decoder_tokens, title_pp = load_text_processor('title_pp.dpkl')

Size of vocabulary for body_pp.dpkl: 8,002
Size of vocabulary for title_pp.dpkl: 4,502

Define Model Architecture¶

In [19]:

%matplotlib inline
from keras.models import Model
from keras.layers import Input, LSTM, GRU, Dense, Embedding, Bidirectional, BatchNormalization
from keras import optimizers

In [20]:

# arbitrarily set the latent dimension for the embedding and hidden units
latent_dim = 300

##### Define Model Architecture ######

########################
#### Encoder Model ####
encoder_inputs = Input(shape=(doc_length,), name='Encoder-Input')

# Word embedding for encoder (ex: Issue Body)
x = Embedding(num_encoder_tokens, latent_dim, name='Body-Word-Embedding', mask_zero=False)(encoder_inputs)
x = BatchNormalization(name='Encoder-Batchnorm-1')(x)

# Intermediate GRU layer (optional)
#x = GRU(latent_dim, name='Encoder-Intermediate-GRU', return_sequences=True)(x)
#x = BatchNormalization(name='Encoder-Batchnorm-2')(x)

# We do not need the encoder output sequence, just the final hidden state.
_, state_h = GRU(latent_dim, return_state=True, name='Encoder-Last-GRU')(x)

# Encapsulate the encoder as a separate entity so we can just 
#  encode without decoding if we want to.
encoder_model = Model(inputs=encoder_inputs, outputs=state_h, name='Encoder-Model')

seq2seq_encoder_out = encoder_model(encoder_inputs)

########################
#### Decoder Model ####
decoder_inputs = Input(shape=(None,), name='Decoder-Input')  # for teacher forcing

# Word Embedding For Decoder (ex: Issue Titles)
dec_emb = Embedding(num_decoder_tokens, latent_dim, name='Decoder-Word-Embedding', mask_zero=False)(decoder_inputs)
dec_bn = BatchNormalization(name='Decoder-Batchnorm-1')(dec_emb)

# Set up the decoder, using the encoder's hidden state (`seq2seq_encoder_out`) as the initial state.
decoder_gru = GRU(latent_dim, return_state=True, return_sequences=True, name='Decoder-GRU')
decoder_gru_output, _ = decoder_gru(dec_bn, initial_state=seq2seq_encoder_out)
x = BatchNormalization(name='Decoder-Batchnorm-2')(decoder_gru_output)

# Dense layer for prediction
decoder_dense = Dense(num_decoder_tokens, activation='softmax', name='Final-Output-Dense')
decoder_outputs = decoder_dense(x)

########################
#### Seq2Seq Model ####

#seq2seq_decoder_out = decoder_model([decoder_inputs, seq2seq_encoder_out])
seq2seq_Model = Model([encoder_inputs, decoder_inputs], decoder_outputs)


seq2seq_Model.compile(optimizer=optimizers.Nadam(lr=0.001), loss='sparse_categorical_crossentropy')
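
The model is compiled with `sparse_categorical_crossentropy`, which takes integer token ids as targets instead of one-hot vectors. For a single decoding step the loss is the negative log of the probability the softmax assigned to the correct token. A toy sketch with a three-word vocabulary (the probabilities are made up for illustration, and the helper below is a hand-written stand-in, not the Keras function):

```python
import math


def sparse_categorical_crossentropy(probs, target_index):
    """Negative log-probability of the correct class, given its integer index."""
    return -math.log(probs[target_index])


# Made-up softmax outputs for two decoding steps over a 3-word vocabulary.
step_probs = [[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]]
targets = [0, 1]  # integer ids of the correct tokens

losses = [sparse_categorical_crossentropy(p, t) for p, t in zip(step_probs, targets)]
mean_loss = sum(losses) / len(losses)
print(round(mean_loss, 4))            # 0.2899
print(round(math.exp(mean_loss), 4))  # perplexity, about 1.3363
```

Taking `exp` of the mean loss gives a perplexity, a rough "effective number of choices per token" that can be easier to interpret than the raw loss.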

Examine Model Architecture Summary

In [21]:

from seq2seq_utils import viz_model_architecture
seq2seq_Model.summary()
viz_model_architecture(seq2seq_Model)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
Decoder-Input (InputLayer)      (None, None)         0                                            
__________________________________________________________________________________________________
Decoder-Word-Embedding (Embeddi (None, None, 300)    1350600     Decoder-Input[0][0]              
__________________________________________________________________________________________________
Encoder-Input (InputLayer)      (None, 70)           0                                            
__________________________________________________________________________________________________
Decoder-Batchnorm-1 (BatchNorma (None, None, 300)    1200        Decoder-Word-Embedding[0][0]     
__________________________________________________________________________________________________
Encoder-Model (Model)           (None, 300)          2942700     Encoder-Input[0][0]              
__________________________________________________________________________________________________
Decoder-GRU (GRU)               [(None, None, 300),  540900      Decoder-Batchnorm-1[0][0]        
                                                                 Encoder-Model[1][0]              
__________________________________________________________________________________________________
Decoder-Batchnorm-2 (BatchNorma (None, None, 300)    1200        Decoder-GRU[0][0]                
__________________________________________________________________________________________________
Final-Output-Dense (Dense)      (None, None, 4502)   1355102     Decoder-Batchnorm-2[0][0]        
==================================================================================================
Total params: 6,191,702
Trainable params: 6,189,902
Non-trainable params: 1,800
__________________________________________________________________________________________________
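
As a sanity check, the parameter counts in the summary can be reproduced by hand. The sketch below assumes the classic GRU parameterisation (three gates, each with an input kernel, a recurrent kernel, and one bias vector) and four vectors per BatchNormalization layer, two of which are non-trainable moving statistics. The helper functions are hypothetical and exist only for this arithmetic.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim


def batchnorm_params(dim):
    # gamma and beta (trainable) plus moving mean and variance (non-trainable)
    return 4 * dim


def gru_params(input_dim, units):
    # three gates, each with an input kernel, a recurrent kernel and a bias
    return 3 * (input_dim * units + units * units + units)


def dense_params(input_dim, output_dim):
    return input_dim * output_dim + output_dim


latent_dim = 300
num_encoder_tokens, num_decoder_tokens = 8002, 4502  # from the vocabulary sizes above

encoder = (embedding_params(num_encoder_tokens, latent_dim)
           + batchnorm_params(latent_dim)
           + gru_params(latent_dim, latent_dim))
decoder = (embedding_params(num_decoder_tokens, latent_dim)
           + 2 * batchnorm_params(latent_dim)
           + gru_params(latent_dim, latent_dim)
           + dense_params(latent_dim, num_decoder_tokens))
total = encoder + decoder
non_trainable = 3 * 2 * latent_dim  # moving mean + variance in three batchnorm layers

print(f'{encoder:,}')        # 2,942,700
print(f'{total:,}')          # 6,191,702
print(f'{non_trainable:,}')  # 1,800
```

The two embedding tables and the final dense layer account for most of the parameters, so the vocabulary sizes (`keep_n`) drive model size far more than the GRU layers do.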

Train Model¶

In [22]:

from keras.callbacks import CSVLogger, ModelCheckpoint

script_name_base = 'tutorial_seq2seq'
csv_logger = CSVLogger('{:}.log'.format(script_name_base))
model_checkpoint = ModelCheckpoint('{:}.epoch{{epoch:02d}}-val{{val_loss:.5f}}.hdf5'.format(script_name_base),
                                   save_best_only=True)

batch_size = 1200
epochs = 7
history = seq2seq_Model.fit([encoder_input_data, decoder_input_data], np.expand_dims(decoder_target_data, -1),
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.12, callbacks=[csv_logger, model_checkpoint])

Train on 1584000 samples, validate on 216000 samples
Epoch 1/7
1584000/1584000 [==============================] - 265s 167us/step - loss: 2.7234 - val_loss: 2.4321

/ds/.local/lib/python3.6/site-packages/keras/engine/topology.py:2344: UserWarning: Layer Decoder-GRU was passed non-serializable keyword arguments: {'initial_state': [<tf.Tensor 'Encoder-Model/Encoder-Last-GRU/while/Exit_2:0' shape=(?, 300) dtype=float32>]}. They will not be included in the serialized model (and thus will be missing at deserialization time).
  str(node.arguments) + '. They will not be included '

Epoch 2/7
1584000/1584000 [==============================] - 263s 166us/step - loss: 2.3446 - val_loss: 2.3563
Epoch 3/7
1584000/1584000 [==============================] - 263s 166us/step - loss: 2.2608 - val_loss: 2.3281
Epoch 4/7
1584000/1584000 [==============================] - 263s 166us/step - loss: 2.2117 - val_loss: 2.3161
Epoch 5/7
1584000/1584000 [==============================] - 263s 166us/step - loss: 2.1767 - val_loss: 2.3110
Epoch 6/7
1584000/1584000 [==============================] - 263s 166us/step - loss: 2.1494 - val_loss: 2.3095
Epoch 7/7
1584000/1584000 [==============================] - 265s 167us/step - loss: 2.1268 - val_loss: 2.3124
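
Two details of the training cell are easy to miss. The checkpoint path is formatted in two stages: the doubled braces survive the first `.format` call, leaving `{epoch:02d}` and `{val_loss:.5f}` placeholders for the callback to fill in after each epoch. And with `save_best_only=True`, a file should only be written when the monitored value improves. The sketch below imitates that behaviour in plain Python using the validation losses from the log above; it is a stand-in for illustration, not the Keras callback, and it assumes the monitored quantity is `val_loss`.

```python
script_name_base = 'tutorial_seq2seq'

# Stage 1: fill in the base name; '{{' and '}}' become literal braces.
template = '{:}.epoch{{epoch:02d}}-val{{val_loss:.5f}}.hdf5'.format(script_name_base)
print(template)  # tutorial_seq2seq.epoch{epoch:02d}-val{val_loss:.5f}.hdf5

# Validation losses copied from the training log above.
val_losses = [2.4321, 2.3563, 2.3281, 2.3161, 2.3110, 2.3095, 2.3124]

# Stage 2: keep a checkpoint name only when val_loss improves.
best = float('inf')
saved = []
for epoch, val_loss in enumerate(val_losses, start=1):
    if val_loss < best:
        best = val_loss
        saved.append(template.format(epoch=epoch, val_loss=val_loss))

print(saved[-1])   # tutorial_seq2seq.epoch06-val2.30950.hdf5
print(len(saved))  # 6 (epoch 7 did not improve on epoch 6)

# Split arithmetic matching the "Train on ... validate on ..." line.
n_total, batch_size = 1_800_000, 1200
n_val = n_total * 12 // 100
n_train = n_total - n_val
steps_per_epoch = n_train // batch_size
print(n_train, n_val, steps_per_epoch)  # 1584000 216000 1320
```

Since the validation loss ticks up at epoch 7 while the training loss keeps falling, the epoch 6 checkpoint is the natural one to reload for inference.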

In [23]:

#save model
seq2seq_Model.save('seq2seq_model_tutorial.h5')

/ds/.local/lib/python3.6/site-packages/keras/engine/topology.py:2344: UserWarning: Layer Decoder-GRU was passed non-serializable keyword arguments: {'initial_state': [<tf.Tensor 'Encoder-Model/Encoder-Last-GRU/while/Exit_2:0' shape=(?, 300) dtype=float32>]}. They will not be included in the serialized model (and thus will be missing at deserialization time).
  str(node.arguments) + '. They will not be included '

See Results On Holdout Set¶

In [27]:

from seq2seq_utils import Seq2Seq_Inference
seq2seq_inf = Seq2Seq_Inference(encoder_preprocessor=body_pp,
                                 decoder_preprocessor=title_pp,
                                 seq2seq_model=seq2seq_Model)
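
`Seq2Seq_Inference` lives in `seq2seq_utils` and its internals are not shown in this notebook. A common way to generate a title from a trained decoder is greedy decoding: start from `_start_`, repeatedly pick the most probable next token, and stop at `_end_` or a length limit. The sketch below shows that loop on a toy lookup table; `greedy_decode`, `TOY_MODEL`, and `toy_next_token_probs` are hypothetical stand-ins, and whether the real class decodes greedily is an assumption.

```python
def greedy_decode(next_token_probs, start_token, end_token, max_len):
    """Repeatedly pick the most probable next token until end_token or max_len."""
    tokens = [start_token]
    while len(tokens) - 1 < max_len:
        probs = next_token_probs(tokens)      # dict: token -> probability
        best = max(probs, key=probs.get)
        if best == end_token:
            break
        tokens.append(best)
    return tokens[1:]                         # drop the start marker


# Hypothetical stand-in for the decoder: a fixed lookup on the last token.
TOY_MODEL = {
    "_start_": {"add": 0.6, "fix": 0.4},
    "add": {"logging": 0.7, "_end_": 0.3},
    "logging": {"_end_": 0.9, "add": 0.1},
}


def toy_next_token_probs(tokens):
    return TOY_MODEL[tokens[-1]]


title = greedy_decode(toy_next_token_probs, "_start_", "_end_", max_len=12)
print(" ".join(title))  # add logging
```

In the real model the next-token distribution would come from the decoder GRU, initialised with the encoder's hidden state for the issue body, rather than from a lookup table.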

In [34]:

# this method displays the predictions on random rows of the holdout set
seq2seq_inf.demo_model_predictions(n=50, issue_df=testdf)


==============================================
============== Example # 137237 =================

"https://github.com/envisionnw/upland/issues/90"
Issue Body:
 <a href= https://github.com/ncpn ><img src= https://avatars3.githubusercontent.com/u/9699622?v=3 align= left width= 96 height= 96 hspace= 10 ></img></a> issue by ncpn https://github.com/ncpn _friday mar 17, 2017 at 19:31 gmt_
_originally opened as https://github.com/ncpn/upland/issues/90_ ---- check for odd species after completing the plot. compare with list of prior year's species. paper list 

Original Title:
 end of plot issues - identify odd species

****** Machine Generated Title (Prediction) ******:
 closed species plot not working


==============================================
============== Example # 132413 =================

"https://github.com/open-organization-ambassadors/open-org-it-culture/issues/38"
Issue Body:
 need to include a specific call to the opensource.com writers list during the announcement part of the book series process. 

Original Title:
 update announcement process

****** Machine Generated Title (Prediction) ******:
 add a list of the book series to the book


==============================================
============== Example # 110893 =================

"https://github.com/arquillian/arquillian-cube/issues/795"
Issue Body:
 issue overview add a new property to disable detection of image stream files those ended with -is.yml from target directory. expected behaviour by default cube should not process image stream files if user does not set it. current behaviour cube always try to execute -is.yml files which can cause some problems in most of cases, for example if you are using kuberentes instead of openshift or if you use together fabric8 maven plugin with cube. 

Original Title:
 add a new property to disable detection of image stream files

****** Machine Generated Title (Prediction) ******:
 add a way to disable image detection


==============================================
============== Example # 179062 =================

"https://github.com/TryGhost/Ghost/issues/9299"
Issue Body:
 in ghost 1.0 we set out to get rid of incremental ids. we didn't quite achieve it, as the migrations table still uses it, and i believe there is still some hardcoded expectations around the ghost owner id. regarding incremental ids in the migrations table i raised an issue on knex migrator: https://github.com/tryghost/knex-migrator/issues/91 we need to also try to get rid of reliance on ids inside of ghost itself. this issue needs more detail really - raising it as a starting point. 

Original Title:
 remove all reliance on incremental ids

****** Machine Generated Title (Prediction) ******:
 incremental migration to oracle db


==============================================
============== Example # 54381 =================

"https://github.com/googlevr/gvr-unity-sdk/issues/509"
Issue Body:
 hi, i'm trying to get the deep link working. i can send the activity, open the app and read dashcode and get booleanextra and all that. so activating the deep link works fine and for example when i call getaction, it returns android.intent.action.view which is correct. the main problem is that getdatastring and getscheme always return null. i'm out of test ideas. do you think its a bug? i have attached the manifest file for your reference. and i'm using gvrintent.getdata that always returns null. islaunchedfromvr and getintenthashcode are working fine. and this is the command line i used to test as an example: ./adb shell am start -w -a android.intent.action.view -d shapevisual://com.shapevisual.app?wl=gfs com.shapevisual.app androidmanifest.xml.txt https://github.com/googlevr/gvr-unity-sdk/files/864522/androidmanifest.xml.txt 

Original Title:
 android - deep link - getdatastring always returns null

****** Machine Generated Title (Prediction) ******:
 deep link and null return


==============================================
============== Example # 113341 =================

"https://github.com/sten626/mirror-match/issues/26"
Issue Body:
 right now there is no logging of any kind. read up on proper app logging in angular and add it to the app. 

Original Title:
 add logging to app

****** Machine Generated Title (Prediction) ******:
 add logging to app


==============================================
============== Example # 57566 =================

"https://github.com/convox/praxis/issues/319"
Issue Body:
 a pro user has expressed a need for this. 

Original Title:
 support for ev green bar ssl certs

****** Machine Generated Title (Prediction) ******:
 add a new user


==============================================
============== Example # 199162 =================

"https://github.com/ChurchCRM/CRM/issues/2403"
Issue Body:
 im presently upgrading to 2.7.2 from 2.7.1, my automated update is not working from my site, so im just uploading the new code and replacing the .htaccess and config.php . is there any other changes that i should be aware of to make sure that the update is successfull ? 

Original Title:
 upgrading to 2.7.2

****** Machine Generated Title (Prediction) ******:
 question : how to update the code ?


==============================================
============== Example # 187512 =================

"https://github.com/keepassxreboot/keepassxc/issues/693"
Issue Body:
 i tried to enable and use yubikey on snap version but yubikey doesn't shows on the list of valid devices. i done a comparison with debian package same version 2.2.0 and works fine. so i think the problem could be related to snap access to usb devices. 

Original Title:
 yubikey doesn't works on snap version

****** Machine Generated Title (Prediction) ******:
 snap version does n't work on ubuntu * number *


==============================================
============== Example # 18015 =================

"https://github.com/primefaces/primeng/issues/4456"
Issue Body:
 hi folks, <h3 class= first >advanced</h3> <p-fileupload name= demo url= ./upload.php onupload = onuploadhandler $event multiple= multiple accept= image/ maxfilesize= 1000000 > <ng-template ptemplate type= content > <ul ngif= uploadedfiles.length > <li ngfor= let file of uploadedfiles >{{file.name}} - {{file.size}} bytes</li> </ul> </ng-template> </p-fileupload> onuploadhandler event:any { alert 'test' ; for let file of event.files { this.uploadedfiles.push file ; } } onuploadhandler function is not working. what's wrong? i am using primeng - ^5.0.0-rc.0. any idea? 

Original Title:
 onupload function is not working!

****** Machine Generated Title (Prediction) ******:
 how to use this with multiple angular - cli * number *


==============================================
============== Example # 153048 =================

"https://github.com/imabug/raddb/issues/212"
Issue Body:
 adding a new test type is broken. this error is produced when the submit button is pressed. httpexception in handler.php line 133: this action is unauthorized. 

Original Title:
 error adding new test type

****** Machine Generated Title (Prediction) ******:
 new test issue


==============================================
============== Example # 28004 =================

"https://github.com/Carthage/Carthage/issues/1936"
Issue Body:
 one of the more confusing parts around how carthage installation works currently is that it requires a dylib itself with many more embedded dylibs to be installed alongside the carthage binary carthagekit.framework . if we have support for 1379, would it be possible to statically link carthage's dependencies into the primary carthage binary? if so, it could mean that there would be only one file to install. the primary unanswered question in my mind is whether it's possible to statically link against the swift core dylibs. does anyone know if there's an exposed way to do this? they seem to be present at this path /applications/xcode.app/contents/developer/toolchains/xcodedefault.xctoolchain/usr/lib/swift_static/macosx , so perhaps it is as simple as linking against these .a files. it would be a little strange for carthage to use the non-default flow for embedding built frameworks, but it is a cli rather than an app so perhaps this may be a good choice for this scenario. additionally, this would likely resolve issues where another version of carthagekit.framework steaks into a user's @rpath before the one that they just downloaded, causing a new version of the carthage binary to use the wrong version of carthagekit.framework . thoughts? thanks for reading. 

Original Title:
 statically link carthage frameworks into carthage?

****** Machine Generated Title (Prediction) ******:
 add carthage support for carthage


==============================================
============== Example # 131367 =================

"https://github.com/qlicker/qlicker/issues/341"
Issue Body:
 when looking at the course details as a professor, the message add ta to ... shows up when clicking the add student button. 

Original Title:
 add student to course displays the wrong message

****** Machine Generated Title (Prediction) ******:
 add ta to ta course


==============================================
============== Example # 82152 =================

"https://github.com/RetroWoW/RetroWoW/issues/96"
Issue Body:
 description : hunter pets are not summoned upon res in battleground current behaviour : hunter pets are not summoned upon res in battleground expected behaviour : spirit guides in battlegrounds should summon/resurrect your current pet when the hunter is resurrected. steps to reproduce the problem : 1. die in a bg 2. get resurected 3. source: http://wowwiki.wikia.com/wiki/patch_1.5.0 

Original Title:
 hunter pets are not summoned upon res in battleground

****** Machine Generated Title (Prediction) ******:
 * number*.2 * number*.5 not possible to be used in a new


==============================================
============== Example # 160809 =================

"https://github.com/kubernetes/ingress-nginx/issues/1825"
Issue Body:
 ie 11 does not support permanent redirect 308 with default headers, so it might not be the best default. it was introduced in this pull request: https://github.com/kubernetes/ingress-nginx/pull/1776 you could also support a fall back mode based on user agent: https://stackoverflow.com/questions/37701100/redirecting-ie-7-and-ie-11-by-useragent-nginx-config it might be possible to get ie 11 to support permanent redirect 308 if the redirect page presented does not trigger compatibility mode, but older versions of ie still won't support 308. 

Original Title:
 permanent redirect 308 not supported in ie11

****** Machine Generated Title (Prediction) ******:
 redirect to * number * redirect does not work


==============================================
============== Example # 197532 =================

"https://github.com/ngrx/platform/issues/49"
Issue Body:
 export const selectfeature = createfeatureselector<featurestate> 'feature' ; ~~~~~~~~~~~~~~~ error ts4023: exported variable 'selectfeature' has or is using name 'memoizedselector' from external module .../ngrx/modules/store/src/selector but cannot be named. 

Original Title:
 memoizedselector needs to be exported as well

****** Machine Generated Title (Prediction) ******:
 export ' ' : ' can not be used in ' module


==============================================
============== Example # 163719 =================

"https://github.com/aspnet/StaticFiles/issues/211"
Issue Body:
 staticfiles/src/microsoft.aspnet.staticfiles/fileextensioncontenttypeprovider.cs is missing the outlook .msg mimetype - currently manually doing the following: var provider = new fileextensioncontenttypeprovider ; provider.mappings.add .msg , application/vnd.ms-outlook ; ... but i think it would be good to have it included directly in the code. 

Original Title:
 missing .msg mimetype mapping

****** Machine Generated Title (Prediction) ******:
 missing outlook / auto - parsing of the source - code


==============================================
============== Example # 169328 =================

"https://github.com/epics-modules/autosave/issues/13"
Issue Body:
 tech talk message as follows: > hello, > > > here at slac, we saw that autosave is failing to recover the data for a waveform with 1 element. for testing purposes, we changed manually nelm to 2 and the recovery succeeded. another test was to manually edit the sav file, adding the keyword @array@ and the recovering succeeded, too. > > > i saw the following comment in 5.4.1 release: previously, restoring an array which had been saved with zero or one values failed. also, manual restore including restore by configmenu of any array pv caused a seg fault. . > > > as we are using 5.7.1, i think this problem is already corrected since 5.4.1. the behavior was observed when using epics 3.15. > > > the strange thing is that the same version of autosave seems to be working in epics 3.14, but not in 3.15. > > > i saw that autosave uses ca_element_count from the channel access api. maybe something changed in this function in epics 3.15? > > > thank you for your help. > > > márcio paduan donadio > > system control engineer - slac > 

Original Title:
 recovering data from waveform with 1 element

****** Machine Generated Title (Prediction) ******:
 recover the data from the * number *


==============================================
============== Example # 85076 =================

"https://github.com/kristoferjoseph/flexboxgrid/issues/233"
Issue Body:
 hello! when i using auto width: <div class= row center-xs center-sm center-md center-lg > <div class= col-xs col-sm col-md col-lg > <div class= box top bottom id= white >1</div> </div> <div class= col-xs col-sm col-md col-lg > <div class= box top bottom id= white >2</div> </div> </div> in chrome, firefox, vivaldi and android devices, all ok - content is transferred as filling: ! screenshot at 15 15-23-31 https://cloud.githubusercontent.com/assets/13396947/22974363/ff5ee854-f392-11e6-91ef-01844d8f655d.png but in safari om macos , displayed content in one row and add horizontal scroll: ! screenshot at 15 15-27-25 https://cloud.githubusercontent.com/assets/13396947/22974432/4ff8d496-f393-11e6-8abe-c04a6029d9ef.png how can i fix it? 

Original Title:
 content filling on safari

****** Machine Generated Title (Prediction) ******:
 table not working


==============================================
============== Example # 13218 =================

"https://github.com/koorellasuresh/UKRegionTest/issues/82803"
Issue Body:
 first from flow in uk south 

Original Title:
 first from flow in uk south

****** Machine Generated Title (Prediction) ******:
 first from flow in uk south


==============================================
============== Example # 193511 =================

"https://github.com/highcharts/highcharts/issues/7347"
Issue Body:
 i'm using highstockcharts and recently upgraded to v6.0.3. since then, the tooltips won't be shown anymore as soon as the tooltip is higher than the actual chart. see the minimum example which i've provided. expected behaviour the tooltip should be shown. actual behaviour the tooltip is not shown if the tooltip the height is larger than the actual chart. live demo with steps to reproduce http://jsfiddle.net/n1h3q3sr/ uncomment the part teststring += <br/> not working anymore to make the tooltip visible. affected browser s chrome / firefox and most probably ie too 

Original Title:
 tooltip is not shown anymore if tooltip is larger than the chart

****** Machine Generated Title (Prediction) ******:
 tooltip not shown on * number *


==============================================
============== Example # 7320 =================

"https://github.com/Criccle/GoogleCombo/issues/1"
Issue Body:
 unlike google chart for mendix, google combo chart for mendix cannot redraw a chart. only one chart can be drawn only once but no redraw or two charts in a page is possible. thus, this module is useless at all with this condition. 

Original Title:
 cannot redraw a chart by google combo chart for mendix

****** Machine Generated Title (Prediction) ******:
 google charts not working


==============================================
============== Example # 42159 =================

"https://github.com/cviebrock/eloquent-sluggable/issues/337"
Issue Body:
 hello! i have a model with multiple slug fields setup like this: return 'slug_en' => 'source' => 'name_en' , 'slug_es' => 'source' => 'name_es' , 'slug_fr' => 'source' => 'name_fr' , 'slug_it' => 'source' => 'name_it' , 'slug_de' => 'source' => 'name_de' , ; i want to findbyslug on all of them, i have tried with slugkeyname but no luck. is there something im missing? thank you 

Original Title:
 find on multiple slug fields

****** Machine Generated Title (Prediction) ******:
 multiple fields with same name


==============================================
============== Example # 184774 =================

"https://github.com/hylang/hy/issues/1271"
Issue Body:
 it was released in 2008, so it's almost 10 years old. also, we don't test it. 

Original Title:
 drop support for python 2.6

****** Machine Generated Title (Prediction) ******:
 remove old version from * number *


==============================================
============== Example # 121668 =================

"https://github.com/MajkiIT/polish-ads-filter/issues/3646"
Issue Body:
 @majkiit w prebake jest reguła, która psuje logowanie na gg. a najwyraźniej są jeszcze osoby, które korzystają z gg i z listy prebake. więc nie wiem czy warto dać whitelist na nasz filtr czy nie, co o tym sądzisz? https://github.com/azet12/popupblocker/issues/68 issuecomment-329763381 

Original Title:
 gg.pl prebake

****** Machine Generated Title (Prediction) ******:
 problem z login


==============================================
============== Example # 34871 =================

"https://github.com/WorldDominationArmy/geodk-reqtest-req/issues/1"
Issue Body:
 afsnit: 3. krav til løsningens overordnede egenskaber relateret: 

Original Title:
 krav 1-eksterne kilder til datasupplering

****** Machine Generated Title (Prediction) ******:
 * number * - * number * - * number * -


==============================================
============== Example # 7978 =================

"https://github.com/blockstack/blockstack-portal/issues/416"
Issue Body:
 i noticed that gmp is installed by the macos installer script. noticed that the library was not loaded https://github.com/blockstack/blockstack-portal/issues/415 issuecomment-294392702 for albert: library not loaded: /usr/local/opt/gmp/lib/libgmp.10.dylib referenced from: /private/tmp/blockstack-venv/lib/python2.7/site-packages/fastecdsa/curvemath.so reason: image not found he is on macos 10.12. let's see if we can reproduce this error locally. 

Original Title:
 testing gmp and libffi installation via script

****** Machine Generated Title (Prediction) ******:
 library not loaded in macos


==============================================
============== Example # 28099 =================

"https://github.com/EcrituresNumeriques/transformation_jats_erudit/issues/2"
Issue Body:
 avons-nous une liste définitive des attributs possible de 'fig-type' pour l'extrant de jats? le balisage de mon côté, pour érudit, dépend de la valeur sémantique de l'attribut de cette balise et je voudrais pouvoir styler les différents cas de figures haha , qui sont : <figure>, <tableau>, <encadre>, <objetmedia>, pour les images et le son. merci. 

Original Title:
 attributs possibles pour <fig> sous jats

****** Machine Generated Title (Prediction) ******:
 * number * : gestion des dates


==============================================
============== Example # 24459 =================

"https://github.com/go-gitea/gitea/issues/656"
Issue Body:
 when adding a new member to an organisation owner team, addteammember does not set watches for the new team member. together with 653 that is pretty confusing behaviour and probably a bug. 

Original Title:
 new owner team member does not get watches for org repo's

****** Machine Generated Title (Prediction) ******:
 new member does not set the team member


==============================================
============== Example # 64152 =================

"https://github.com/linuxboss182/SoftEng-2017/issues/84"
Issue Body:
 need 3-4 people to present our application to the class on wednesday. applicants must: - not have presented last week - understand how to use the application - be ready to kick ass remember, you have to present at either this wednesday or the next one, so plan accordingly! 

Original Title:
 iteration 2 presentation

****** Machine Generated Title (Prediction) ******:
 add a new class to the application


==============================================
============== Example # 69032 =================

"https://github.com/kartoza/qgis.org.za/issues/184"
Issue Body:
 i created a form 'contact' and it seems to work but the form labels do not appear on the form so it is a bit useless. please get the labels to appear and merge and release with other improvements asap 

Original Title:
 form labels not appearing

****** Machine Generated Title (Prediction) ******:
 form labels not showing up


==============================================
============== Example # 132252 =================

"https://github.com/NTU-ASH/tree-generator/issues/18"
Issue Body:
 sort a series of node values within the tree, e.g. -take values from 0-9 up to 15 -sort them into a tree with the middle value as the root and the lowest on the left/highest on the right -perhaps do the same for letters so a is to the left and z is to the right 

Original Title:
 binary search tree generation

****** Machine Generated Title (Prediction) ******:
 sort tree nodes


==============================================
============== Example # 53765 =================

"https://github.com/multiformats/multihash/issues/74"
Issue Body:
 why not use the existing crypt format? $.$ 

Original Title:
 why not use the existing crypt format? $.$

****** Machine Generated Title (Prediction) ******:
 why not use the existing format ?


==============================================
============== Example # 123370 =================

"https://github.com/PSEBergclubBern/BergclubBern/issues/181"
Issue Body:
 ich kann bilder einfügen: ! 2017-05-07 14_01_33-tourenbericht anpassen bergclub bern wordpress https://cloud.githubusercontent.com/assets/18282099/25780754/d2260da6-332d-11e7-8350-f46821b300d5.png aber auf der website werden diese nicht angezeigt: ! 2017-05-07 14_00_32-bergclub bern https://cloud.githubusercontent.com/assets/18282099/25780756/defc015c-332d-11e7-982e-e51b758c8179.png 

Original Title:
 bilder eines tourenberichts werden nicht angezeigt

****** Machine Generated Title (Prediction) ******:
 website : update to * url *


==============================================
============== Example # 57636 =================

"https://github.com/postmanlabs/postman-app-support/issues/2996"
Issue Body:
 welcome to the postman issue tracker. any feature requests / bug reports can be posted here. any security-related bugs should be reported directly to security@getpostman.com version/app information: 1. postman version: 4.10.7 2. app chrome app or mac app : linux app not sure if its also happening on other oss 3. os details: ubuntu 14.06 4. is the interceptor on and enabled in the app: no 5. did you encounter this recently, or has this bug always been there: 6. expected behaviour: explain below steps to repoduce 7. console logs http://blog.getpostman.com/2014/01/27/enabling-chrome-developer-tools-inside-postman/ for the chrome app, view->toggle dev tools for the mac app : 8. screenshots if applicable steps to reproduce the problem: it seems postman ignores the failures if there is 1<= passed test after the failed assertion. i.e: assertion a a=true assertion b=false must fails the test assertion c c=true the final outcome of the postman test must be false because b failed. but postman shows the final results as passed because it looks at c which was true as the last line of the test which is wrong and the test easily ignores any bug and marks the test as successfull. some guidelines: 1. please file newman-related issues at https://github.com/postmanlabs/newman/issues 2. if it’s a cloud-related issue, or you want to include personal information like your username / collection names, mail us at help@getpostman.com 3. if it’s a question anything along the lines of “how do i … in postman” , the answer might lie in our documentation - http://getpostman.com/docs. 

Original Title:
 postman is skiping the failed assestions if the last assersion passes

****** Machine Generated Title (Prediction) ******:
 feature request : add support for multiple devices


==============================================
============== Example # 120461 =================

"https://github.com/libgraviton/gdk-java/issues/23"
Issue Body:
 with 12 rql support was introduced for string and date fields. since the rql syntax varies depending on the field type, integer and float and boolean are currently not supported, since they get treated as regular string fields. lets have a look at a typical query against a string field _fieldname_ with the value _value_ ?eq fieldname,string:value in this case the string: prefix is not required. it has the same result as ?eq fieldname,value but lets look at another example again a string field ?eq fieldname,string:20 at this point the string: prefix is required, since the graviton rql parser needs to know it's dealing with a string. omitting string: would lead to an empty result on the other hand, if we look at an integer field ?eq integerfieldname,string:20 would lead to an empty result. in this case the query needs to look like ?eq integerfieldname,string:20 the part that needs changing is https://github.com/libgraviton/gdk-java/blob/develop/gdk-core/src/main/java/com/github/libgraviton/gdk/api/query/rql/rql.java l141 where currently every field is always treated as string. 

Original Title:
 integer, float and boolean support for rql

****** Machine Generated Title (Prediction) ******:
 support for numeric type


==============================================
============== Example # 3333 =================

"https://github.com/jpvillaisaza/hangman/issues/15"
Issue Body:
 losing a game and then restarting shouldn't count as two more games. just one, thanks. 

Original Title:
 fix total number of games

****** Machine Generated Title (Prediction) ******:
 game crashes when game is running


==============================================
============== Example # 133450 =================

"https://github.com/vector-im/riot-meta/issues/28"
Issue Body:
 placeholder overarching issue to track progress on: general ux polish should probably be decomposed further. 

Original Title:
 general ux polish

****** Machine Generated Title (Prediction) ******:
 add more info to the ui


==============================================
============== Example # 111482 =================

"https://github.com/Viva-con-Agua/drops/issues/21"
Issue Body:
 currently, the view for defining the roles is very confusing. a search field for searching users has to be implemented and the role selection should be a little bit more user friendly. 

Original Title:
 roles definition view

****** Machine Generated Title (Prediction) ******:
 improve search for user roles


==============================================
============== Example # 154925 =================

"https://github.com/srusskih/SublimeJEDI/issues/228"
Issue Body:
 i want edit my project config file. according to the readme , by default project config name is <project name>.sublime-project , so the project is the folder that holds the project py file? 

Original Title:
 how to define a project ?

****** Machine Generated Title (Prediction) ******:
 how to edit project name ?


==============================================
============== Example # 18851 =================

"https://github.com/climategadgets/servomaster/issues/7"
Issue Body:
 adafruit dc & stepper motor hat for raspberry pi - mini kit https://www.adafruit.com/product/2348 provides a very reproducible and standard stepper controller solution for raspberry pi, it would be a shame not to support it. this enhancement is much more complicated than 6, though. steppers, unlike servos, do not have inherent limits, and if a stepper is used as a servo, there will have to be solutions put in place to allow limit detection limit switches and torque sensors, to name a couple . in addition, stepper positioning model discrete steps is different from servo positioning model floating point 0 to 1 with adjustable ranges and limits , so some extra work will need to be done. 

Original Title:
 implement tb6612 driver for raspberry pi

****** Machine Generated Title (Prediction) ******:
 rpi motor support


==============================================
============== Example # 174664 =================

"https://github.com/cawilliamson/ansible-gpdpocket/issues/98"
Issue Body:
 first off, thanks for all the effort going into this, very promising. issue: trying to bootstrap an ubuntu-16.04.3 iso from within an existing ubuntu instance. running into an error, which appears to be when ansible starts getting involved. very possible i'm doing something wrong. e: can not write log is /dev/pts mounted? - posix_openpt 2: no such file or directory + grep -wq -- --nogit + echo 'skip pulling source from git' + cd /usr/src/ansible-gpdpocket + ansible_nocows=1 + ansible-playbook system.yml -e bootstrap=true -v warning : provided hosts list is empty, only localhost is available error! syntax error while loading yaml. the error appears to have been in '/usr/src/ansible-gpdpocket/roles/audio/tasks/main.yml': line 17, column 1, but may be elsewhere in the file depending on the exact syntax problem. the offending line appears to be: - name: create chtrt5645 directory ^ here play recap localhost : ok=23 changed=14 unreachable=0 failed=1 

Original Title:
 syntax error while loading yaml

****** Machine Generated Title (Prediction) ******:
 bootstrap fails to mount in ubuntu


==============================================
============== Example # 186883 =================

"https://github.com/prettydiff/prettydiff/issues/456"
Issue Body:
 right now a single language file handles all tasks for a given group of languages. these files need to be broken down into respective pieces: parser beautifier minifier analyzer this is a large architectural effort. fortunately the code is well segmented internally for separation of concerns, so the logic can be broken apart without impact to operational integrity. the challenge is largely administration to ensure all the pieces are included into each of the respective environments and pass data among each other appropriately. 

Original Title:
 separate language files into their respective tasks

****** Machine Generated Title (Prediction) ******:
 fix language handling for all languages


==============================================
============== Example # 151593 =================

"https://github.com/koorellasuresh/UKRegionTest/issues/21568"
Issue Body:
 first from flow in uk south 

Original Title:
 first from flow in uk south

****** Machine Generated Title (Prediction) ******:
 first from flow in uk south


==============================================
============== Example # 24718 =================

"https://github.com/sensorario/go-tris/issues/34"
Issue Body:
 move 1 simone : 5 move 2 computer : 2 move 3 simone : 9 move 4 computer : 1 move 5 simone : 3 move 6 computer : 6 move 7 simone : 8 move 8 computer : 7 move 9 simone : 4 

Original Title:
 in this case computer loose

****** Machine Generated Title (Prediction) ******:
 move to * number *


==============================================
============== Example # 2005 =================

"https://github.com/fossasia/susi_firefoxbot/issues/6"
Issue Body:
 actual behaviour only text response from the server is shown expected behaviour support different types of responses like images, links, tables etc. would you like to work on the issue ? yes 

Original Title:
 support for different types of responses from server

****** Machine Generated Title (Prediction) ******:
 support for different types of response


==============================================
============== Example # 144769 =================

"https://github.com/reallyenglish/ansible-role-poudriere/issues/8"
Issue Body:
 the role clones a remote git repository, which takes time to clone. to make the test faster, create a small, but functional repository in the role, and use it for the test. 

Original Title:
 create minimal ports tree for the test

****** Machine Generated Title (Prediction) ******:
 add a test to the repo


==============================================
============== Example # 148842 =================

"https://github.com/felquis/HTJSON/issues/2"
Issue Body:
 firstly - thanks for making this, i had the same idea. but i would do it slightly differently. exactly 2 differerences. 1. i'd make content an array 2. i'd more the objects inside attr down a level and get rid of it. thus content would be an attribute. for example, instead of : var template = { a : { attr : { href : http://your-domain.com/images/any-image.jpg }, content: { link name } } }; it'd be: var template = a : { href : http://your-domain.com/images/any-image.jpg , content : some text , { img : { src: http://whatever.jpg }, some more text } ; 1. makes it more compact, without losing any document structure information 2. makes it more versatile, and, in fact, makes it complete - it can then encode any html document. 

Original Title:
 shouldn't content be an array? is attr really neccessary?

****** Machine Generated Title (Prediction) ******:
 content - type : attribute


==============================================
============== Example # 83915 =================

"https://github.com/rrdelaney/ava-rethinkdb/issues/3"
Issue Body:
 when i run the ava-rethinkdb it works but when i ran it through travis ci i get error: spawn rethinkdb enoent is there something i am doing wrong or need to add for ci build? 

Original Title:
 error: spawn rethinkdb enoent

****** Machine Generated Title (Prediction) ******:
 spawn enoent on ci


==============================================
============== Example # 22941 =================

"https://github.com/cartalyst/stripe/issues/90"
Issue Body:
 i am using your latest release 2.0.9 but that release does not include the payout file. kidnly upload the latest release that has the payout work. 

Original Title:
 payout file is missing in latest release.

****** Machine Generated Title (Prediction) ******:
 release * number*.1 missing

Feature Extraction Demo

In [68]:

# Read All 5M data points
all_data_df = pd.read_csv('github_issues.csv')
# Extract the bodies from this dataframe
all_data_bodies = all_data_df['body'].tolist()

In [70]:

# transform all of the data using the ktext processor
all_data_vectorized = body_pp.transform_parallel(all_data_bodies)

In [71]:

# save transformed data
with open('all_data_vectorized.dpkl', 'wb') as f:
    dpickle.dump(all_data_vectorized, f)
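
The saved file can be read back later, so the transform step does not have to be rerun. The following is a minimal sketch of that round trip, not the notebook's own code. It assumes `dpickle` is the `dill` package imported under that name earlier in the notebook (the import is not shown in this section), and it uses the standard-library `pickle` module as a stand-in, since the two share the same `dump`/`load` interface. The array `toy_vectorized` and the temporary file path are made up for illustration.

```python
import os
import pickle
import tempfile

import numpy as np

# Stand-in for the vectorized issue bodies (made-up values).
toy_vectorized = np.array([[0, 0, 12, 7], [0, 3, 5, 9]])

# Hypothetical location; the notebook writes to 'all_data_vectorized.dpkl'.
path = os.path.join(tempfile.mkdtemp(), 'toy_vectorized.dpkl')

# Write the array to disk, mirroring the dump call in the cell above.
with open(path, 'wb') as f:
    pickle.dump(toy_vectorized, f)

# Read it back; the restored array should equal what was saved.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(np.array_equal(toy_vectorized, restored))
```

With `dill`, the same pattern would apply by swapping `pickle.load` for `dpickle.load`.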

In [262]:

%reload_ext autoreload
%autoreload 2
from seq2seq_utils import Seq2Seq_Inference
seq2seq_inf_rec = Seq2Seq_Inference(encoder_preprocessor=body_pp,
                                    decoder_preprocessor=title_pp,
                                    seq2seq_model=seq2seq_Model)
recsys_annoyobj = seq2seq_inf_rec.prepare_recommender(all_data_vectorized, all_data_df)
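
The recommender prepared here is what produces the "Similar Issues (using encoder embedding)" tables in the examples that follow, where a smaller `dist` means a closer match. The idea can be sketched as a nearest-neighbor lookup over embedding vectors. This is only an illustration under stated assumptions: it uses a brute-force Euclidean search in NumPy rather than the index the notebook builds, and the helper `nearest_issues` and the toy vectors are hypothetical.

```python
import numpy as np


def nearest_issues(query_vec, all_vecs, k=3):
    """Return (indices, distances) of the k rows of all_vecs closest to query_vec.

    Hypothetical helper: brute-force Euclidean search, standing in for the
    nearest-neighbor index that the notebook prepares.
    """
    # Distance from the query to every stored embedding.
    dists = np.linalg.norm(all_vecs - query_vec, axis=1)
    # Indices of the k smallest distances, closest first.
    order = np.argsort(dists)[:k]
    return order, dists[order]


# Toy 2-D "embeddings" (made up for illustration).
toy_embeddings = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 2.0],
    [5.0, 5.0],
])
toy_query = np.array([0.9, 0.1])

idx, dist = nearest_issues(toy_query, toy_embeddings, k=2)
print(idx, dist)
```

A brute-force scan like this compares the query against every row, which is why an approximate index is the more practical choice for millions of issues.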

Example 1: Issues Installing Python Packages

In [223]:

seq2seq_inf_rec.demo_model_predictions(n=1, issue_df=testdf, threshold=1)


==============================================
============== Example # 13563 =================

"https://github.com/bnosac/pattern.nlp/issues/5"
Issue Body:
 thanks for your package, i can't wait to use it. unfortunately i have issues with the installation. prerequisite is 'first install python version 2.5+ not version 3 '. so this package cant be used with version 3.6 64bit that i have installed? i nevertheless tried to install it using pip, conda is not supported? but got an error: 'syntaxerror: missing parentheses in call to 'print''. besides when i try to run the library in r version 3.3.3. 64 bit i got errors with can_find_python_cmd required_modules = pattern.db : 'error in find_python_cmd......' pattern seems to be written in python but must be used in r, why cant it be used in python? i found another python pattern application that apparently does the same in python: https://pypi.python.org/pypi/pattern how is this related? 

Original Title:
 error installation python

****** Machine Generated Title (Prediction) ******:
 install with python * number *

**** Similar Issues (using encoder embedding) ****:

	issue_url	issue_title	body	dist
286906	"https://github.com/scikit-hep/root_numpy/issues/337"	root 6.10/02 and root_numpy compatibility	i am trying to pip install root_pandas and one of the dependency is root_numpy however some weird reasons i am unable to install it even though i can import root in python. i am working on python3.6 as i am more comfortable with it. is root_numpy is not yet compatible with the latest root?	0.694671
314005	"https://github.com/andim/noisyopt/issues/4"	joss review: installing dependencies via pip	hi, i'm trying to install noisyopt in a clean conda environment running python 3.5. running pip install noisyopt does not install the dependencies numpy, scipy . i see that you do include a requires keyword argument in your setup.py file, does this need to be install_requires ? as in https://packaging.python.org/requirements/ . also, not necessary if you don't want to, but i think it would be good to include a list of dependences somewhere in the readme.	0.698265
48120	"https://github.com/turi-code/SFrame/issues/389"	python 3.6 compatible	hi: i tried to install sframe using pip and conda but i can not find anything that will work with python 3.6? has sframe been updated to work with python 3.6 yet? thanks, drew	0.718715

Example 2: Issues asking for feature improvements

In [226]:

seq2seq_inf_rec.demo_model_predictions(n=1, issue_df=testdf, threshold=1)


==============================================
============== Example # 157322 =================

"https://github.com/Chingu-cohorts/devgaido/issues/89"
Issue Body:
 right now, your profile link is https://devgaido.com/profile. this is fine, but it would be really cool if there was a way to share your profile with other people. on my portfolio, i have social media buttons to freecodecamp, github, ect. without a custom link, i cannot show-off what i have done on devgaido to future employers. 

Original Title:
 feature request: sharable profile.

****** Machine Generated Title (Prediction) ******:
 add a link to your profile

**** Similar Issues (using encoder embedding) ****:

	issue_url	issue_title	body	dist
250423	"https://github.com/ParabolInc/action/issues/1379"	integrations list view discoverability	issue - enhancement i was initially confused by the link to my account copy; seeing github in the integrations list made me think it had already been set up . i realize now that i had to allow parabol to post as me. i think that link to my account could use a tooltip explaining what link means, and why you'd want to do so. <img width= 728 alt= screen shot 2017-09-29 at 10 52 05 am src= https://user-images.githubusercontent.com/2146312/31024786-2fd39c46-a50e-11e7-9f2a-6d4a5ed2baeb.png >	0.748828
222304	"https://github.com/viosey/hexo-theme-material/issues/166"	allow us to use sns-share for github	i'd love to be able to add a link at the bottom of the page for my github account. however, the sns-share option doesn't currently seem to be able to do this.	0.774398
153327	"https://github.com/tobykurien/GoogleApps/issues/31"	drive provide download ability	sometimes people share files via g drive. provided a link this app can show some info about the files but doesn't show the download button. i hope that it can be fixed and users would be able to download files with this app.	0.778953

In [78]:

# in case you need to reset the recommender system
# seq2seq_inf_rec.set_recsys_annoyobj(recsys_annoyobj)
# seq2seq_inf_rec.set_recsys_data(all_data_df)

# save the recommender's nearest-neighbor index to disk
recsys_annoyobj.save('recsys_annoyobj.pkl')

Out[78]:

True
